I'm considering adding runtime checks for nil pointer access. The benefits are not as big as one might suppose, since most systems catch a nil pointer access anyway and give a segmentation fault automatically. Whereas explicit runtime checks might add considerable runtime overhead. Additionally, some valid (well, let's at least say, working) BP and similar programs might fail then, e.g. if they pass dereferenced pointers (which may be nil) as var parameters, and the called routine either doesn't access the parameter or explicitly checks if it's address is nil.
On the positive side, such a check might provide a somewhat nicer error message (proper runtime error instead of segfault), and it can be caught like other runtime errors (e.g. via `AtExit') rather than with a signal handler as for segfaults. Also, strict standard conformance makes it an error, even in a situation as described above where the var parameter is never used.
Therefore, perhaps, the check should be off by default, and on only in standard modes, and of course, triggered by its own new option (`--[no-]pointer-checking'?).
Another runtime check (which BTW should also be rather easy to implement now), is for objects, whether they point to a valid VMT, using the `NegatedSize' VMT field, just like BP does. AFAIR, BP does it on virtual method calls, where the VMT is used. In principle, the check could be done on every object operation/access, but that might be overkill.
Since a simple check per virtual method call is not much overhead, I propose the check to be on by default, in all dialects. Of course, it will also get its own option (`--[no-]object-checking'?).
BP couples it to range-checking (I can only guess why -- perhaps because they only have short, i.e. one-letter, compiler directives, and they wanted to save a letter). For strict compatibility, we could add a combined option (`--[no-]range-and-object-checking'), and make `{$R[+-]}' equivalent to that.
Frank
Frank Heckenbach wrote:
I'm considering adding runtime checks for nil pointer access. The benefits are not as big as one might suppose, since most systems catch a nil pointer access anyway and give a segmentation fault automatically. Whereas explicit runtime checks might add considerable runtime overhead. Additionally, some valid (well, let's at least say, working) BP and similar programs might fail then, e.g. if they pass dereferenced pointers (which may be nil) as var parameters, and the called routine either doesn't access the parameter or explicitly checks if it's address is nil.
On the positive side, such a check might provide a somewhat nicer error message (proper runtime error instead of segfault), and it can be caught like other runtime errors (e.g. via `AtExit') rather than with a signal handler as for segfaults. Also, strict standard conformance makes it an error, even in a situation as described above where the var parameter is never used.
Since Pascal does not bandy pointers about, pointer checks can be quite accurate. However they have to be customized to the actual package. For example, using my nmalloc package for DJGPP agreement between a couple of internal linkages, and absence of the 'freed' marker, suffice. On an ancient CP/M system the check consisted of ensuring that the pointer lay within the current heap region, or (for thoroughness) following a linked list for membership. None of this applies to another installation. Once you have such a NIL check is trivially added, the only need being to control energization. If the pointer check is done during pointer variable assignment, it can be omitted during dereference, and the only dereference need is for NIL, and the NIL check is not needed during the assignment check.
It should be possible to turn this on and off by pseudo comments. Without this any creation of C style pointers will become impossible.
CBFalconer wrote:
Frank Heckenbach wrote:
I'm considering adding runtime checks for nil pointer access. The benefits are not as big as one might suppose, since most systems catch a nil pointer access anyway and give a segmentation fault automatically. Whereas explicit runtime checks might add considerable runtime overhead. Additionally, some valid (well, let's at least say, working) BP and similar programs might fail then, e.g. if they pass dereferenced pointers (which may be nil) as var parameters, and the called routine either doesn't access the parameter or explicitly checks if it's address is nil.
On the positive side, such a check might provide a somewhat nicer error message (proper runtime error instead of segfault), and it can be caught like other runtime errors (e.g. via `AtExit') rather than with a signal handler as for segfaults. Also, strict standard conformance makes it an error, even in a situation as described above where the var parameter is never used.
Since Pascal does not bandy pointers about, pointer checks can be quite accurate. However they have to be customized to the actual package. For example, using my nmalloc package for DJGPP agreement between a couple of internal linkages, and absence of the 'freed' marker, suffice. On an ancient CP/M system the check consisted of ensuring that the pointer lay within the current heap region, or (for thoroughness) following a linked list for membership. None of this applies to another installation. Once you have such a NIL check is trivially added, the only need being to control energization. If the pointer check is done during pointer variable assignment, it can be omitted during dereference, and the only dereference need is for NIL, and the NIL check is not needed during the assignment check.
It should be possible to turn this on and off by pseudo comments. Without this any creation of C style pointers will become impossible.
These might also be interesting points, but not actually what I plan ATM. Yes, we could relatively easily check if a pointer lies within the allocated area. But for some purposes that's too strict. E.g., one might want to operate on memory-mapped files or devices and use pointers in their area as well. And IMHO procedure pointers (which do not exist in standard Pascal) also have their use (callbacks, procedure tables, etc.), and would lie outside the heap area.
AFAICS, doing something as you describe while taking care of all these necessities might be quite tricky. Sure, there is some benefit.
IMHO dangling pointers are the more serious problem, since wild pointers (outside the heap) area can be avoided, e.g. by initializing all pointers to nil (though I don't recommend to do this always) and not doing other dirty stuff, while dangling pointers are more difficult to avoid "statically".
Your linked list suggestion would detect them (simple heap bounds checking would not). But it adds an O(n) factor in each pointer access which is often not acceptable. I'm not aware of other methods which don't either also add some runtime overhead (though one might get it down to O(ln n) by using a good data structure, perhaps a balanced tree) or produce a memory leak (such as by keeping track of all disposed pointers).
Well, while I'm writing this, it occurs to me, if we'd ever want to do such complex checking, it would probably be done in a subroutine anyway -- despite the call overhead, but inlining it for each pointer access could really explode program sizes. Given that, it might as well be a user routine which can be called via a procedure pointer. This way, you and others interested could experiment with various implementations and keep them independent of the compiler core, and environment/application-specific implementations are possible without creating a specially patched compiler.
Such a user-routine check should be easy to do in the compiler, and if there's interest, I can probably do this together with my plans. Of course, there should be a way for a simple inlined nil check without much runtime overhead, so we'd probably need a three-way option (or two options working together -- one to turn it on/off, and one to select the kind of checking; the latter would then usually be set globally, the former could be turned off locally for dirty code or for efficiency reasons).
Frank
Frank Heckenbach wrote:
... snip ...
IMHO dangling pointers are the more serious problem, since wild pointers (outside the heap) area can be avoided, e.g. by initializing all pointers to nil (though I don't recommend to do this always) and not doing other dirty stuff, while dangling pointers are more difficult to avoid "statically".
Your linked list suggestion would detect them (simple heap bounds checking would not). But it adds an O(n) factor in each pointer access which is often not acceptable. I'm not aware of other methods which don't either also add some runtime overhead (though one might get it down to O(ln n) by using a good data structure, perhaps a balanced tree) or produce a memory leak (such as by keeping track of all disposed pointers).
That is only one method. I don't know if gpc depends on the underlying c malloc/free/realloc package, but if so that package should be replaceable (in many cases) with my nmalloc for djgpp. That depends on alignment size, 32 bits, byte ptr to void * conversions for arithmetic, and getting memory via sbrk. Given that the pointer validity tests become O(1). They are not guaranteed to catch everything, but the odds are high.
You can look it over at:
http://cbfalconer.home.att.net/download/nmalloc.zip
My other point is that most of the pointer checks are done when passing or storing the pointer in the first place, not while dereferencing it. Pointers that the compiler/runtime makes for VAR params etc. are known valid, provided any indexes off them are properly checked. Heap pointers come from something checked at creation, so the only worries should be dereferencing NIL and stale pointers. NIL is a fixed bit pattern. The nmalloc techniques can detect most of the stale items. They only get exercized during free or realloc.
CBFalconer wrote:
Frank Heckenbach wrote:
... snip ...
IMHO dangling pointers are the more serious problem, since wild pointers (outside the heap) area can be avoided, e.g. by initializing all pointers to nil (though I don't recommend to do this always) and not doing other dirty stuff, while dangling pointers are more difficult to avoid "statically".
Your linked list suggestion would detect them (simple heap bounds checking would not). But it adds an O(n) factor in each pointer access which is often not acceptable. I'm not aware of other methods which don't either also add some runtime overhead (though one might get it down to O(ln n) by using a good data structure, perhaps a balanced tree) or produce a memory leak (such as by keeping track of all disposed pointers).
That is only one method. I don't know if gpc depends on the underlying c malloc/free/realloc package, but if so that package should be replaceable (in many cases) with my nmalloc for djgpp.
Yes, it can be replaced this way. (It can also be replaced on the Pascal level, see GetMemPtr in module GPC, but since this wouldn't affect, e.g., C libraries used in the program, including libc, which do memory allocation themselves, replacing malloc etc. on the linker level actually seems better here.)
That depends on alignment size, 32 bits, byte ptr to void * conversions for arithmetic, and getting memory via sbrk. Given that the pointer validity tests become O(1). They are not guaranteed to catch everything, but the odds are high.
I think it's dangerous to talk about "odds" here. It all depends on the use, doesn't it? A program which never frees any memory, obviously has smaller odds of such failure than a program which frequently allocates and frees memory (in particular, if it always of the same size, as it is in lists).
If you're willing to accept such "odds" in certain programs, you can do this, of course. That's why I suggested a user-routine for pointer validity checking.
My other point is that most of the pointer checks are done when passing or storing the pointer in the first place, not while dereferencing it. Pointers that the compiler/runtime makes for VAR params etc. are known valid, provided any indexes off them are properly checked. Heap pointers come from something checked at creation, so the only worries should be dereferencing NIL and stale pointers. NIL is a fixed bit pattern. The nmalloc techniques can detect most of the stale items.
Which means that we do need this check at dereferencing time (as they may become stale between assignment and dereferencing).
Frank
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
TYPE PtrRec = RECORD realLocation : INTEGER; { the real memory address } refCount : INTEGER { the number of pointers pointing to this record } END;
POINTER = ^PtrRec;
Now when assigning a pointer:
if ptr is not nil then dec(ptr^.refCount); ptr := newAddress; if ptr is not nil then inc(ptr^.refCount);
When derefing:
if ptr^.realLocation = 0 then complain else readValue := ptr^.realLocation
etc, etc... when refCount reaches zero, you can reuse the record. You could turn the record into a linked list structure, and add to the end only when marching through the list during an allocation does not reveal a reusable record.
This adds O(1) to the deref, assignment, and dispose times for a pointer, but makes a slightly larger hit when allocating memory. Since the records can be reused after the refCount hits zero, the memory leak factor is reduced in intensity.
You would need to be careful to correctly reduce reference counts from local variables, from pointers inside deallocated record structures and arrays of pointers, arrays of records containing pointers, and so forth.
The coding for this would be somewhat complex, but it may help to achieve what you are calling for...
On Mar 18, 2005, at 7:48 AM, Frank Heckenbach wrote:
CBFalconer wrote:
Frank Heckenbach wrote:
... snip ...
IMHO dangling pointers are the more serious problem, since wild pointers (outside the heap) area can be avoided, e.g. by initializing all pointers to nil (though I don't recommend to do this always) and not doing other dirty stuff, while dangling pointers are more difficult to avoid "statically".
Your linked list suggestion would detect them (simple heap bounds checking would not). But it adds an O(n) factor in each pointer access which is often not acceptable. I'm not aware of other methods which don't either also add some runtime overhead (though one might get it down to O(ln n) by using a good data structure, perhaps a balanced tree) or produce a memory leak (such as by keeping track of all disposed pointers).
That is only one method. I don't know if gpc depends on the underlying c malloc/free/realloc package, but if so that package should be replaceable (in many cases) with my nmalloc for djgpp.
Yes, it can be replaced this way. (It can also be replaced on the Pascal level, see GetMemPtr in module GPC, but since this wouldn't affect, e.g., C libraries used in the program, including libc, which do memory allocation themselves, replacing malloc etc. on the linker level actually seems better here.)
That depends on alignment size, 32 bits, byte ptr to void * conversions for arithmetic, and getting memory via sbrk. Given that the pointer validity tests become O(1). They are not guaranteed to catch everything, but the odds are high.
I think it's dangerous to talk about "odds" here. It all depends on the use, doesn't it? A program which never frees any memory, obviously has smaller odds of such failure than a program which frequently allocates and frees memory (in particular, if it always of the same size, as it is in lists).
If you're willing to accept such "odds" in certain programs, you can do this, of course. That's why I suggested a user-routine for pointer validity checking.
My other point is that most of the pointer checks are done when passing or storing the pointer in the first place, not while dereferencing it. Pointers that the compiler/runtime makes for VAR params etc. are known valid, provided any indexes off them are properly checked. Heap pointers come from something checked at creation, so the only worries should be dereferencing NIL and stale pointers. NIL is a fixed bit pattern. The nmalloc techniques can detect most of the stale items.
Which means that we do need this check at dereferencing time (as they may become stale between assignment and dereferencing).
Frank
-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E GPC To-Do list, latest features, fixed bugs: http://www.gnu-pascal.de/todo.html GPC download signing key: ACB3 79B2 7EB2 B7A7 EFDE D101 CD02 4C9D 0FE0 E5E8
- ----------------------------------------------------------- Frank D. Engel, Jr. fde101@fjrhome.net
$ ln -s /usr/share/kjvbible /usr/manual $ true | cat /usr/manual | grep "John 3:16" John 3:16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. $
___________________________________________________________ $0 Web Hosting with up to 120MB web space, 1000 MB Transfer 10 Personalized POP and Web E-mail Accounts, and much more. Signup at www.doteasy.com
Frank D. Engel, Jr. wrote:
TYPE PtrRec = RECORD realLocation : INTEGER; { the real memory address } refCount : INTEGER { the number of pointers pointing to this record } END;
POINTER = ^PtrRec;
Now when assigning a pointer:
if ptr is not nil then dec(ptr^.refCount); ptr := newAddress; if ptr is not nil then inc(ptr^.refCount);
When derefing:
if ptr^.realLocation = 0 then complain else readValue := ptr^.realLocation
etc, etc... when refCount reaches zero, you can reuse the record. You could turn the record into a linked list structure, and add to the end only when marching through the list during an allocation does not reveal a reusable record.
This adds O(1) to the deref, assignment, and dispose times for a pointer, but makes a slightly larger hit when allocating memory. Since the records can be reused after the refCount hits zero, the memory leak factor is reduced in intensity.
Not necessarily "slightly". O(n) can be large. It can make the difference between a fast and an almost unusable program. Though this particular O(n) can probably avoided -- if the only criterion for reusability is the size, just keeping different sizes in different lists should do it. AFAIK, that's what usual memory managers do as well, of course with the blocks explicitly `free'd. So you probably just as well `free' the memory when the reference count reaches zero and let the memory manager take care of the rest.
You would need to be careful to correctly reduce reference counts from local variables, from pointers inside deallocated record structures and arrays of pointers, arrays of records containing pointers, and so forth.
The coding for this would be somewhat complex, but it may help to achieve what you are calling for...
Again, "somewhat complex" is a "slight" understatement. It would affect all usages of pointers, including in external libraries, which for practical purposes means it's impossible.
Anyway, what you describe, is a form of garbage collection. This is a branch of science on its own. Many algorithms exist, each with their own advantages and disadvantages.
Frank
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
My point here was to use the reference counts to track dangling pointers. Once the number of dangling pointers reaches zero, there aren't any anymore; meanwhile, checking that count before accessing the memory location would provide instant and very efficient confirmation on whether or not the pointer is still valid.
On Mar 18, 2005, at 11:55 AM, Frank Heckenbach wrote:
Again, "somewhat complex" is a "slight" understatement. It would affect all usages of pointers, including in external libraries, which for practical purposes means it's impossible.
Anyway, what you describe, is a form of garbage collection. This is a branch of science on its own. Many algorithms exist, each with their own advantages and disadvantages.
- ----------------------------------------------------------- Frank D. Engel, Jr. fde101@fjrhome.net
$ ln -s /usr/share/kjvbible /usr/manual $ true | cat /usr/manual | grep "John 3:16" John 3:16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. $
___________________________________________________________ $0 Web Hosting with up to 120MB web space, 1000 MB Transfer 10 Personalized POP and Web E-mail Accounts, and much more. Signup at www.doteasy.com
"Frank D. Engel, Jr." wrote:
My point here was to use the reference counts to track dangling pointers. Once the number of dangling pointers reaches zero, there aren't any anymore; meanwhile, checking that count before accessing the memory location would provide instant and very efficient confirmation on whether or not the pointer is still valid.
Please don't top-post.
Reference counting won't help without a complete handle system to process all accesses indirectly. This won't mesh very will with usage for VAR parameters.
The problem appears when an extra copy of a pointer has been made and the original freed or disposed. It is easy to mark the original NIL at this time. But with most implementations the copy is left pointing to somewhere within the heap. Later in the codes life that somewhere may have become somewhere within a completely unrelated object, which happened to reuse the memory that was freed. At this point there is no way to check anything about that original copy pointer, whatever is lying about the pointed location is random to it.
That is why nmalloc checks for some relationships that should apply. At some negative displacement should be a pointer to another block, which in turn should contain a pointer back. There should be another specific value marking this block as non-free. These CAN be satisfied by random data, but probably won't be.
The question is how much overhead to put on pointer dereference. In Pascal, unlike C, we can tell such from VAR dereference through the type system. There is a non-trivial problem involved in passing that knowledge through functional parameters. This can be handled by performing the check at the point of passage, and thenceforce treating as a VAR. Any such mechanism as outlined above depends on intimate knowledge of the actual heap allocation mechanism, and involves overhead. VAR dereference is generated directly by the compiler, and known correct (what is not necessarily known is the validity of an offset, or index, value).
CBFalconer wrote:
The question is how much overhead to put on pointer dereference. In Pascal, unlike C, we can tell such from VAR dereference through the type system. There is a non-trivial problem involved in passing that knowledge through functional parameters. This can be handled by performing the check at the point of passage, and thenceforce treating as a VAR. Any such mechanism as outlined above depends on intimate knowledge of the actual heap allocation mechanism, and involves overhead. VAR dereference is generated directly by the compiler, and known correct (what is not necessarily known is the validity of an offset, or index, value).
If the actual var parameter is a dereferenced pointer which was disposed meanwhile, even var dereferencing can fail. Of course, disposing a pointer while a reference exists is already invalid, so you could instead try to check at `Dispose' time. But AFAICS, this would require reference counting again, or something like this ...
Frank
Frank Heckenbach wrote:
CBFalconer wrote:
The question is how much overhead to put on pointer dereference. In Pascal, unlike C, we can tell such from VAR dereference through the type system. There is a non-trivial problem involved in passing that knowledge through functional parameters. This can be handled by performing the check at the point of passage, and thenceforce treating as a VAR.
... snip ...
VAR dereference is generated directly by the compiler, and known correct (what is not necessarily known is the validity of an offset, or index, value).
If the actual var parameter is a dereferenced pointer which was disposed meanwhile, even var dereferencing can fail. Of course, disposing a pointer while a reference exists is already invalid, so you could instead try to check at `Dispose' time. But AFAICS, this would require reference counting again, or something like this ...
TYPE poo = ...; VAR pooptr : ^poo;
PROCEDURE foo(VAR p : poo); VAR oldpvalue : poo; BEGIN oldpvalue := p; p.field := ....; dispose(p); (* Compile ERROR - p is not of pointer type *) dispose(pooptr); (* fouls further p use, but why do it *) ... END;
BEGIN new(pooptr); ... foo(pooptr^); ...
At the time of calling foo pooptr is being assigned to the foo local variable p, and can be verified, just as it would be if foo received a non-var parameter. From then on (i.e. within foo) there is no way for p to become invalid, except as a result of the action of foo (or routines foo calls). Yes, it is possible to beat the system, but you have to work at it. The overhead occurs at the call to foo. Use is unhampered.
Pascal is NOT a concurrent system. There are no threads, etc.
CBFalconer wrote:
Frank Heckenbach wrote:
CBFalconer wrote:
The question is how much overhead to put on pointer dereference. In Pascal, unlike C, we can tell such from VAR dereference through the type system. There is a non-trivial problem involved in passing that knowledge through functional parameters. This can be handled by performing the check at the point of passage, and thenceforce treating as a VAR.
... snip ...
VAR dereference is generated directly by the compiler, and known correct (what is not necessarily known is the validity of an offset, or index, value).
If the actual var parameter is a dereferenced pointer which was disposed meanwhile, even var dereferencing can fail. Of course, disposing a pointer while a reference exists is already invalid, so you could instead try to check at `Dispose' time. But AFAICS, this would require reference counting again, or something like this ...
TYPE poo = ...; VAR pooptr : ^poo;
PROCEDURE foo(VAR p : poo); VAR oldpvalue : poo; BEGIN oldpvalue := p; p.field := ....; dispose(p); (* Compile ERROR - p is not of pointer type *) dispose(pooptr); (* fouls further p use, but why do it *) ... END;
BEGIN new(pooptr); ... foo(pooptr^); ...
At the time of calling foo pooptr is being assigned to the foo local variable p, and can be verified, just as it would be if foo received a non-var parameter. From then on (i.e. within foo) there is no way for p to become invalid, except as a result of the action of foo (or routines foo calls).
And these routines could be forward declared, or called through procedural parameters, i.e. in general there's no way for a compiler to know what they do.
Yes, it is possible to beat the system, but you have to work at it.
Sorry, but what's your point? You previously claimed that var dereferencing is "known correct". Now you say it can be beaten (but you have to work at it), and "why do it" [beat it].
That's what I said. (One could say the same thing for most other checks -- why assign out-of-range values, why dereference nil pointers, why make type errors --, so why do we need any checks at all?)
Surely, there are various ways for checks that work most of the time (I'd call them heuristic checks). But a reliable check still seems very hard in general, AFAICS.
BTW, according to the standard the `dispose(pooptr)' is invalid even if foo doesn't use p afterwards.
Pascal is NOT a concurrent system. There are no threads, etc.
These problems don't depend on threads. Aliasing is sufficient for them to arise (as the examples show), and aliasing exists in Pascal.
Frank
Frank D. Engel, Jr. wrote:
My point here was to use the reference counts to track dangling pointers. Once the number of dangling pointers reaches zero, there aren't any anymore; meanwhile, checking that count before accessing the memory location would provide instant and very efficient confirmation on whether or not the pointer is still valid.
This particular check might be efficient, but the whole system with reference count increments/decrements is not so simple, and you always have to consider the whole system.
Also, ignoring `Dispose' and treating a pointer as valid as long as there are references to it merely transform a crash into a program logic error (still using memory which you claimed, by `Dispose', you wouldn't use anymore). GC or reference counting doesn't magically solve all problems.
Frank
The way I'm implementing user-defined pointer checking now is as follows:
program ValidatePointerDemo;
uses GPC;
{ Example pointer validation routine. Since it is used instead of nil checks, the first thing it should do is often to check for nil pointers.
Next, it checks whether the pointer lies within the allocated bounds. GPC's heap manager provides these bounds and checks them on `Dispose'. It doesn't check them on every pointer dereference as there might be other pointers around. If this doesn't happen in your program, you can use a check such as this.
Additionally, you can do any other checks you like here. If you find the pointer invalid, this procedure should not return. (After returning, the program continues by dereferencing the pointer.) } procedure MyValidatePointer (p: Pointer); begin if p = nil then NilPointerError else if (PtrCard (p) < PtrCard (HeapLow)) or (PtrCard (p) > PtrCard (HeapHigh)) then InvalidPointerError (p) end;
{ Create a invalid pointer to demonstrate the effect. } procedure Test; var c: ^Integer; begin c := Pointer (42); WriteLn (c^) end;
begin { Install the validator routine. } ValidatePointerPtr := @MyValidatePointer;
{ Run the test. } Test end.
In a real context, this would usually be done in a unit or module whose initializer could install the validator routine.
Compile with `--pointer-checking --pointer-checking-user-defined' (globally for all modules). Program code can locally turn off `pointer-checking' for critical code, without having to remeber whether `pointer-checking-user-defined' is on (which works as shown above) or off (which does a simple nil check).
Frank