Please see below ...
Joe.
-----Original Message-----
From: Russell Whitaker [SMTP:russ@ashlandhome.net]
Sent: Wednesday, February 13, 2002 3:51 PM
To: gpc@gnu.de
Subject: Re: gcc-3+
On Tue, 12 Feb 2002, Markus Gerwinski wrote:
Hi folks,
Frank Heckenbach wrote (in reply to da Silva, Joe):
Obviously, you need pointers to represent such structures. Creating them is not a problem, but how to clean up properly? You can traverse all pointers in the object and clean up the objects they point to recursively. But that's wrong because those objects might be referred to from elsewhere. Also, with cyclic references, you'd run into an endless loop.
[Joe da Silva]
Yes, I seem to recall this matter being discussed in Pemberton and Martin, wrt. the P4 compiler, although I can't recall the details (I'm trying to learn this stuff, but it's "heavy going" and spare time is scarce :-/ ).
This is not to say it's impossible (I mentioned reference counting, and there are ways to avoid the endless loop), but is this much simpler than GC? In fact, if you consider the whole memory as a large tree(*), the resulting algorithm wouldn't look much different from a GC algorithm ...
If this inter-dependence with GCC also means that garbage collection is a requirement, then "so be it" ...
Practically, it does. But if you know a better solution, even if only in theory, I'd be interested, just for curiosity.
I'm currently working on a framework including an object type named "tReferenceManager". It is intended to solve exactly this problem, at least for synchronization of collections (lists, hashtables etc.): For every pointer, you can subscribe _one_ collection it is stored in, plus an arbitrary number of reference collections. When the pointer is removed from the store collection, it is automatically disposed, and all of the references are removed from the reference collections.
I hope to complete the respective unit of this framework within a few weeks. If anyone thinks he/she can use it, I'll announce it after this unit is ready. (Frank: The framework also contains the exception handling unit we already talked about.)
Personally, I _really_ dislike garbage collection since I had to deal with it in an optimization problem in Java. ;-/ IMO, providing a language with GC is just a way to encourage programmers to give up control. An alternative feature I'd like to see in a language would be to split pointers into two separate data types: "store" pointers and "reference" pointers. Allocating and deallocating may only be done on a store pointer; setting a pointer to an already existing address may only be done with a reference pointer. When disposing of a store pointer, it and all of its references are automatically set to nil, so you can always check whether the pointer is pointing to valid data. (It's the same principle I'm using in my tReferenceManager, and so far I haven't found any disadvantages in it. Except that you are forced to think before you code. ;-)
[Joe da Silva]
An interesting idea - I like your approach.
Perhaps I'm missing something. I thought the purpose of GC was to maintain a list of free memory. Thus GC doesn't care how you access a block, it cares only if it is free or not. A simple GC would recognize adjacent free blocks and combine them into a larger single block.
[Joe da Silva]
Well, I thought that's what "new" and "dispose" do. As I understand it, GC's purpose is to de-allocate memory blocks that are no longer referenced by pointers but that the programmer has forgotten to de-allocate himself (or herself). For example,
new(apointertosomething);
..... { no dispose }
new(apointertosomething);
A more complex scheme could move already allocated blocks to make a large free block when one is needed. This last might be implemented by adding a transparent level of indirection. Programmer thinks his pointer p is pointing to something when in reality it is pointing to an internal pointer pointing to the something. Thus when the block gets moved only one pointer needs to be changed.
Along with GC you have the problem of do you allocate first fit or best fit? Each has its pros and cons.
And just how is gcc doing GC and friends? (was lazy and didn't do much digging thru the code.)
Just some thoughts, Russ
da Silva, Joe wrote:
Russell Whitaker wrote:
Perhaps I'm missing something. I thought the purpose of GC was to maintain a list of free memory. Thus GC doesn't care how you access a block, it cares only if it is free or not. A simple GC would recognize adjacent free blocks and combine them into a larger single block.
Well, I thought that's what "new" and "dispose" do. As I understand it, GC's purpose is to de-allocate memory blocks that are no longer referenced by pointers but that the programmer has forgotten to de-allocate himself (or herself). For example,
new(apointertosomething);
..... { no dispose }
new(apointertosomething);
I think Joe is right. Maintaining the pool of free memory is another (and I think much easier) step.
A more complex scheme could move already allocated blocks to make a large free block when one is needed. This last might be implemented by adding a transparent level of indirection. Programmer thinks his pointer p is pointing to something when in reality it is pointing to an internal pointer pointing to the something. Thus when the block gets moved only one pointer needs to be changed.
But you have the extra cost of an additional pointer (memory) and indirection (runtime).
However, I don't think that's a major issue for GCC at all since most of the allocations are done for "tree nodes" which have a small number of different constant sizes, so the storage of most of the deallocated nodes can later be re-used for nodes of exactly the same size.
Frank
On Wed, 13 Feb 2002, Frank Heckenbach wrote:
da Silva, Joe wrote:
Russell Whitaker wrote:
Perhaps I'm missing something. I thought the purpose of GC was to maintain a list of free memory. Thus GC doesn't care how you access a block, it cares only if it is free or not. A simple GC would recognize adjacent free blocks and combine them into a larger single block.
Well, I thought that's what "new" and "dispose" do. As I understand it, GC's purpose is to de-allocate memory blocks that are no longer referenced by pointers but that the programmer has forgotten to de-allocate himself (or herself). For example,
new(apointertosomething);
..... { no dispose }
new(apointertosomething);
That's a memory leak. Suppose you don't have a leak but you do have a simple memory allocation scheme where new always allocates from the end of the heap. Say the program allocates ten items, 1 thru 10, and then disposes nos. 7 and 10. Since 10 is at the end of the heap it gets added back to the heap, but not 7 at this time, because 8 and 9 are in the way. This is not a memory leak. Now if the program asks new for a block of memory, it passes up 7 because it's not at the end of the heap. (I'm using "end of heap" here to mean "the end of the unallocated portion of the heap".)
Well it works but you could design an application that stair-steps its way from one end of the heap to the other and crashes.
The next level of complexity is a linked list of free memory. Now new traverses the list looking for a suitable block. If what it finds is too big then it takes what it needs and leaves the rest on the list.
This gets rid of the stair-step problem. However, if a program runs long enough there is a tendency to end up with an allocated block, a free block, an allocated block, etc., which is a tad wasteful.
I think Joe is right. Maintaining the pool of free memory is another (and I think much easier) step.
A more complex scheme could move already allocated blocks to make a large free block when one is needed. This last might be implemented by adding a transparent level of indirection. Programmer thinks his pointer p is pointing to something when in reality it is pointing to an internal pointer pointing to the something. Thus when the block gets moved only one pointer needs to be changed.
But you have the extra cost of an additional pointer (memory) and indirection (runtime).
Very true. It looks like what you might gain isn't worth the added complexity.
However, I don't think that's a major issue for GCC at all since most of the allocations are done for "tree nodes" which have a small number of different constant sizes, so the storage of most of the deallocated nodes can later be re-used for nodes of exactly the same size.
True for gcc (& gpc) itself, most likely not true for the programs it creates. So is the GC intended for the compiler, the applications, or both?
Russ
Russell Whitaker wrote:
That's a memory leak. Suppose you don't have a leak but you do have a simple memory allocation scheme where new always allocates from the end of the heap. Say the program allocates ten items, 1 thru 10, and then disposes nos. 7 and 10. Since 10 is at the end of the heap it gets added back to the heap, but not 7 at this time, because 8 and 9 are in the way. This is not a memory leak. Now if the program asks new for a block of memory, it passes up 7 because it's not at the end of the heap. (I'm using "end of heap" here to mean "the end of the unallocated portion of the heap".)
Well it works but you could design an application that stair-steps its way from one end of the heap to the other and crashes.
The next level of complexity is a linked list of free memory. Now new traverses the list looking for a suitable block. If what it finds is too big then it takes what it needs and leaves the rest on the list.
This gets rid of the stair-step problem. However, if a program runs long enough there is a tendency to end up with an allocated block, a free block, an allocated block, etc., which is a tad wasteful.
Indeed -- and inefficient (in the worst case, it adds a factor of O(n) with n being the number of fragments). I remember that BP's memory manager worked this way and you could see the slowdown with a suitable test program.
However, modern memory managers (as found on most systems) are a little more clever. E.g., they sort the fragments by size to find a best match without long searching. Surely they use balanced trees rather than linked lists if necessary (i.e., if searching might occur). Another common technique is to divide a page of memory into fragments of a size that is a power of 2 and manage them using an allocation table much like a file system does (well, a modern file system, not exactly Dos FAT).
True for gcc (& gpc) itself, most likely not true for the programs it creates. So is the GC intended for the compiler, the applications, or both?
For the compiler itself. GPC applications by default use the normal libc memory manager, but there are in fact several ways to replace it. The GPC unit provides some hooks. Furthermore, the libc routines can be overridden, e.g. by alternative memory management libraries. Such libraries exist (some of them doing GC, some for other purposes), and generally they can be used in GPC programs by just linking them. But that's up to the programmer, of course.
As I said, fragmentation is not the main issue for GPC; the memory leaking problem is. I.e., with such complex structures it's difficult to ensure that all memory is deallocated at the correct time, and if that turns out to be more difficult than doing a GC, it might be advantageous to let the memory "leak" intentionally and let the GC clean up.
Frank