CBFalconer wrote:
Frank Heckenbach wrote:
Obviously, you need pointers to represent such structures. Creating them is not a problem, but how do you clean up properly? You could traverse all pointers in the object and recursively clean up the objects they point to. But that's wrong, because those objects might be referred to from elsewhere. Also, with cyclic references, you'd run into an endless loop.
This is not to say it's impossible (I mentioned reference counting, and there are ways to avoid the endless loop), but is this much simpler than GC? In fact, if you consider the whole memory as a large tree(*), the resulting algorithm wouldn't look much different from a GC algorithm ...
(*) It's not really a single tree, but in fact most of the memory is consumed by tree nodes, and whether you clean up one of several trees or cut a branch from the only tree doesn't make a big difference.
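To make the reference-counting alternative concrete, here is a minimal sketch in C (made-up names, not code from GPC or from any package discussed here): shared nodes are handled correctly, since only the last release actually frees, but a cycle keeps every count above zero and is simply leaked.

#include <stdlib.h>

typedef struct Node
{
  int refcount;
  struct Node *left, *right;   /* children; possibly shared */
} Node;

static Node *retain (Node *n)
{
  if (n)
    n->refcount++;
  return n;
}

static void release (Node *n)
{
  if (n && --n->refcount == 0)
  {
    /* Last reference is gone: release the children, then the node.
       A shared child survives (its count stays > 0); a cycle never
       reaches zero at all and is leaked. */
    release (n->left);
    release (n->right);
    free (n);
  }
}

static Node *new_node (Node *left, Node *right)
{
  Node *n = calloc (1, sizeof *n);
  n->refcount = 1;
  n->left = retain (left);
  n->right = retain (right);
  return n;
}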
If this inter-dependence with GCC also means that garbage collection is a requirement, then "so be it" ...
Practically, it does. But if you know a better solution, even if only in theory, I'd be interested, just out of curiosity.
How about in practice? I wrote the attached package for precisely this sort of problem. It is not completely tested yet, but I have found very few problems, and the whole package represents about a kilobyte of object code.
So far the best usage example is the 'markov' program, which uses two interrelated databases. There is no need to keep track of individual items, only the database itself. The inner workings are hidden. Read the header file, which provides the complete abstraction.
I envision each scope opening a new database, and scope exit releasing it. Access is fast, generally requiring about 1.5 probes for a find, 2 for an insertion. The package statistics include the cost of reorganization, which is more efficient than insertion, makes the database size irrelevant, and is again hidden from view. There is no problem including usage counts, possibly in a master spellings table, and interconnecting such with table construction/destruction (see the dupe/undupe functions).
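As a toy illustration of that per-scope pattern (made-up names, and a plain linked list instead of a hash table for brevity -- this is NOT hashlib's real interface, which is in the header file):

#include <stdlib.h>

typedef struct entry
{
  struct entry *next;
  void *item;
} entry;

typedef struct table   /* one "database" per scope */
{
  entry *head;
} table;

table *tbl_create (void)
{
  return calloc (1, sizeof (table));
}

void *tbl_insert (table *t, void *item)
{
  entry *e = malloc (sizeof *e);
  e->item = item;
  e->next = t->head;
  t->head = e;
  return item;
}

void tbl_destroy (table *t)
{
  /* Scope exit: release the whole database at once; no caller
     ever needs to track the individual items. */
  entry *e, *next;
  for (e = t->head; e; e = next)
  {
    next = e->next;
    free (e->item);
    free (e);
  }
  free (t);
}

void compile_scope (void)
{
  table *locals = tbl_create ();      /* scope entry: new database */
  tbl_insert (locals, malloc (42));   /* register items as created */
  tbl_destroy (locals);               /* scope exit: free them all */
}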
For large numbers of items the only practical difficulty is the slow operation of free() in some C systems. This becomes noticeable in DJGPP by about 10,000 items, and is an O(N*N) effect, related to merging of freed storage blocks.
Yes, it seems to happen only with strong fragmentation as in the following test program (without fragmentation it's quite linear).
#include <stdlib.h>
#include <stdio.h>
#include <time.h>    /* for clock () */

void *p[200000];

int main (void)
{
  int i, j;
  clock_t c = clock (), c2;
  for (i = 1; i <= 20; i++)
  {
    for (j = 0; j < 10000 * i; j++)
      p[j] = malloc (42);
    for (j = 0; j < 10000 * i; j += 2)
      free (p[j]);
    for (j = 1; j < 10000 * i; j += 2)
      free (p[j]);
    c2 = clock ();
    printf ("%i ", (int) (c2 - c));
    fflush (stdout);
    c = c2;
  }
  return 0;
}
This probably will not affect compiler use, since 10,000 items per scope is a rare occurrence IMNSHO.
Probably. OTOH, I've just fixed some other O(n^2) problems (in the code that deals with unit/module interfaces). The Chief had reported (a long time ago) that the compilation of Windows API interfaces was very slow, and that might have been due to these problems (as soon as the next RC is released, he'll certainly test it, I guess). I don't know how many items there actually are, but I wouldn't be surprised if it actually reaches this order of magnitude (taking into account that, e.g., each function parameter, its type and its name use distinct nodes internally).
I am attaching the hashlib (preliminary) package only to you: until I have finished testing I want to keep control of it, and the newslist should not be burdened with a fairly large attachment. Feel free to pass it on to other workers in the Pascal pits, and/or to forward this message to the newslist.
... which I'm doing now.
Since I'm quite busy I haven't had the time yet to look at it thoroughly. But at first sight, if I understand it correctly, one would start a new "block" for each Pascal routine (roughly), do all local allocations within it, and release it as a whole in the end. (Perhaps this understanding is completely wrong; if so, ignore the rest. ;-)
Well, this is almost exactly the obstack idea. (Except that obstacks "buffer" the allocations, getting and freeing larger chunks from the system memory manager, so the slowness of free() on DJGPP probably doesn't matter so much.)
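For comparison, the GNU obstack interface looks like this (obstack.h is the real glibc/libiberty API; the Node type here is just a stand-in for whatever a scope allocates):

#include <stdlib.h>

/* obstack needs these two macros to get raw memory in large chunks;
   that chunking is why slow individual free()s matter less here. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free
#include <obstack.h>

typedef struct { int kind; void *info; } Node;  /* example payload */

int main (void)
{
  struct obstack scope;
  Node *n;

  obstack_init (&scope);

  /* All allocations for one "scope" go into this obstack ... */
  n = obstack_alloc (&scope, sizeof (Node));
  n->kind = 0;
  n->info = NULL;

  /* ... and scope exit releases all of them in one call. */
  obstack_free (&scope, NULL);
  return 0;
}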
But problems occur when the usage pattern of the tree nodes is not strictly "scoped". This starts with parameters (which are declared within a function, but needed when calling it -- though perhaps you can just put them in the outer scope), goes on with inline routines (whose code is kept in an intermediate form ("RTX") while most of the tree nodes can be freed -- but not all, since, e.g., some type information is still needed), and the most difficult (or ugly) problems are often seemingly little details. E.g., a node that represents a type contains a list of its "variants" (i.e., the same type but, e.g., a protected and/or restricted version). Perhaps that's only for efficiency (so that when `protected foo' appears in several places, only one node for the restricted type needs to be created). But for this to work, a new variant of a type must always be placed on the same obstack as the type itself; otherwise a pointer would dangle when another obstack is removed.
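In caricature (made-up fields, not GCC's actual tree structure), the hazard looks like this:

struct type_node
{
  struct type_node *main_variant;   /* the unrestricted base type  */
  struct type_node *next_variant;   /* chain of variants (protected,
                                       restricted, ...) of it      */
};

/* If `protected foo' were allocated on the current function's
   obstack while `foo' itself lives on a longer-lived one, freeing
   the function's obstack at scope exit would leave foo's
   next_variant pointing into freed memory.  Hence the rule that a
   new variant always goes on the same obstack as its base type. */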
These were just some examples (and perhaps not well described). The main point is that the more optimization and special features come into play, the more often one gets in conflict with the strict scoping that the obstacks imply by default. I suppose that's the reason why they were finally thrown out in gcc-3.0.
Frank