Marten Jan de Ruiter wrote:
to compile the program with profiling using the switch -pg. Because the program took an unusually long time to run, I made a profile. The profile shows that Preparedisposepointer takes an exceedingly large share of the run time. The result below is for a fairly small problem, used to get output for this mail. I have seen 95% of the time spent in Preparedisposepointer, but I lost that profile while testing other versions of gpc, and I am not patient enough to regenerate it now that this output also shows the problem:
/hole 17 % gprof `which charles` | more
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 59.13      3.14     3.14                             Preparedisposepointer
  3.20      3.31     0.17                             _p_read_longreal
  3.20      3.48     0.17                             _p_trim
 ...
(sorry for the crappy lay-out)
Charles, compiled by an older version of gpc (see below), seems to work fine, but suffers from a memory leak in the file I/O routines.
I was not able to locate the difference in the Pascal sources that explains why Preparedisposepointer now needs so much time. However, I suspect FreeMemPtr^ (in rts/heap.pas) does some things. A colleague mentioned that it might be a lot of moving in memory to prevent fragmentation.
Allocated memory is never moved (this would break pointers pointing to it).
By default FreeMemPtr points to free(). I'm not sure whether the time reported for a routine includes the routines it calls (in particular, libc routines, which probably were not compiled with `-pg'). If so, you could try moving the call into a separate subroutine to find out whether free() or PrepareDisposePointer actually takes the time.
PrepareDisposePointer shouldn't do much unless `Mark' is used (anywhere in the program) which, e.g., the `HeapMon' unit does. If you use either of them, this probably explains it. `Mark' is not optimized for speed (I consider it mostly an outdated and/or debugging thing).
If that's an issue for you, feel free to improve it (perhaps use a hash table instead of the list etc., but note that nested `Mark's should work) -- it's Pascal code, so the usual argument for staying away from the internals doesn't count. ;-)
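To illustrate why list-based bookkeeping can dominate the run time, here is a rough Python sketch. It is not the code in rts/heap.pas; the names ListTracker, HashTracker and run are made up for this example. It assumes that, while a `Mark' is active, every allocated pointer is recorded and every dispose has to find and remove its record. Nested `Mark's are modelled as a stack of containers, and nothing is recorded as long as `Mark' was never called.

```python
class ListTracker:
    """Hypothetical model: pointers since each Mark are kept in a plain list."""

    def __init__(self):
        self.marks = []   # stack of mark levels, innermost last
        self.steps = 0    # entries inspected while disposing

    def mark(self):
        self.marks.append([])

    def new(self, ptr):
        if self.marks:                 # nothing to record without a Mark
            self.marks[-1].append(ptr)

    def dispose(self, ptr):
        # Linear search, innermost mark level first.
        for level in reversed(self.marks):
            for i, p in enumerate(level):
                self.steps += 1
                if p == ptr:
                    del level[i]
                    return

    def release(self):
        """Drop the innermost mark; return the pointers still recorded in it."""
        return self.marks.pop()


class HashTracker(ListTracker):
    """Same interface, but each mark level is a hash set."""

    def mark(self):
        self.marks.append(set())

    def new(self, ptr):
        if self.marks:
            self.marks[-1].add(ptr)

    def dispose(self, ptr):
        # One hash lookup per mark level instead of a scan.
        for level in reversed(self.marks):
            self.steps += 1
            if ptr in level:
                level.remove(ptr)
                return


def run(tracker, n):
    """Allocate n pointers after one Mark, then dispose them in reverse order."""
    tracker.mark()
    for ptr in range(n):
        tracker.new(ptr)
    for ptr in reversed(range(n)):
        tracker.dispose(ptr)
    return tracker.steps
```

In this model, run(ListTracker(), 1000) inspects 500500 list entries, while run(HashTracker(), 1000) does 1000 lookups. Which disposal order is the bad one depends on how the list is built (here new entries are appended, so disposing the newest pointer first is the worst case), but the quadratic growth is the point.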
Now I have the following questions: Any idea why deallocation takes forever in the FE program as compared to allocation? Maybe the reason is that the data-structure is slightly more complicated than a linked list of integers ;-)
The data structure doesn't matter at all to the heap management. The size of the blocks might have some influence, but I doubt it matters here. If anything, smaller blocks (integers) should be slower, if you compare the same absolute memory size, of course.
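The point about block size is just arithmetic: for the same total amount of memory, smaller blocks mean more calls into the heap routines. A small Python sketch, assuming the cost per call is roughly independent of the block size (block_count is a made-up helper, and the sizes are arbitrary examples):

```python
def block_count(total_bytes, block_size):
    """Number of blocks, and so of New/Dispose calls, needed to cover total_bytes."""
    return -(-total_bytes // block_size)   # integer division, rounded up


# The same 1 MiB split into blocks of different sizes.
for size in (4, 64, 1024):
    print(size, block_count(1 << 20, size))
```

With 4-byte blocks that is 262144 calls, with 1024-byte blocks only 1024.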
Frank