In message Pine.LNX.4.21.0011240232590.265-100000@rusty.russwhit.com Russ Whitaker writes:
Hi
suggestion: a script to take a snapshot of the /proc filesystem so you can find what's growing
take a look at dir /proc and cat /proc/<ID>/status or cat /proc/<ID>/maps
The application has been running for some 21 hours now, and I've been taking occasional snapshots both of status and maps.
status shows VmSize and VmRSS growing.
maps shows the end of the rwxp region that starts at 08171000 growing - can I deduce anything more from this? (for example is it likely to be Pascal heap space or something else?). Both status and maps snapshots follow.
Any further suggestions will be gratefully received ...
status
======
Name:   webmake.e
State:  S (sleeping)
Pid:    27006
PPid:   31555
Uid:    500 500 500 500
Gid:    500 500 500 500
Groups: 500
VmSize: 110776 kB
VmLck:  0 kB
VmRSS:  108340 kB
VmData: 107944 kB
VmStk:  84 kB
VmExe:  1172 kB
VmLib:  1360 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 800000000000a006
SigCgt: 0000000008094000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
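A snapshot script along the lines Russ suggested could pull just the Vm* fields out of a status dump like the one above and compare two snapshots. This is only a minimal sketch in Python; the function names (parse_vm_fields, read_vm_fields, growth) are made up for illustration and are not part of any existing tool.

```python
def parse_vm_fields(status_text):
    """Return {field name: size in kB} for the Vm* lines of a status dump."""
    fields = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("Vm"):
            parts = value.split()
            # Lines look like "VmRSS:  108340 kB"; keep the number only.
            if parts and parts[0].isdigit():
                fields[name] = int(parts[0])
    return fields


def read_vm_fields(pid):
    """Take one snapshot of /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())


def growth(before, after):
    """Difference in kB between two snapshots, per field."""
    return {name: after[name] - before[name] for name in after if name in before}
```

Calling read_vm_fields(27006) at intervals and passing consecutive results to growth() would show which of VmSize, VmRSS, VmData and so on is increasing.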
maps
====
08048000-0816d000 r-xp 00000000 03:07 32300 /var/rally/bin/webmake.e
0816d000-08171000 rw-p 00124000 03:07 32300 /var/rally/bin/webmake.e
08171000-0e9d0000 rwxp 00000000 00:00 0
40000000-40013000 r-xp 00000000 03:05 34138 /lib/ld-2.1.3.so
40013000-40014000 rw-p 00012000 03:05 34138 /lib/ld-2.1.3.so
40014000-40015000 rw-p 00000000 00:00 0
40015000-40016000 r--p 00000000 03:06 94 /usr/share/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES
40016000-40017000 r--p 00000000 03:06 96660 /usr/share/locale/en_US/LC_MONETARY
40017000-40018000 r--p 00000000 03:06 96662 /usr/share/locale/en_US/LC_TIME
40018000-40019000 r--p 00000000 03:06 96661 /usr/share/locale/en_US/LC_NUMERIC
4001a000-4001b000 rwxs 00000000 00:00 0
4001b000-4001e000 r-xp 00000000 03:06 96925 /usr/lib/libpanel.so.4.0
4001e000-4001f000 rw-p 00002000 03:06 96925 /usr/lib/libpanel.so.4.0
4001f000-40054000 r-xp 00000000 03:06 96923 /usr/lib/libncurses.so.4.0
40054000-4005d000 rw-p 00034000 03:06 96923 /usr/lib/libncurses.so.4.0
4005d000-40061000 rw-p 00000000 00:00 0
40061000-4007d000 r-xp 00000000 03:05 34156 /lib/libm-2.1.3.so
4007d000-4007e000 rw-p 0001b000 03:05 34156 /lib/libm-2.1.3.so
4007e000-4016b000 r-xp 00000000 03:05 34145 /lib/libc-2.1.3.so
4016b000-4016f000 rw-p 000ec000 03:05 34145 /lib/libc-2.1.3.so
4016f000-40174000 rw-p 00000000 00:00 0
40174000-4017c000 r--p 00000000 03:06 96658 /usr/share/locale/en_US/LC_COLLATE
4017c000-40192000 r--p 00000000 03:06 96659 /usr/share/locale/en_US/LC_CTYPE
40192000-401cc000 rwxs 00000000 00:00 0
401cc000-40292000 rwxs 00000000 00:00 0
bffeb000-c0000000 rwxp fffec000 00:00 0
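To see how large the growing region is, the start and end addresses of a maps entry can be subtracted. The following is a rough Python sketch; parse_maps_line is a hypothetical helper name, and it assumes the usual "start-end perms offset dev inode [path]" layout of a maps line.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps entry into its fields and compute its size."""
    parts = line.split()
    start_text, end_text = parts[0].split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    return {
        "start": start,
        "end": end,
        "perms": parts[1],
        # Anonymous mappings (such as the heap) have no path column.
        "path": parts[5] if len(parts) > 5 else "",
        "size_kb": (end - start) // 1024,
    }


# The anonymous rwxp region from the snapshot in this message.
region = parse_maps_line("08171000-0e9d0000 rwxp 00000000 00:00 0")
print(region["size_kb"])  # prints 106876
```

For the snapshot above this gives 106876 kB for the region starting at 08171000, which accounts for nearly all of the 107944 kB reported as VmData.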
David James wrote:
In message Pine.LNX.4.21.0011240232590.265-100000@rusty.russwhit.com Russ Whitaker writes:
Hi
suggestion: a script to take a snapshot of the /proc filesystem so you can find what's growing
take a look at dir /proc and cat /proc/<ID>/status or cat /proc/<ID>/maps
The application has been running for some 21 hours now, and I've been taking occasional snapshots both of status and maps.
status shows VmSize and VmRSS growing.
maps shows the end of the rwxp region that starts at 08171000 growing - can I deduce anything more from this? (for example is it likely to be Pascal heap space or something else?).
Yes, most likely heap.
Any further suggestions will be gratefully received ...
As I said, tracing the caller addresses for Pascal heap allocations seems possible (of course, at the cost of a slightly increased run time and memory usage). If you're interested, let me know...
Frank
On Sat, 25 Nov 2000, David James wrote:
The application has been running for some 21 hours now, and I've been taking occasional snapshots both of status and maps.
status shows VmSize and VmRSS growing.
status
Name: webmake.e
Now that you've identified a program that grows, I suggest looking for something simple, like an object created by new(), perhaps later saved to disk, that does not have a corresponding dispose() to remove it from memory.
Shot in the dark, but it's an easy mistake to make.
Hope it helps Russ
On Sun, 26 Nov 2000, Russ Whitaker wrote:
On Sat, 25 Nov 2000, David James wrote:
The application has been running for some 21 hours now, and I've been taking occasional snapshots both of status and maps.
status shows VmSize and VmRSS growing.
status
Name: webmake.e
Now that you've identified a program that grows, I suggest looking for something simple, like an object created by new(), perhaps later saved to disk, that does not have a corresponding dispose() to remove it from memory.
Shot in the dark, but it's an easy mistake to make.
Another shot in the dark: up to now we can't exclude a garbage collection problem inside the memory management behind new and dispose. Under certain conditions pieces of memory freed with dispose may not be reusable for (even slightly) bigger ones. Suitable new and dispose sequences hence may produce lots of unusable memory blocks, which would eat up all available memory.
Ernst-Ludwig
Hope it helps Russ
In article Pine.HPX.4.02.10011271431440.19419-100000@dirac.desy.de, Ernst-Ludwig Bohnen bohnen@mail.desy.de writes
Another shot in the dark: up to now we can't exclude a garbage collection problem inside the memory management behind new and dispose. Under certain conditions pieces of memory freed with dispose may not be reusable for (even slightly) bigger ones. Suitable new and dispose sequences hence may produce lots of unusable memory blocks, which would eat up all available memory.
Can you be any more specific about what conditions might apply? The worst affected program starts off using ~5 MB of RAM and after 24 hours can be using more than 100 MB. Fortunately the server has 768 MB of RAM, so the problem is manageable, although not pleasant.
On Mon, 27 Nov 2000, Martin Liddle wrote:
In article Pine.HPX.4.02.10011271431440.19419-100000@dirac.desy.de, Ernst-Ludwig Bohnen bohnen@mail.desy.de writes
Another shot in the dark: up to now we can't exclude a garbage collection problem inside the memory management behind new and dispose. Under certain conditions pieces of memory freed with dispose may not be reusable for (even slightly) bigger ones. Suitable new and dispose sequences hence may produce lots of unusable memory blocks, which would eat up all available memory.
Can you be any more specific about what conditions might apply? The worst affected program starts off using ~5 MB of RAM and after 24 hours can be using more than 100 MB. Fortunately the server has 768 MB of RAM, so the problem is manageable, although not pleasant.
A special sequence of new() and dispose() may produce memory fragmentation. The example below doesn't take into account additional bytes needed for pointers or byte counts, etc, and ignores modulo 4 effects:
1) Get 1000 records each with length of 1000 bytes numbered from 0 to 999. We use 1,000,000 bytes taken from the heap.
2) Now free all odd-numbered records (500 of them) and then get another 1000 records, each 1001 bytes long. We now use 500*1000 + 1000*1001 = 1501000 bytes, but have really taken 1000*1000 + 1000*1001 = 2001000 bytes from the heap, because the freed 1000-byte records can't be reused for 1001-byte records.
A complex application with many such new()/dispose() sequences may break the heap into small, unusable fragments this way, so that each call to new() has to take fresh memory of sufficient size from the heap.
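The arithmetic of the example can be checked with a toy simulation. This is a sketch in Python of a deliberately simplified heap, not of the actual GPC run-time allocator: freed blocks are kept as holes that are never split or merged (in the example the freed records are separated by live ones, so they could not be merged anyway), and, as in the text, bookkeeping overhead and alignment are ignored. ToyHeap and fragmentation_demo are names invented for this illustration.

```python
class ToyHeap:
    """Toy allocator: reuses a freed hole only if it is large enough."""

    def __init__(self):
        self.taken = 0   # bytes ever taken from the heap
        self.live = 0    # bytes currently in use by the program
        self.holes = []  # capacities of freed blocks

    def new(self, size):
        for i, capacity in enumerate(self.holes):
            if capacity >= size:
                self.holes.pop(i)      # reuse the whole hole
                self.live += size
                return (capacity, size)
        self.taken += size             # no hole fits: take fresh memory
        self.live += size
        return (size, size)

    def dispose(self, block):
        capacity, size = block
        self.holes.append(capacity)
        self.live -= size


def fragmentation_demo():
    heap = ToyHeap()
    records = [heap.new(1000) for _ in range(1000)]  # step 1
    for i in range(1, 1000, 2):                      # step 2: free odd records
        heap.dispose(records[i])
    bigger = [heap.new(1001) for _ in range(1000)]   # 1001 bytes never fit
    return heap.live, heap.taken


print(fragmentation_demo())  # prints (1501000, 2001000)
```

The 500 holes of 1000 bytes stay unused, which is the 500000-byte difference between the two figures.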
But the numbers you mention above do not indicate to me that this story causes the problem. A continuous leak that grows in proportion to time is more probably caused by missing dispose calls, for instance forgetting to dispose daughter records before disposing their mother record, or ...
Ernst-Ludwig
-- Martin Liddle, Tynemouth Computer Services, 27 Garforth Close, Cramlington, Northumberland, England, NE23 6EW. Phone: 01670-712624. Fax: 01670-717324.
In article Pine.HPX.4.02.10011281537040.14830-100000@dirac.desy.de, Ernst-Ludwig Bohnen bohnen@mail.desy.de writes
A special sequence of new() and dispose() may produce memory fragmentation. The example below doesn't take into account additional bytes needed for pointers or byte counts, etc, and ignores modulo 4 effects:
Get 1000 records each with length of 1000 bytes numbered from 0 to 999. We use 1,000,000 bytes taken from the heap.
Now free all odd-numbered records (500 of them) and then get another 1000 records, each 1001 bytes long. We now use 500*1000 + 1000*1001 = 1501000 bytes, but have really taken 1000*1000 + 1000*1001 = 2001000 bytes from the heap, because the freed 1000-byte records can't be reused for 1001-byte records.
A complex application with many such new()/dispose() sequences may break the heap into small, unusable fragments this way, so that each call to new() has to take fresh memory of sufficient size from the heap.
Thanks. I now understand what you are saying.
But the numbers you mention above do not indicate to me that this story causes the problem. A continuous leak that grows in proportion to time is more probably caused by missing dispose calls, for instance forgetting to dispose daughter records before disposing their mother record, or ...
We have looked for such sequences and haven't found anything. This doesn't mean that our code is correct. This is a large application with a lot of legacy code, some of it dating back more than 20 years and having been ported between several operating systems, compilers and processors. Now that our most urgent deadline has passed we will have a little more time to look at the debugging data we have gathered and to follow Frank's suggestion to upgrade to the latest snapshot of the compiler. Thank you for your interest in our problem.
Martin Liddle wrote:
Now that our most urgent deadline has passed we will have a little more time to look at the debugging data we have gathered and to follow Frank's suggestion to upgrade to the latest snapshot of the compiler. Thank you for your interest in our problem.
That's not exactly what I said. I said I could try to add support for this in the RTS if you like, and you'd have to upgrade *then*...
Frank
In article 4F1A9BDB.20001129032709.FOO-2554.frank@g-n-u.de, Frank Heckenbach frank@g-n-u.de writes
Martin Liddle wrote:
Now that our most urgent deadline has passed we will have a little more time to look at the debugging data we have gathered and to follow Frank's suggestion to upgrade to the latest snapshot of the compiler. Thank you for your interest in our problem.
That's not exactly what I said. I said I could try to add support for this in the RTS if you like, and you'd have to upgrade *then*...
OK. I accept that but you have also previously suggested upgrading to fix the recurrent problem we see with random failure to initialise null sets. I have got the two things confused. I worked over 140 hours last week and my brain is a bit addled. Sorry.
Once upon a time, Martin Liddle wrote:
In article 4F1A9BDB.20001129032709.FOO-2554.frank@g-n-u.de, Frank Heckenbach frank@g-n-u.de writes
Martin Liddle wrote:
Now that our most urgent deadline has passed we will have a little more time to look at the debugging data we have gathered and to follow Frank's suggestion to upgrade to the latest snapshot of the compiler. Thank you for your interest in our problem.
That's not exactly what I said. I said I could try to add support for this in the RTS if you like, and you'd have to upgrade *then*...
OK. I accept that but you have also previously suggested upgrading to fix the recurrent problem we see with random failure to initialise null sets.
I'm now adding somewhat improved support to trace memory leaks (will be uploaded soon).
Simply using the unit HeapMon should show some report about non-released pointers at the end of the program. The unit also provides a function to generate such a report at an arbitrary point during a program run and to an arbitrary file.
The report contains the caller address (which can be turned to source lines using addr2line) which might help find the leaks.
I suppose this is not currently an important thing to you, but when it becomes a problem again (to you or anyone else), this should now help debugging.
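Turning caller addresses into source lines with addr2line might look roughly like the sketch below (in Python). The layout of the HeapMon report is not described here, so the sketch simply assumes the addresses have already been extracted from it; the function names and the example path and addresses in the usage note are hypothetical. Only the standard "addr2line -e <executable> <address>..." invocation is relied on, and the program must have been compiled with debug info.

```python
import subprocess


def addr2line_command(executable, addresses):
    """Build the addr2line command line for a list of caller addresses."""
    # Addresses may be given as integers or as hex strings.
    args = [hex(a) if isinstance(a, int) else a for a in addresses]
    return ["addr2line", "-e", executable] + args


def resolve(executable, addresses):
    """Run addr2line and return one 'file:line' string per address."""
    result = subprocess.run(
        addr2line_command(executable, addresses),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

For example, resolve("./webmake.e", [0x8048000]) would return a list with one file:line entry, or "??:0" if the address cannot be resolved.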
Frank
In article 1B07E47F.20010506131757.FOO-5823.frank@g-n-u.de, Frank Heckenbach frank@g-n-u.de writes
I'm now adding somewhat improved support to trace memory leaks (will be uploaded soon).
Simply using the unit HeapMon should show some report about non-released pointers at the end of the program. The unit also provides a function to generate such a report at an arbitrary point during a program run and to an arbitrary file.
The report contains the caller address (which can be turned to source lines using addr2line) which might help find the leaks.
I suppose this is not currently an important thing to you, but when it becomes a problem again (to you or anyone else), this should now help debugging.
We haven't yet located the problem so this will be very useful. Thank you very much.
Martin Liddle wrote:
I suppose this is not currently an important thing to you, but when it becomes a problem again (to you or anyone else), this should now help debugging.
We haven't yet located the problem so this will be very useful. Thank you very much.
OK (but you'll have to wait for the next update, perhaps today, or in the next few days).
I'm also adding a shell script (gpc-run) that calls addr2line to translate the addresses into line numbers (provided the program was compiled with debug info).
Frank