At 12:05 PM +0200 12/8/03, Adriaan van Os wrote:
Perhaps the solution to this problem would be for each ICE either to print a stack backtrace or, if that isn't possible, to display the GPC source file and line number that generates it? At least then Frank would be able to figure out where to look.
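A minimal sketch of the file-and-line idea, assuming a C code base like the GCC back end GPC is built on. `ice_at`, `format_ice` and the `ICE` macro are hypothetical names, not existing GPC or GCC functions. The location comes from the standard `__FILE__`, `__LINE__` and `__func__` identifiers, which are expanded at the call site, and ending in `abort()` still gives a debugger or crash reporter a chance to capture the backtrace.

```c
/* Hypothetical sketch: report an internal compiler error (ICE) together
   with the source location inside the compiler that detected it. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Strip the directory part of a path so the message stays short. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

/* Write the ICE message into buf; returns what snprintf returns. */
int format_ice(char *buf, size_t size, const char *file, int line,
               const char *func, const char *what)
{
    assert(buf != NULL && file != NULL && func != NULL && what != NULL);
    return snprintf(buf, size, "internal compiler error: %s at %s:%d in %s()",
                    what, base_name(file), line, func);
}

/* Print the message and abort, so that a debugger or crash reporter can
   still capture a stack backtrace at this point. */
void ice_at(const char *file, int line, const char *func, const char *what)
{
    char msg[512];
    format_ice(msg, sizeof msg, file, line, func, what);
    fprintf(stderr, "%s\n", msg);
    abort();
}

/* __FILE__, __LINE__ and __func__ (C99) are expanded at the call site. */
#define ICE(what) ice_at(__FILE__, __LINE__, __func__, (what))
```

A call such as `ICE("unexpected tree code")` would then print the message, the compiler source file name, the line number and the enclosing function before aborting, which is exactly the "where to look" hint asked for above.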
It is no problem to provide the backtrace and the offending source line, but I doubt it will be helpful. If my understanding of "random" problems like the above (not Frank Engel's) is correct, the logic at the time of the crash isn't at fault; rather, some internal or external data structure was corrupted at an earlier point.
These are nasty problems. To find the cause you have to make the "random" problem more reproducible, more crashable.
True enough, but at least knowing what it is accessing at the time might help, especially if you start seeing patterns across the reports.
One way to do that is to use stress-testing tools, which is why I was experimenting with the --param option (to stress test the GNU garbage collector). I will also see if I can build gpc with Apple's debugging version of malloc. But it could even be something as bizarre as a bug in Mac OS X's disk cache system (I am just speculating). Frank mentioned hardware failures.
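Debugging allocators make corruption more crashable partly by overwriting memory as it is released, so that a dangling pointer reads obvious garbage instead of stale but plausible data. A minimal sketch of that one idea, using hypothetical `scribble` and `debug_free` helpers and an arbitrary fill byte; Apple's debugging malloc does considerably more than this.

```c
/* Hypothetical sketch: poison memory before freeing it so that a later
   use through a dangling pointer is more likely to fail loudly. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRIBBLE_BYTE 0x55  /* arbitrary, easy-to-spot pattern */

/* Overwrite n bytes at p with the pattern. */
void scribble(void *p, size_t n)
{
    assert(p != NULL);
    memset(p, SCRIBBLE_BYTE, n);
}

/* Replacement for free() when the caller knows the block size. */
void debug_free(void *p, size_t n)
{
    if (p != NULL) {
        scribble(p, n);
        free(p);
    }
}
```

In a debugging build, routing the compiler's own frees through something like `debug_free` means a node that is still referenced after being freed shows up as a field full of 0x55 bytes, which tends to crash at the first use rather than much later.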
Absolutely, especially as I have not seen anything like this in all the compiling I've done on both my old G4 and my new dual G4. However, once you start seeing it from more than a couple of Mac OS X users, hardware failure becomes fairly unlikely, given the relatively small number of GPC/Mac OS X users at present. A system software error is still a definite possibility in this case, though, especially given the relative newness of Mac OS X.
The other way is to build as many internal checks as possible into the software. That can make it run slower (or even painfully slow), but the checks would be enabled only through a special runtime or build option.
Yep: AssertDataStructureValid. If the above traceback gave a hint as to which structure was the problem, it might then be possible to check its validity regularly (as often as every token, or perhaps just every statement), and that could then help trace the problem back.
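An AssertDataStructureValid-style check might look like the following minimal sketch. It assumes, purely for illustration, that the suspect structure is a doubly linked list; `struct node`, `list_valid`, `NODE_MAGIC` and `ASSERT_DATA_STRUCTURE_VALID` are hypothetical names. The `ENABLE_CHECKING` guard is modelled on the macro GCC uses for checking builds, so the expensive walk costs nothing in a normal build.

```c
/* Hypothetical sketch of an AssertDataStructureValid-style check. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NODE_MAGIC 0x4E4F4445u  /* overwritten memory usually destroys this */

struct node {
    unsigned magic;
    struct node *prev, *next;
    int value;
};

struct list {
    struct node *head, *tail;
    size_t count;
};

/* Returns true if every invariant holds, false at the first violation. */
bool list_valid(const struct list *l)
{
    const struct node *prev = NULL;
    size_t n = 0;
    for (const struct node *p = l->head; p != NULL; p = p->next) {
        if (p->magic != NODE_MAGIC) return false;  /* corrupted node */
        if (p->prev != prev) return false;         /* broken back link */
        if (++n > l->count) return false;          /* cycle or wrong count */
        prev = p;
    }
    return prev == l->tail && n == l->count;
}

/* Compiled in only for checking builds, since walking the whole
   structure after every statement can be painfully slow. */
#ifdef ENABLE_CHECKING
#define ASSERT_DATA_STRUCTURE_VALID(l) assert(list_valid(l))
#else
#define ASSERT_DATA_STRUCTURE_VALID(l) ((void)0)
#endif
```

Placed, say, at the end of each parsed statement, `ASSERT_DATA_STRUCTURE_VALID(&some_list);` turns silent corruption into an assertion failure close to the code that caused it, rather than a crash much later.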
But as you say, very ugly to figure out.
Peter.