Sorry for the delay, but I had regular obligations at my job :-(
Frank Heckenbach wrote:
Mirsad Todorovac wrote:
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal, and GNU Pascal in particular, safer with regard to buffer overruns? How much does runtime range checking help?
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch depends on the entire
Did he reply by private mail? I don't see his mail in the archive.
I thought http://www.gnu-pascal.de/crystal/gpc/en/mail12961.html was written by him. Perhaps I did not read carefully?
runtime system being (re)compiled with range checking on, IMHO. Otherwise a runtime overrun could happen inside a system function when unchecked.)
Yes. But by default, the RTS is compiled with range-checking on (because this is GPC's default in general, unless you disable checking with a make option for the RTS).
Thank you for explaining that. This helps.
Additionally I am interested whether there is a plan to introduce NX bit support (or emulation), and about StackGuard or similar "canary" protection.
I suppose this means non-executable, for stack memory?
As you correctly concluded.
Does NX bit use depend on compiler support at all, or is it completely OS-supported/dependent? I reckon there ought to be something in the RTS mmap()-ing the stack area with PROT_EXEC disabled, or am I wrong?
The stack is allocated (initially and as it grows) by the OS automatically, so it's an OS issue. However, GPC is in fact affected by it (more precisely, those GCC backends that implement trampolines on the stack, with those frontends (languages) that have local routines and pointers/references to routines, which includes, of course, Pascal). There's actually code in several backends to re-enable execution (PROT_EXEC) for the stack areas where trampolines need it, as a work-around.
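To illustrate, a hypothetical sketch of what such a work-around could look like, modeled loosely on GCC's __enable_execute_stack; the function name and details here are illustrative, not GCC's actual code:

  #include <stdint.h>
  #include <unistd.h>
  #include <sys/mman.h>

  /* Make the page(s) containing a trampoline on the stack executable
     again, so that an NX-protected stack does not fault on it. */
  static void make_stack_executable (void *addr, size_t len)
  {
    uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
    uintptr_t start = (uintptr_t) addr & ~(page - 1);
    uintptr_t end = ((uintptr_t) addr + len + page - 1) & ~(page - 1);

    /* Re-add PROT_EXEC to the affected pages. */
    (void) mprotect ((void *) start, end - start,
                     PROT_READ | PROT_WRITE | PROT_EXEC);
  }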
How do they implement a non-executable stack without a hardware NX bit, I wonder? I've heard of emulations, but I never understood how they work. Probably I should do my homework ;-)
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and use of uninitialized variables, which GPC cannot detect at all yet. It might in the future, but implementing such checks will be very hard, so don't hold your breath.
Thank you for your reply, Frank,
In fact, this came to my mind in a few contexts too: I was thinking of a new breed of pointers that would allow for memory-map defragmentation. As theoretical books say somewhere, after millions of seemingly random allocations/deallocations, memory looks completely fragmented, with evenly distributed holes and used spaces.
I'm no expert in this area, but I'd think this basically assumes a dumb memory manager. An MM that distributes pages into chunks of fixed size (say, powers of two, up to a certain limit) should do much better, and that's what all modern MMs do, AFAIK. Above the page size, fragmentation doesn't matter much anyway, as it only fragments the virtual mapping, i.e. holes can be unmapped.
Also, I'm not sure the situation is really typical. Often a big number of allocations is deallocated in one go (say, a list).
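For illustration, a toy sketch of such a size-class ("binned") allocator -- hypothetical, not libc's or GPC's actual implementation, and with no bounds checks:

  #include <stdlib.h>

  #define NCLASSES 12                 /* classes: 8, 16, ..., 16384 bytes */

  typedef struct chunk { struct chunk *next; } chunk;
  static chunk *bins[NCLASSES];       /* one free list per size class */

  static int class_of (size_t n)
  {
    int c = 0;
    size_t sz = 8;
    while (sz < n) { sz <<= 1; c++; }
    return c;
  }

  void *bin_alloc (size_t n)
  {
    int c = class_of (n);
    if (c >= NCLASSES)                /* large requests bypass the bins */
      return malloc (n);
    if (bins[c])                      /* reuse a freed chunk of this class */
      {
        chunk *p = bins[c];
        bins[c] = p->next;
        return p;
      }
    return malloc ((size_t) 8 << c);  /* otherwise take fresh memory */
  }

  void bin_free (void *p, size_t n)
  {
    int c = class_of (n);
    if (c >= NCLASSES) { free (p); return; }
    ((chunk *) p)->next = bins[c];    /* push back onto this class's list */
    bins[c] = (chunk *) p;
  }

The point being: a freed chunk can always serve another request of the same class, so small requests never fragment each other's space.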
True, all you said stands. Yet the problem remains if a program runs really long and uses the heap heavily (and that's what is recommended, since heap overruns are considered less dangerous for security, even though recent exploits on Windows explicitly use heap overruns, AFAIK). I've seen an article that explains a heap-overrun exploit in detail, and how it was made impossible by heap randomization in Windows XP SP2. But that is slightly off-topic, unless we decide to introduce a better heap allocator option for the GPC RTS.
Do you know of an example of memory-mapping holes being deallocated? In fact, statistically most of the holes will be less than a page in size, won't they? So, IMHO, they cannot be deallocated without some kind of reorganization of the heap memory map, and that again is not possible as we currently have no means (unlike Java, I suppose) of knowing about all the pointers that use a particular part of the heap and which would need to be adjusted to reflect the change in the memory map.
This requires indirect pointers. Then all pointers to a certain memory area could share a base pointer, and IMHO that is neither a big complication nor a big slowdown compared to the benefits we would get!
For example, all the base pointers could fit on a few memory pages that would easily fit in the processor caches, and they would stay there, since they are often used. So this form of indirection does not appear to be costly in terms of raw performance. (Frankly, range checking is costly too, isn't it?)
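Something like this hypothetical illustration (in C; all names are mine, purely illustrative):

  #include <stddef.h>

  /* A few pages' worth of base pointers; small enough that the table
     stays cache-resident when used frequently. */
  typedef struct { void *base; } base_entry;
  static base_entry base_table[4096];

  /* An indirect pointer: an index into the table plus an offset. */
  typedef struct { size_t entry; size_t offset; } indirect_ptr;

  /* Dereference: one extra load from the (hot) table, then the normal
     memory access.  The allocator may move the block and update only
     base_table[p.entry].base -- all indirect pointers stay valid. */
  static inline void *deref (indirect_ptr p)
  {
    return (char *) base_table[p.entry].base + p.offset;
  }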
The cost of modifying the compiler and all the wrappers is something different, I agree.
The idea is still being conceived, so I'd probably better do more homework and study the Boehm garbage collector and similar works.
But the idea of memory compaction and defragmentation still seems very good to me, even though it looks like science fiction now. Probably the best approach would be to implement a library first, instead of changing the compiler or RTS, right?
I understand the power-of-two heaps idea, but it still seems to me that there is vast room for defragmentation.
This is why I would like to propose "registered pointers" (maybe the idea came subconsciously from Java?) and a memory allocation library that would provide better heap use and corruption detection.
Registered pointers are close in semantics to the pointers with ranges already introduced on the list, but they should also allow defragmentation of memory on the fly, compacting of free space, and deallocating and returning mapped pages back to the system (unlike current memory allocators, which behave like my country's budget: they only grow and become less efficiently used with time ...).
Is that too difficult to implement?
Probably yes, because it affects everything. The code generation is different, and all library routines must be recompiled this way. Since this is not realistically possible for libc and other default libraries used, this means you need a lot of wrappers. And when such libraries can do de-/allocation themselves, this will also pose some interesting problems.
I see this is a huge problem. This is why perhaps a library would be a better first step towards solving the problem. Essentially, this means two heaps, but I don't think that is a huge problem, especially now that we are looking into a bright 64-bit address-space future :-)
And think about performance. Double indirection for every pointer access doesn't really speed up the program. ;-)
Some of these issues have been discussed here ( http://www.javaworld.com/javaworld/jw-08-1996/jw-08-gc-p3.html ):
* Compacting collectors * Garbage collectors of JVMs will likely have a strategy to combat heap fragmentation. Two strategies commonly used by mark and sweep collectors are /compacting/ and /copying/. Both of these approaches move objects on the fly to reduce heap fragmentation. Compacting collectors slide live objects over free memory space toward one end of the heap. In the process the other end of the heap becomes one large contiguous free area. All references to the moved objects are updated to refer to the new location.
Updating references to moved objects is sometimes made simpler by adding a level of indirection to object references. Instead of referring directly to objects on the heap, object references refer to a table of object handles. The object handles refer to the actual objects on the heap. When an object is moved, only the object handle must be updated with the new location. All references to the object in the executing program will still refer to the updated handle, which did not move. While this approach simplifies the job of heap defragmentation, it adds a performance overhead to every object access.
* Copying collectors * Copying garbage collectors move all live objects to a new area. As the objects are moved to the new area, they are placed side by side, thus eliminating any free spaces that may have separated them in the old area. The old area is then known to be all free space. The advantage of this approach is that objects can be copied as they are discovered by the traversal from the root nodes. There are no separate mark and sweep phases. Objects are copied to the new area on the fly, and forwarding pointers are left in their old locations. The forwarding pointers allow objects encountered later in the traversal that refer to already copied objects to know the new location of the copied objects.
A common copying collector is called /stop and copy/. In this scheme, the heap is divided into two regions. Only one of the two regions is used at any time. Objects are allocated from one of the regions until all the space in that region has been exhausted. At that point program execution is stopped and the heap is traversed. Live objects are copied to the other region as they are encountered by the traversal. When the stop and copy procedure is finished, program execution resumes. Memory will be allocated from the new heap region until it too runs out of space. At that point the program will once again be stopped. The heap will be traversed and live objects will be copied back to the original region. The cost associated with this approach is that twice as much memory is needed for a given amount of heap space because only half of the available memory is used at any time.
(Sorry if this is too long of a paste)
What I had in mind at the beginning is essentially a third strategy: "registered pointers". The new type of pointers would have to be registered, and the heap manager would have to update all registered pointers to reflect a copy-and-move of a heap area.
As this would be done on the fly, and all pointers would point to the same byte of data after the move, the defragmentation would be transparent to any program.
(I am sorry if this has been discussed already.)
To explain the idea more concretely, as we are all probably tired at 4 PM:
- Every pointer would be registered in a pointer table or list (depending on the implementation).
- Pointers would be indirect, used through a mechanism that would allow transparent moving of the memory fragment pointed to.
- A garbage-collection mechanism would browse through all pointers and defragment memory, much like a file system: moving used memory towards the beginning and free memory towards the end of the memory mapping, so that it could later be munmap()-ed and returned to the OS when a certain LOW_WATERMARK threshold is reached.
- Unused fragments of memory that are not explicitly free()-ed could be deallocated automatically, and an optional warning could be generated.
The latter is possible already, e.g. using the Boehm-Demers-Weiser conservative garbage collector, which can be plugged transparently into any libc-allocation based program (including GPC programs).
OK, thanks for pointing that out. I am trying to study the Boehm papers. Perhaps all I proposed has already been done :-(
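To make the compaction step above concrete, here is a hypothetical toy sketch (a single managed arena and a handle table; all names are illustrative, and there are no bounds checks):

  #include <string.h>

  #define ARENA_SIZE 4096
  #define MAX_HANDLES 64

  static char arena[ARENA_SIZE];        /* the managed heap region */
  static size_t arena_top = 0;

  typedef struct { size_t offset, size; int live; } handle_t;
  static handle_t table[MAX_HANDLES];   /* the "registered pointers" */
  static int n_handles = 0;

  /* Bump-allocate a block and register it; return a handle index. */
  int h_alloc (size_t size)
  {
    table[n_handles] = (handle_t){ arena_top, size, 1 };
    arena_top += size;
    return n_handles++;
  }

  /* All accesses go through the table, so blocks are free to move. */
  void *h_deref (int h) { return arena + table[h].offset; }

  void h_free (int h) { table[h].live = 0; }

  /* Compact: slide live blocks toward the start and fix up only the
     table; every outstanding handle stays valid. */
  void compact (void)
  {
    size_t dst = 0;
    for (int h = 0; h < n_handles; h++)
      if (table[h].live)
        {
          memmove (arena + dst, arena + table[h].offset, table[h].size);
          table[h].offset = dst;
          dst += table[h].size;
        }
    arena_top = dst;  /* everything above is now one contiguous hole,
                         which a real allocator could munmap() */
  }

In a real implementation the table itself would be recycled and the arena grown and shrunk with mmap()/munmap(), but the principle of fixing up one table entry per moved block would stay the same.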
Better heap use would also allow introducing a "heap canary" technique that would eliminate another source of exploits.
According to http://en.wikipedia.org/wiki/Stack_frame, canaries are there to detect overflows in the first place.
It also describes a way to prevent such attacks on the return address on the stack, by detecting them before the return instruction (which may not be 100% foolproof, with longjmp etc., but rather reliable), but I don't see how this directly translates to the heap. Of course, you could check a canary before *every* access to some heap-allocated variable, but that would be quite some overhead (perhaps only useful for debugging), and probably still not easy to implement -- if you have, say, a var parameter or a pointer parameter, how does the routine know whether it has a canary (allocated with one) or not (allocated from foreign code, or a global or local variable not on the heap)?
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error alongside the buffer overrun is probably the off-by-one error in a loop.
Apart from that, allocating with a canary seems to be independent of indirect pointers or compiler internals, so you could probably do it by writing your own allocation routines. You'd only have to arrange for the checking to happen at strategic places or such ...
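For instance, a hypothetical pair of wrappers that place a guard word after each block and verify it at free() time (an illustration only; the names and block layout are mine):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define CANARY 0xDEADBEEFu

  /* Block layout: [size][user data][canary] */
  void *c_malloc (size_t size)
  {
    unsigned char *p = malloc (sizeof (size_t) + size + sizeof (unsigned));
    if (!p) return NULL;
    unsigned canary = CANARY;
    memcpy (p, &size, sizeof size);
    memcpy (p + sizeof (size_t) + size, &canary, sizeof canary);
    return p + sizeof (size_t);
  }

  void c_free (void *q)
  {
    unsigned char *p = (unsigned char *) q - sizeof (size_t);
    size_t size;
    unsigned canary;
    memcpy (&size, p, sizeof size);
    memcpy (&canary, (unsigned char *) q + size, sizeof canary);
    if (canary != CANARY)  /* e.g. an off-by-one overrun in a loop */
      fprintf (stderr, "heap corruption detected in block %p\n", q);
    free (p);
  }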
As for (mostly) debugging checks, there's also libefence which you probably know (which also works by overloading the de-/allocation routines without any compiler support).
Of course I know of it, but I haven't tried it yet.
All of this could be done transparently, without changing existing code.
You mean source code? (Otherwise I don't understand what you mean by indirect pointers.) But changing object code is not easy when it necessarily includes *all* libraries used. (Indirect pointers change the interface, so you can't just recompile those libraries you want to have extra checking in.)
Obviously, the source code. I haven't been very clear about the obvious fact that changing pointer semantics and/or size into an indirect pointer, or a pointer with upper and lower bounds, would inevitably imply at least a recompilation of the libraries, assuming the indirection were implemented transparently to existing source. And bearing in mind that GPC uses libc extensively, this may not be feasible in terms of the necessary wrappers.
That's the general idea, but I haven't tried implementation yet.
Compatible programs could simply be linked with the new library and gain the advantages of better memory use, better heap integrity, and possibly longer uptime, without the need to reboot the system because of memory leaks caused by fragmentation.
If that's the case (and I misunderstood your indirect pointers), go ahead and implement it. You can hook malloc/free (see libefence or the BDW collector as examples to start from). In fact, with dynamic libraries, you might not even have to relink programs, if you use LD_PRELOAD. And it would be independent of languages, as long as they use malloc/free (which GPC does internally by default).
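A minimal sketch of such an LD_PRELOAD hook (Linux/glibc assumed; an illustration only, and a production hook must guard against re-entrancy, since dlsym itself may allocate):

  /* Compile as a shared object:
       gcc -shared -fPIC hook.c -o hook.so -ldl
     then run:  LD_PRELOAD=./hook.so ./program  */
  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stddef.h>

  static void *(*real_malloc) (size_t);
  static void (*real_free) (void *);

  void *malloc (size_t size)
  {
    if (!real_malloc)
      real_malloc = dlsym (RTLD_NEXT, "malloc");
    /* Here one could add canaries, register the block, count bytes... */
    return real_malloc (size);
  }

  void free (void *p)
  {
    if (!real_free)
      real_free = dlsym (RTLD_NEXT, "free");
    /* ... and verify canaries or unregister the block here. */
    real_free (p);
  }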
There's a proverb here in Croatia about "inventing hot water". To avoid that, I had better do my homework on what Mr. Hans Boehm and other garbage-collector writers have done.
I went far from the original question, but this idea has been tempting me for a long time, and the GPC development team seems open enough :-) that I thought of requesting an implementation or putting it on the wish list.
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
The problem is: why are all those fancy heap-protection mechanisms such as libefence not used more widely? Simply because they are not handy, and few people know of them. Having one seamlessly distributed with the compiler, or available as a language option, might make people consider using it.
The basic inspiration came from my study of a month-and-a-half-long virus invasion and the recent network and system intrusions I faced helplessly over the last six months or so as a system administrator.
If buffer-overrun protection isn't elegant, seamless, and nearly mandatory, programmers might not use it, I'm afraid.
Can we do something in this direction?
Mirsad
PS. I hope this doesn't go out as HTML to the list, since this is the first time I'm using Thunderbird.