Sorry for the delay, but I had regular obligations at my job :-(
Frank Heckenbach wrote:
Mirsad Todorovac wrote:
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal, and GNU Pascal in particular, safer with regard to buffer overruns? How much does runtime range checking help?
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch depends on the entire
Did he reply by private mail? I don't see his mail in the archive.
I thought http://www.gnu-pascal.de/crystal/gpc/en/mail12961.html was written by him. Perhaps I did not read carefully?
runtime system being (re)compiled with range checking on, IMHO. Otherwise a runtime overrun could happen inside a system function when unchecked.)
Yes. But by default, the RTS is compiled with range-checking on (because this is GPC's default in general, unless you disable checking with a make option for the RTS).
Thank you for explaining that. This helps.
Additionally I am interested whether there is a plan to introduce NX bit support (or emulation), and about StackGuard or similar "canary" protection.
I suppose this means non-executable, for stack memory?
As you correctly concluded.
Does NX bit use depend on compiler support at all, or is it completely OS-supported/dependent? I reckon there ought to be something in the RTS mmap()-ing the stack area with PROT_EXEC disabled, or am I wrong?
The stack is allocated (initially and as it grows) by the OS automatically, so it's an OS issue. However, GPC is in fact affected by it (more precisely, those GCC backends that implement trampolines on the stack, with those frontends (languages) that have local routines and pointers/references to routines, which includes, of course, Pascal). There's actually code in several backends to re-enable execution (PROT_EXEC) for the stack areas where trampolines need it, as a work-around.
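To illustrate, a hypothetical sketch of what such a work-around could look like, modeled loosely on GCC's __enable_execute_stack; the function name and details here are illustrative, not GCC's actual code:

  #include <stdint.h>
  #include <unistd.h>
  #include <sys/mman.h>

  /* Make the page(s) containing a trampoline on the stack executable
     again, so that an NX-protected stack does not fault on it. */
  static void make_stack_executable (void *addr, size_t len)
  {
    uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
    uintptr_t start = (uintptr_t) addr & ~(page - 1);
    uintptr_t end = ((uintptr_t) addr + len + page - 1) & ~(page - 1);

    /* Re-add PROT_EXEC to the affected pages. */
    (void) mprotect ((void *) start, end - start,
                     PROT_READ | PROT_WRITE | PROT_EXEC);
  }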
How do they implement a non-executable stack without a hardware NX bit, I wonder? I've heard of emulations, but I never understood how they work. Probably I should do my homework ;-)
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and use of uninitialized variables, which GPC cannot detect at all yet. It might in the future, but implementing such checks will be very hard, so don't hold your breath.
Thank you for your reply, Frank,
In fact, this came to my mind in a few contexts too: I was thinking of a new breed of pointers that would allow for memory-map defragmentation. As theoretical books say somewhere, after millions of seemingly random allocations/deallocations, memory looks completely fragmented, with evenly distributed holes and used spaces.
I'm no expert in this area, but I'd think this basically assumes a dumb memory manager. An MM that distributes pages into chunks of fixed size (say, powers of two, up to a certain limit) should do much better, and that's what all modern MMs do, AFAIK. Above the page size, fragmentation doesn't matter much anyway, as it only fragments the virtual mapping, i.e. holes can be unmapped.
Also, I'm not sure the situation is really typical. Often a big number of allocations is deallocated in one go (say, a list).
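For illustration, a toy sketch of such a size-class ("binned") allocator -- hypothetical, not libc's or GPC's actual implementation, and with no bounds checks:

  #include <stdlib.h>

  #define NCLASSES 12                 /* classes: 8, 16, ..., 16384 bytes */

  typedef struct chunk { struct chunk *next; } chunk;
  static chunk *bins[NCLASSES];       /* one free list per size class */

  static int class_of (size_t n)
  {
    int c = 0;
    size_t sz = 8;
    while (sz < n) { sz <<= 1; c++; }
    return c;
  }

  void *bin_alloc (size_t n)
  {
    int c = class_of (n);
    if (c >= NCLASSES)                /* large requests bypass the bins */
      return malloc (n);
    if (bins[c])                      /* reuse a freed chunk of this class */
      {
        chunk *p = bins[c];
        bins[c] = p->next;
        return p;
      }
    return malloc ((size_t) 8 << c);  /* otherwise take fresh memory */
  }

  void bin_free (void *p, size_t n)
  {
    int c = class_of (n);
    if (c >= NCLASSES) { free (p); return; }
    ((chunk *) p)->next = bins[c];    /* push back onto this class's list */
    bins[c] = (chunk *) p;
  }

The point being: a freed chunk can always serve another request of the same class, so small requests never fragment each other's space.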
True, all you said stands. Yet the problem remains if a program runs really long and uses the heap heavily (and that's what is recommended, since heap overruns are considered less dangerous for security, even though recent exploits on Windows explicitly use heap overruns, AFAIK). I've seen an article that explains a heap-overrun exploit in detail, and how it was made impossible by heap randomization in Windows XP SP2. But that is slightly off-topic, unless we decide to introduce a better heap allocator option for the GPC RTS.
Do you know of an example of memory-mapping holes being deallocated? In fact, statistically most of the holes will be less than a page in size, won't they? So, IMHO, they cannot be deallocated without some kind of reorganization of the heap memory map, and that again is not possible as we currently have no means (unlike Java, I suppose) of knowing about all the pointers that use a particular part of the heap and which would need to be adjusted to reflect the change in the memory map.
This requires indirect pointers. Then all pointers to a certain memory area could share a base pointer, and IMHO that is neither a big complication nor a big slowdown compared to the benefits we would get!
For example, all the base pointers could fit on a few memory pages that would easily fit in the processor caches, and they would stay there, since they are often used. So this form of indirection does not appear to be costly in terms of raw performance. (Frankly, range checking is costly too, isn't it?)
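Something like this hypothetical illustration (in C; all names are mine, purely illustrative):

  #include <stddef.h>

  /* A few pages' worth of base pointers; small enough that the table
     stays cache-resident when used frequently. */
  typedef struct { void *base; } base_entry;
  static base_entry base_table[4096];

  /* An indirect pointer: an index into the table plus an offset. */
  typedef struct { size_t entry; size_t offset; } indirect_ptr;

  /* Dereference: one extra load from the (hot) table, then the normal
     memory access.  The allocator may move the block and update only
     base_table[p.entry].base -- all indirect pointers stay valid. */
  static inline void *deref (indirect_ptr p)
  {
    return (char *) base_table[p.entry].base + p.offset;
  }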
The cost of modifying the compiler and all the wrappers is something different, I agree.
The idea is still being conceived, so I'd probably better do more homework and study the Boehm garbage collector and similar works.
But the idea of memory compaction and defragmentation still seems very good to me, even though it looks like science fiction now. Probably the best approach would be to implement a library first, instead of changing the compiler or RTS, right?
I understand the power-of-two heaps idea, but it still seems to me that there is vast room for defragmentation.
This is why I would like to propose "registered pointers" (maybe the idea came subconsciously from Java?) and a memory allocation library that would provide better heap use and corruption detection.
Registered pointers are close in semantics to the pointers with ranges already introduced on the list, but they should also allow defragmentation of memory on the fly, compacting of free space, and deallocating and returning mapped pages back to the system (unlike current memory allocators, which behave like my country's budget: they only grow and become less efficiently used with time ...).
Is that too difficult to implement?
Probably yes, because it affects everything. The code generation is different, and all library routines must be recompiled this way. Since this is not realistically possible for libc and other default libraries used, this means you need a lot of wrappers. And when such libraries can do de-/allocation themselves, this will also pose some interesting problems.
I see this is a huge problem. This is why perhaps a library would be a better first step towards solving the problem. Essentially, this means two heaps, but I don't think that is a huge problem, especially now that we are looking into a bright 64-bit address-space future :-)
And think about performance. Double indirection for every pointer access doesn't really speed up the program. ;-)
Some of these issues have been discussed here ( http://www.javaworld.com/javaworld/jw-08-1996/jw-08-gc-p3.html ):
* Compacting collectors * Garbage collectors of JVMs will likely have a strategy to combat heap fragmentation. Two strategies commonly used by mark and sweep collectors are /compacting/ and /copying/. Both of these approaches move objects on the fly to reduce heap fragmentation. Compacting collectors slide live objects over free memory space toward one end of the heap. In the process the other end of the heap becomes one large contiguous free area. All references to the moved objects are updated to refer to the new location.
Updating references to moved objects is sometimes made simpler by adding a level of indirection to object references. Instead of referring directly to objects on the heap, object references refer to a table of object handles. The object handles refer to the actual objects on the heap. When an object is moved, only the object handle must be updated with the new location. All references to the object in the executing program will still refer to the updated handle, which did not move. While this approach simplifies the job of heap defragmentation, it adds a performance overhead to every object access.
* Copying collectors * Copying garbage collectors move all live objects to a new area. As the objects are moved to the new area, they are placed side by side, thus eliminating any free spaces that may have separated them in the old area. The old area is then known to be all free space. The advantage of this approach is that objects can be copied as they are discovered by the traversal from the root nodes. There are no separate mark and sweep phases. Objects are copied to the new area on the fly, and forwarding pointers are left in their old locations. The forwarding pointers allow objects encountered later in the traversal that refer to already copied objects to know the new location of the copied objects.
A common copying collector is called /stop and copy/. In this scheme, the heap is divided into two regions. Only one of the two regions is used at any time. Objects are allocated from one of the regions until all the space in that region has been exhausted. At that point program execution is stopped and the heap is traversed. Live objects are copied to the other region as they are encountered by the traversal. When the stop and copy procedure is finished, program execution resumes. Memory will be allocated from the new heap region until it too runs out of space. At that point the program will once again be stopped. The heap will be traversed and live objects will be copied back to the original region. The cost associated with this approach is that twice as much memory is needed for a given amount of heap space because only half of the available memory is used at any time.
(Sorry if this is too long of a paste)
What I had in mind at the beginning is essentially a third strategy: "registered pointers". The new type of pointers would have to be registered, and the heap manager would have to update all registered pointers to reflect a copy-and-move of a heap area.
As this would be done on the fly, and all pointers would point to the same byte of data after the move, the defragmentation would be transparent to any program.
(I am sorry if this has been discussed already.)
To explain the idea more concretely, as we are all probably tired at 4 PM:
- Every pointer would be registered in a pointer table or list (depending on the implementation).
- Pointers would be indirect, used through a mechanism that would allow transparent moving of the memory fragment pointed to.
- A garbage-collection mechanism would browse through all pointers and defragment memory, much like a file system: moving used memory towards the beginning and free memory towards the end of the memory mapping, so that it could later be munmap()-ed and returned to the OS when a certain LOW_WATERMARK threshold is reached.
- Unused fragments of memory that are not explicitly free()-ed could be deallocated automatically, and an optional warning could be generated.
The latter is possible already, e.g. using the Boehm-Demers-Weiser conservative garbage collector, which can be plugged transparently into any libc-allocation based program (including GPC programs).
OK, thanks for pointing that out. I am trying to study the Boehm papers. Perhaps all I proposed has already been done :-(
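To make the compaction step above concrete, here is a hypothetical toy sketch (a single managed arena and a handle table; all names are illustrative, and there are no bounds checks):

  #include <string.h>

  #define ARENA_SIZE 4096
  #define MAX_HANDLES 64

  static char arena[ARENA_SIZE];        /* the managed heap region */
  static size_t arena_top = 0;

  typedef struct { size_t offset, size; int live; } handle_t;
  static handle_t table[MAX_HANDLES];   /* the "registered pointers" */
  static int n_handles = 0;

  /* Bump-allocate a block and register it; return a handle index. */
  int h_alloc (size_t size)
  {
    table[n_handles] = (handle_t){ arena_top, size, 1 };
    arena_top += size;
    return n_handles++;
  }

  /* All accesses go through the table, so blocks are free to move. */
  void *h_deref (int h) { return arena + table[h].offset; }

  void h_free (int h) { table[h].live = 0; }

  /* Compact: slide live blocks toward the start and fix up only the
     table; every outstanding handle stays valid. */
  void compact (void)
  {
    size_t dst = 0;
    for (int h = 0; h < n_handles; h++)
      if (table[h].live)
        {
          memmove (arena + dst, arena + table[h].offset, table[h].size);
          table[h].offset = dst;
          dst += table[h].size;
        }
    arena_top = dst;  /* everything above is now one contiguous hole,
                         which a real allocator could munmap() */
  }

In a real implementation the table itself would be recycled and the arena grown and shrunk with mmap()/munmap(), but the principle of fixing up one table entry per moved block would stay the same.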
Better heap use would also allow introducing a "heap canary" technique that would eliminate another source of exploits.
According to http://en.wikipedia.org/wiki/Stack_frame, canaries are there to detect overflows in the first place.
It also describes a way to prevent such attacks on the return address on the stack, by detecting them before the return instruction (which may not be 100% foolproof, with longjmp etc., but rather reliable), but I don't see how this directly translates to the heap. Of course, you could check a canary before *every* access to some heap-allocated variable, but that would be quite some overhead (perhaps only useful for debugging), and probably still not easy to implement -- if you have, say, a var parameter or a pointer parameter, how does the routine know whether it has a canary (allocated with one) or not (allocated from foreign code, or a global or local variable not on the heap)?
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error alongside the buffer overrun is probably the off-by-one error in a loop.
Apart from that, allocating with a canary seems to be independent of indirect pointers or compiler internals, so you could probably do it by writing your own allocation routines. You'd only have to arrange for the checking to happen at strategic places or such ...
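For instance, a hypothetical pair of wrappers that place a guard word after each block and verify it at free() time (an illustration only; the names and block layout are mine):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define CANARY 0xDEADBEEFu

  /* Block layout: [size][user data][canary] */
  void *c_malloc (size_t size)
  {
    unsigned char *p = malloc (sizeof (size_t) + size + sizeof (unsigned));
    if (!p) return NULL;
    unsigned canary = CANARY;
    memcpy (p, &size, sizeof size);
    memcpy (p + sizeof (size_t) + size, &canary, sizeof canary);
    return p + sizeof (size_t);
  }

  void c_free (void *q)
  {
    unsigned char *p = (unsigned char *) q - sizeof (size_t);
    size_t size;
    unsigned canary;
    memcpy (&size, p, sizeof size);
    memcpy (&canary, (unsigned char *) q + size, sizeof canary);
    if (canary != CANARY)  /* e.g. an off-by-one overrun in a loop */
      fprintf (stderr, "heap corruption detected in block %p\n", q);
    free (p);
  }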
As for (mostly) debugging checks, there's also libefence which you probably know (which also works by overloading the de-/allocation routines without any compiler support).
Of course I know of it, but I haven't tried it yet.
All of this could be done transparently, without changing existing code.
You mean source code? (Otherwise I don't understand what you mean by indirect pointers.) But changing object code is not easy when it necessarily includes *all* libraries used. (Indirect pointers change the interface, so you can't just recompile those libraries you want to have extra checking in.)
Obviously, the source code. I haven't been very clear about the obvious fact that changing pointer semantics and/or size into an indirect pointer, or a pointer with upper and lower bounds, would inevitably imply at least a recompilation of the libraries, assuming the indirection were implemented transparently to existing source. And bearing in mind that GPC uses libc extensively, this may not be feasible in terms of the necessary wrappers.
That's the general idea, but I haven't tried implementation yet.
Compatible programs could simply be linked with the new library and gain the advantages of better memory use, better heap integrity, and possibly longer uptime, without the need to reboot the system because of memory leaks caused by fragmentation.
If that's the case (and I misunderstood your indirect pointers), go ahead and implement it. You can hook malloc/free (see libefence or the BDW collector as examples to start from). In fact, with dynamic libraries, you might not even have to relink programs, if you use LD_PRELOAD. And it would be independent of languages, as long as they use malloc/free (which GPC does internally by default).
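A minimal sketch of such an LD_PRELOAD hook (Linux/glibc assumed; an illustration only, and a production hook must guard against re-entrancy, since dlsym itself may allocate):

  /* Compile as a shared object:
       gcc -shared -fPIC hook.c -o hook.so -ldl
     then run:  LD_PRELOAD=./hook.so ./program  */
  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stddef.h>

  static void *(*real_malloc) (size_t);
  static void (*real_free) (void *);

  void *malloc (size_t size)
  {
    if (!real_malloc)
      real_malloc = dlsym (RTLD_NEXT, "malloc");
    /* Here one could add canaries, register the block, count bytes... */
    return real_malloc (size);
  }

  void free (void *p)
  {
    if (!real_free)
      real_free = dlsym (RTLD_NEXT, "free");
    /* ... and verify canaries or unregister the block here. */
    real_free (p);
  }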
There's a proverb here in Croatia about "inventing hot water". To avoid that, I had better do my homework on what Mr. Hans Boehm and other garbage-collector writers have done.
I went far from the original question, but this idea has been tempting me for a long time, and the GPC development team seems open enough :-) that I thought of requesting an implementation or putting it on the wish list.
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
The problem is: why are all those fancy heap-protection mechanisms such as libefence not used more widely? Simply because they are not handy, and few people know of them. Having one seamlessly distributed with the compiler, or available as a language option, might make people consider using it.
The basic inspiration came from my study of a month-and-a-half-long virus invasion and the recent network and system intrusions I faced helplessly over the last six months or so as a system administrator.
If buffer-overrun protection isn't elegant, seamless, and nearly mandatory, programmers might not use it, I'm afraid.
Can we do something in this direction?
Mirsad
PS. I hope this doesn't go out as HTML to the list, since this is the first time I'm using Thunderbird.