Hi, all!
I have recently studied several forms of viruses and security holes in software. Many, if not 90%, of recent exploits depend on holes introduced through buffer overruns, such as in this C example:
printbuffer ()
{
  char buffer[100];
  FILE *fp;

  gets (buffer);  /* oops! no bounds check */
  fp = fopen ("LPT1:", "w");
  fputs (buffer, fp);
}
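The classic fix for this class of bug (my own illustrative sketch, not part of the original question) is to replace gets with fgets, which takes the buffer size and can never write past the end:

```c
#include <stdio.h>
#include <string.h>

/* A bounds-checked variant of the example above: fgets never writes
   more than sizeof buffer bytes, so the overrun is impossible. */
void printbuffer_safe (FILE *in, FILE *out)
{
    char buffer[100];

    if (fgets (buffer, sizeof buffer, in) != NULL) {
        buffer[strcspn (buffer, "\n")] = '\0';  /* strip trailing newline */
        fputs (buffer, out);
    }
}
```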
Are Pascal in general, and GNU Pascal in particular, safer regarding buffer overruns? How much does runtime range checking help, and to what extent can we depend on it? Is it acceptable to write setuid-root programs in GPC, and what are the cautions?
Thanks for answers.
Mirsad
"I claim that the tension between science and faith should be resolved by their synthesis, and not by rejection or dichotomy."
Pierre Teilhard de Chardin (1881-1955)
Mirsad Todorovac wrote:
I have recently studied several forms of viruses and security holes in software. Many if not 90% of recent exploits depend on holes introduced through buffer overruns, such as this C example:
printbuffer ()
{
  char buffer[100];
  FILE *fp;

  gets (buffer);  /* oops! no bounds check */
  fp = fopen ("LPT1:", "w");
  fputs (buffer, fp);
}
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
See http://www.gnu-pascal.de/crystal/gpc/en/mail12961.html.
and to what extent can we depend on it?
The only thing you can depend on in programming is clear thinking. Don't trust anything that promises you automatic wonders.
Is it acceptable to write setuid root programs in GPC and what are the cautions?
From http://en.wikipedia.org/wiki/Setuid
"While the setuid feature is very useful in many cases, it can however pose a security risk if the setuid attribute is assigned to executable programs that are not carefully designed. Users can exploit vulnerabilities in flawed programs to gain permanent elevated privileges, or unintentionally execute a trojan horse program."
I think "carefully designed" is the keyword (and ask yourself if that applies to the C and C++ programming languages (...)).
Regards,
Adriaan van Os
Adriaan van Os wrote:
Mirsad Todorovac wrote:
I have recently studied several forms of viruses and security holes in software. Many if not 90% of recent exploits depend on holes introduced through buffer overruns, such as this C example:
printbuffer ()
{
  char buffer[100];
  FILE *fp;

  gets (buffer);  /* oops! no bounds check */
  fp = fopen ("LPT1:", "w");
  fputs (buffer, fp);
}
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and using uninitialized variables, which GPC cannot detect at all yet. It might do so in the future, but implementing these checks will be very hard, so don't hold your breath.
and to what extent can we depend on it?
The only thing you can depend on in programming is clear thinking. Don't trust anything that promises you automatic wonders.
Yes. Though it might help thinking clearly when you don't have to worry about every little detail (such as basically every C-string operation, which can be dangerous if not careful; note, I said C-string, so using C-strings in Pascal is just as dangerous as in C, of course).
Is it acceptable to write setuid root programs in GPC and what are the cautions?
From http://en.wikipedia.org/wiki/Setuid
"While the setuid feature is very useful in many cases, it can however pose a security risk if the setuid attribute is assigned to executable programs that are not carefully designed. Users can exploit vulnerabilities in flawed programs to gain permanent elevated privileges, or unintentionally execute a trojan horse program."
I think "carefully designed" is the keyword (and ask yourself if that applies to the C and C++ programming languages (...)).
Indeed. But apart from the programming language, the same applies to your program as well. Even a perfectly secure programming language and environment don't help if your program, e.g., executes another program carelessly. No amount of string and buffer checking helps if the program allows the name of the program to be executed to be supplied by an attacker, say /bin/sh ...
As for the original question about buffer overruns (only), Pascal in general and GPC in particular should be quite a bit safer than C, *if* you turn on all checks and use only high-level features (this excludes, e.g., the address operator, type-casts, abuse of variant records, low-level routines such as GetMem, Move and FillChar, calls to unsafe external routines (assembler, C, or whatever), and probably much more).
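To make concrete what those enabled checks buy you, here is a sketch in C terms (my own illustration, not GPC internals) of what a compiler-inserted range check amounts to: every indexed access is guarded before the load or store happens.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of a compiler-emitted range check.  GPC raises
   a runtime error on failure; here we just print and abort. */
static int checked_load (const int *arr, size_t len, size_t i)
{
    if (i >= len) {
        fprintf (stderr, "range check failed: index %zu, bound %zu\n", i, len);
        exit (EXIT_FAILURE);
    }
    return arr[i];
}
```

The cost is one comparison per access, which is why GPC lets you turn the checks off for performance, and why the advice above is to leave them on for security-sensitive programs.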
Frank
Frank Heckenbach wrote:
Adriaan van Os wrote:
Mirsad Todorovac wrote:
I have recently studied several forms of viruses and security holes in software. Many if not 90% of recent exploits depend on holes introduced through buffer overruns, such as this C example:
printbuffer ()
{
  char buffer[100];
  FILE *fp;

  gets (buffer);  /* oops! no bounds check */
  fp = fopen ("LPT1:", "w");
  fputs (buffer, fp);
}
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and using uninitialized variables, which GPC cannot detect at all yet. It might do so in the future, but implementing these checks will be very hard, so don't hold your breath.
I have always presumed that -Wuninitialized (combined with -On, where n >= 1) does warn about uninitialized variables. The following simple test seems to confirm that, but only for local variables, not for global variables (which maybe are automatically initialized to 0?? (a dangerous presumption, by the way)).
program uninitialized;

var
  j: integer;

procedure P;
var
  i: integer;
begin
  i := i + 1;
  writeln ('i =', i)
end;

begin
  P;
  writeln ('j = ', j)
end.
Regards,
Adriaan van Os
Adriaan van Os wrote:
There are other holes, such as dangling pointers and using uninitialized variables, which GPC cannot detect at all yet. It might do so in the future, but implementing these checks will be very hard, so don't hold your breath.
I have always presumed that -Wuninitialized (combined with -On, where n >= 1) does warn about uninitialized variables. The following simple test seems to confirm that, but only for local variables, not for global variables (which maybe are automatically initialized to 0?? (a dangerous presumption, by the way)).
These are static (compile-time) checks, i.e., they warn when a variable *might* be used uninitialized. Checking whether a variable actually is used uninitialized can only be done at runtime. Of course, catching too much shouldn't hurt (except for some spurious warnings, which you have probably seen yourself, and which can be worked around after careful verification).
However, the static checks only work for entire local variables. In particular they don't work for:
- global variables -- as you say they are automatically initialized to 0, but that's only a C heritage of the backend, and not guaranteed in Pascal, so actually there should be a warning.
- heap-allocated variables -- since not even the number of allocations is generally known at compile-time, they can't usually be statically checked
- variant record fields -- according to standard Pascal, all fields become undefined when changing the variant (by assignment to the selector if present, or if not present, by using some field of another variant). That's also generally only known at runtime.
- record/array fields -- currently GPC warns only for the whole structure. Though it would be possible (at least in principle) to extend this to record fields, and to components of arrays with compile-time known bounds, it's not possible for arrays of dynamic size.
- for-loop counters -- according to Pascal, they become undefined after (regular) termination of the loop. GPC currently doesn't do it, and it seems hard to do with the backend.
Actually, the warning is not even always conservative. For var-parameters, we assume the routine modifies the variable (otherwise, there would be many spurious warnings). So the following program gives no warning.
program Uninitialized2;

procedure p (var a: Integer);
begin
  WriteLn (a)
end;

procedure q;
var
  i: Integer;
begin
  p (i);
  WriteLn (i)
end;

begin
  q
end.
Static analysis of runtime-dependent behaviour (as in basically all such cases) is usually very difficult to impossible (cf. the halting problem). Runtime analysis seems to require an implicit flag for each variable or field (some types have unused bit-patterns that could be used instead, but not all do), which would have to be set and checked on each access. The considerations WRT performance and external routines (see my parallel mail about memory checking) apply here as well.
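The runtime scheme described here can be sketched in C (all names are illustrative; nothing like this exists in GPC): each tracked variable carries an implicit "defined" flag that is set on every write and checked on every read.

```c
/* Sketch of per-variable initialization tracking.  A real
   implementation would have the compiler emit these operations
   implicitly for every tracked variable. */
typedef struct {
    int value;
    int defined;          /* 0 = never written, 1 = written */
} TrackedInt;

static void tracked_write (TrackedInt *v, int x)
{
    v->value = x;
    v->defined = 1;
}

static int tracked_read (const TrackedInt *v, int *err)
{
    if (!v->defined) {    /* would be a runtime error in a checking RTS */
        *err = 1;
        return 0;
    }
    *err = 0;
    return v->value;
}
```

The doubling of storage and the extra check per access is exactly the performance concern raised above.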
Frank
Frank Heckenbach wrote:
Static analysis of runtime-dependent behaviour (as in basically all such cases) is usually very difficult to impossible (cf. the halting problem). Runtime analysis seems to require an implicit flag for each variable or field (some types have unused bit-patterns that could be used instead, but not all do), which would have to be set and checked on each access. The considerations WRT performance and external routines (see my parallel mail about memory checking) apply here as well.
I see, thanks for the explanation.
Regards,
Adriaan van Os
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch depends, IMHO, on the entire runtime system being (re)compiled with range checking on. Otherwise a runtime overrun could happen inside a system function when unchecked?)
Additionally I am interested whether there is a plan to introduce NX bit support (or emulation), and about StackGuard or similar "canary" protection.
Does NX bit use depend on compiler support at all, or is it completely OS-supported/dependent? I reckon there ought to be something in the RTS mmap()-ing the stack area with PROT_EXEC disabled, or am I wrong?
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and using uninitialized variables, which GPC cannot detect at all yet. It might do so in the future, but implementing these checks will be very hard, so don't hold your breath.
Thank you for your reply, Frank,
In fact, this came to my mind in a few considerations too: I was thinking of a new breed of pointers that would allow for memory map defragmentation. As said somewhere in theoretical books, after millions of seemingly random allocations/deallocations, memory looks completely fragmented, with equally distributed holes and used spaces. This, as you must have noted, means that the actual system memory used is at least twice the malloc()-allocated memory, since the holes cannot be defragmented or compacted.
This is why I would like to propose "registered pointers" (maybe it came subconsciously from Java?) and memory allocation library that would have a better heap use and corruption detection.
Registered pointers are close in semantics to the pointers with ranges already introduced on the list, but they should also allow defragmentation of the memory on-the-fly, compacting of free space, and deallocating and returning mapped pages back to the system (unlike current memory allocators that behave like my country's budget: they only grow and become less efficiently used with time ...).
Is that too difficult to implement?
(I am sorry if this has been discussed already.)
To explain the idea more visibly, as we are all probably tired at 4 PM:

- Every pointer would be registered in a pointer table or list, *depending on implementation*.
- Pointers would be indirect, used through a mechanism that would allow transparent moving of the memory fragment pointed to.
- A garbage collection mechanism would browse through all pointers and defragment memory, much like a file system, moving used memory towards the beginning and free memory towards the end of the memory mapping, so it could later be munmap()-ed and returned to the OS when a certain LOW_WATERMARK or threshold is reached.
- Unused fragments of memory that are not explicitly free()-ed could be deallocated automatically, and an optional warning could be generated.
- A safe garbage-collecting memory allocator could, and probably should, be mapped in a separate address range from the "normal" heap memory mapping that permits direct use and hacking of pointers.
- Better heap use would allow for introducing a "heap canary" technique that would eliminate another source of exploits.
- All could be done transparently, without changing existing code.
- Better heap integrity checking would be introduced to existing programs, detecting memory leaks as warnings.
That's the general idea, but I haven't tried implementation yet.
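A minimal sketch of the registered-pointer idea in C (purely illustrative; names, sizes, and the table-based design are my own): clients hold an index into a handle table rather than a raw address, so the allocator is free to move the underlying block and only has to update the table entry.

```c
#include <stdlib.h>
#include <string.h>

enum { MAX_HANDLES = 16 };

static void *handle_table[MAX_HANDLES];

/* Allocate a block and register it; return its handle, or -1. */
static int h_alloc (size_t size)
{
    for (int h = 0; h < MAX_HANDLES; h++)
        if (handle_table[h] == NULL) {
            handle_table[h] = malloc (size);
            return handle_table[h] ? h : -1;
        }
    return -1;
}

/* Dereference: the one extra indirection every access pays. */
static void *h_deref (int h) { return handle_table[h]; }

/* "Compaction" in miniature: relocate a block; holders of the
   handle never notice, because only the table entry changes. */
static void h_move (int h, size_t size)
{
    void *fresh = malloc (size);
    memcpy (fresh, handle_table[h], size);
    free (handle_table[h]);
    handle_table[h] = fresh;
}
```

The h_move step is what a compacting pass would do in bulk: slide live blocks together, update the table, and unmap the freed tail.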
Compatible programs could simply be linked with the new library and have the advantage of better memory use, better heap integrity, and possibly longer uptime without the need to reboot the system because of memory leaks due to fragmentation.
I went far from the original question, but this idea has been tempting me for a long time, and the GPC development team seems open enough :-) that I thought of requesting an implementation or putting it on the wish list.
and to what extent can we depend on it?
Adriaan wrote:
The only thing you can depend on in programming is clear thinking. Don't trust anything that promises you automatic wonders.
I don't. But this would be simply a means of debugging, not an electric fence against intruders. As would the new heap mechanism described above.
Frank wrote:
Yes. Though it might help thinking clearly when you don't have to worry about every little detail (such as basically every C-string operation, which can be dangerous if not careful; note, I said C-string, so using C-strings in Pascal is just as dangerous as in C, of course).
Understood.
Is it acceptable to write setuid root programs in GPC and what are the cautions?
From http://en.wikipedia.org/wiki/Setuid
"While the setuid feature is very useful in many cases, it can however pose a security risk if the setuid attribute is assigned to executable programs that are not carefully designed. Users can exploit vulnerabilities in flawed programs to gain permanent elevated privileges, or unintentionally execute a trojan horse program."
I think "carefully designed" is the keyword (and ask yourself if that applies to the C and C++ programming languages (...)).
Indeed. But apart from the programming language, the same applies to your program as well. Even a perfectly secure programming language and environment don't help if your program, e.g., executes another program carelessly. No amount of string and buffer checking helps if the program allows the name of the program to be executed to be supplied by an attacker, say /bin/sh ...
As for the original question about buffer overruns (only), Pascal in general and GPC in particular should be quite a bit safer than C, *if* you turn on all checks and use only high-level features (this excludes, e.g., the address operator, type-casts, abuse of variant records, low-level routines such as GetMem, Move and FillChar, calls to unsafe external routines (assembler, C, or whatever), and probably much more).
This is what I wanted to know. Since security is a greater priority than runtime efficiency for system programs, I would not turn the range checks off.
The "only" problem is implementing all those low-level interfaces to be used from GNU Pascal in an elegant manner.
Please consider the new heap allocation system proposal above and comment on it. I am sorry if something similar has already been made by someone; I give my word I did not steal it intentionally.
Sorry for the length of this email, I am not very good at being laconic in English. :-)
Mirsad
Mirsad Todorovac wrote:
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch depends on the entire
Did he reply by private mail? I don't see his mail in the archive.
runtime system being (re)compiled with range checking on, IMHO. Otherwise a runtime overrun could happen inside a system function when unchecked?)
Yes. But by default, the RTS is compiled with range-checking on (because this is GPC's default in general), unless you disable checking via a make option when building the RTS.
Additionally I am interested whether there is a plan to introduce NX bit support (or emulation), and about StackGuard or similar "canary" protection.
I suppose this means non-executable, for stack memory?
Does NX bit use depend on compiler support at all, or is it completely OS-supported/dependent? I reckon there ought to be something in the RTS mmap()-ing the stack area with PROT_EXEC disabled, or am I wrong?
The stack is allocated (initially, and grown) by the OS automatically, so it's an OS issue. However, GPC actually is affected by it (more precisely, those GCC backends that implement trampolines on the stack, with those frontends (languages) that have local routines and pointers/references to routines, which includes, of course, Pascal). There's actually code in several backends to disable PROT_EXEC for stack areas where needed, as a work-around.
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and using uninitialized variables, which GPC cannot detect at all yet. It might do so in the future, but implementing these checks will be very hard, so don't hold your breath.
Thank you for your reply, Frank,
In fact, this came to my mind in a few considerations too: I was thinking of a new breed of pointers that would allow for memory map defragmentation. As said somewhere in theoretical books, after millions of seemingly random allocations/deallocations, memory looks completely fragmented, with equally distributed holes and used spaces.
I'm no expert in this area, but I'd think this basically assumes a dumb memory manager. A MM that distributes pages into chunks of fixed sizes (say, powers of two, up to a certain limit) should do much better, and that's what all modern MMs do, AFAIK. Above the page size, fragmentation doesn't matter much anyway, as it only fragments the virtual mapping, i.e. holes can be unmapped.
Also, I'm not sure the situation is really typical. Often a big number of allocations is deallocated in one go (say, a list).
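The size-class rounding such a memory manager performs might look like this (a sketch; the class sizes and the page-size limit are illustrative): requests are rounded up to the next power of two, so freed chunks are interchangeable within a class and external fragmentation below the page size stays bounded.

```c
#include <stddef.h>

/* Round a request up to its size class: powers of two from 16 bytes
   up to a (here assumed) 4096-byte page.  Larger requests would be
   served by whole pages and can be unmapped individually. */
static size_t size_class (size_t n)
{
    size_t c = 16;              /* smallest class, illustrative */
    while (c < n && c < 4096)
        c *= 2;
    return c;
}
```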
This is why I would like to propose "registered pointers" (maybe it came subconsciously from Java?) and memory allocation library that would have a better heap use and corruption detection.
Registered pointers are close in semantics to the pointers with ranges already introduced on the list, but they should also allow defragmentation of the memory on-the-fly, compacting of free space, and deallocating and returning mapped pages back to the system (unlike current memory allocators that behave like my country's budget: they only grow and become less efficiently used with time ...).
Is that too difficult to implement?
Probably yes, because it affects everything. The code generation is different, and all library routines must be recompiled this way. Since this is not realistically possible for libc and other default libraries used, this means you need a lot of wrappers. And when such libraries can do de-/allocation themselves, this will also pose some interesting problems.
And think about performance. Double indirection for every pointer access doesn't really speed up the program. ;-)
(I am sorry if this has been discussed already.)
To explain the idea more visibly, as we are all probably tired at 4 PM:
Every pointer would be registered in a pointer table or list, *depending on implementation*. Pointers would be indirect, used through a mechanism that would allow transparent moving of the memory fragment pointed to. A garbage collection mechanism would browse through all pointers and defragment memory, much like a file system, moving used memory towards the beginning and free memory towards the end of the memory mapping, so it could later be munmap()-ed and returned to the OS when a certain LOW_WATERMARK or threshold is reached. Unused fragments of memory that are not explicitly free()-ed could be deallocated automatically, and an optional warning could be generated.
The latter is possible already, e.g. using the Boehm-Demers-Weiser conservative garbage collector, which can be plugged transparently into any libc-allocation based program (including GPC programs).
Better heap use would allow for introducing "heap canary" technique that would eliminate another source of exploits
According to http://en.wikipedia.org/wiki/Stack_frame, canaries are there to detect overflows in the first place.
It also describes a way to prevent such attacks to the return address on the stack, by detecting them before the return instruction (which may not be 100% foolproof, with longjmp etc., but rather reliable), but I don't see how this directly translates to the heap. Of course, you could check a canary before *every* access to some heap allocated variable, but that would be quite some overhead (perhaps only useful for debugging), and probably still not easy to implement -- if you have, say, a var parameter or a pointer parameter, how does the routine know if it has a canary (allocated with one), or not (allocated from foreign code, or global or local variables not on the heap).
Apart from that, allocating with a canary seems to be independent of indirect pointers or compiler internals, so you could probably do it by writing your own allocation routines. You'd only have to arrange for the checking to happen at strategic places or such ...
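A hedged sketch of such library-only canary allocation routines (the names and the guard constant are my own, not from any real library): a guard word is placed after the user area at allocation time and verified at a "strategic place" such as free.

```c
#include <stdlib.h>
#include <string.h>

#define CANARY 0xDEADBEEFu   /* illustrative guard value */

/* Allocate size user bytes plus a trailing canary word. */
static void *c_malloc (size_t size)
{
    unsigned char *p = malloc (size + sizeof (unsigned));
    if (p != NULL)
        memcpy (p + size, &(unsigned){ CANARY }, sizeof (unsigned));
    return p;
}

/* Returns 1 if the canary is intact, 0 if the block was overrun.
   memcpy avoids unaligned/aliasing problems when reading it back. */
static int c_check (void *p, size_t size)
{
    unsigned v;
    memcpy (&v, (unsigned char *) p + size, sizeof v);
    return v == CANARY;
}
```

As noted above, this only *detects* an overrun after the fact; checking before every access would be a (costly) debugging mode rather than a protection.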
As for (mostly) debugging checks, there's also libefence which you probably know (which also works by overloading the de-/allocation routines without any compiler support).
All could be done transparent, without changing existing code
You mean source code? (Otherwise I don't understand what you mean by indirect pointers.) But changing object code is not easy when it necessarily includes *all* libraries used. (Indirect pointers change the interface, so you can't just recompile those libraries you want to have extra checking in.)
That's the general idea, but I haven't tried implementation yet.
Compatible programs could simply be linked with the new library and have the advantage of better memory use, better heap integrity, and possibly longer uptime without the need to reboot the system because of memory leaks due to fragmentation.
If that's the case (and I misunderstood your indirect pointers), go ahead and implement it. You can hook malloc/free (see libefence or the BDW collector as examples to start from). In fact, with dynamic libraries, you might not even have to relink programs, if you use LD_PRELOAD. And it would be independent of languages, as long as they use malloc/free (which GPC does internally by default).
I went far from the original question, but this idea has been tempting me for a long time, and the GPC development team seems open enough :-) that I thought of requesting an implementation or putting it on the wish list.
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
Frank
Frank Heckenbach wrote:
Mirsad Todorovac wrote:
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch is depending on entire
Did he reply by private mail? I don't see his mail in the archive.
?
http://www.gnu-pascal.de/crystal/gpc/en/mail12961.html refers to an email from Waldek.
Regards,
Adriaan van Os
Adriaan van Os wrote:
Frank Heckenbach wrote:
Mirsad Todorovac wrote:
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch is depending on entire
Did he reply by private mail? I don't see his mail in the archive.
?
http://www.gnu-pascal.de/crystal/gpc/en/mail12961.html refers to an email from Waldek.
Ah, OK. I had thought Mirsad was referring to an answer from Waldek in this thread.
Frank
Sorry for the delay, but I had regular obligations at my job :-(
Frank Heckenbach wrote:
Mirsad Todorovac wrote:
On Thu, 26 Jan 2006, Frank Heckenbach wrote:
Adriaan van Os wrote:
Is Pascal and namely GNU Pascal safer re: buffer overruns? How much does runtime range checking help
Thank you for this link, Adriaan,
(Of course, the answer of Mr. Waldek Hebisch depends on the entire
Did he reply by private mail? I don't see his mail in the archive.
I thought http://www.gnu-pascal.de/crystal/gpc/en/mail12961.html was written by him. Probably I did not read it well?
runtime system being (re)compiled with range checking on, IMHO. Otherwise a runtime overrun could happen inside a system function when unchecked?)
Yes. But by default, the RTS is compiled with range-checking on (because this is GPC's default in general), unless you disable checking via a make option when building the RTS.
Thank you for explaining that. This helps.
Additionally I am interested whether there is a plan to introduce NX bit support (or emulation), and about StackGuard or similar "canary" protection.
I suppose this means non-executable, for stack memory?
As you correctly concluded.
Does NX bit use depend on compiler support at all, or is it completely OS-supported/dependent? I reckon there ought to be something in the RTS mmap()-ing the stack area with PROT_EXEC disabled, or am I wrong?
The stack is allocated (initially, and grown) by the OS automatically, so it's an OS issue. However, GPC actually is affected by it (more precisely, those GCC backends that implement trampolines on the stack, with those frontends (languages) that have local routines and pointers/references to routines, which includes, of course, Pascal). There's actually code in several backends to disable PROT_EXEC for stack areas where needed, as a work-around.
How do they implement disabling PROT_EXEC without a hardware NX bit, I wonder? I've heard of emulations, but I never understood how they work. Probably I should do my homework ;-)
Yes. In particular, some holes are intentional, either for compatibility with other dialects (pointer-arithmetic etc.), or for performance reasons (possibility to turn off checks).
There are other holes, such as dangling pointers and using uninitialized variables, which GPC cannot detect at all yet. It might do so in the future, but implementing these checks will be very hard, so don't hold your breath.
Thank you for your reply, Frank,
In fact, this came to my mind in a few considerations too: I was thinking of a new breed of pointers that would allow for memory map defragmentation. As said somewhere in theoretical books, after millions of seemingly random allocations/deallocations, memory looks completely fragmented, with equally distributed holes and used spaces.
I'm no expert in this area, but I'd think this basically assumes a dumb memory manager. A MM that distributes pages into chunks of fixed sizes (say, powers of two, up to a certain limit) should do much better, and that's what all modern MMs do, AFAIK. Above the page size, fragmentation doesn't matter much anyway, as it only fragments the virtual mapping, i.e. holes can be unmapped.
Also, I'm not sure the situation is really typical. Often a big number of allocations is deallocated in one go (say, a list).
True, all you said stands. Yet the problem remains if a program runs for a really long time and makes heavy use of the heap (and that's what is recommended, since heap overruns are considered less dangerous for security, even though recent exploits on Windows use heap overruns explicitly, AFAIK). I've seen an article that explains a heap overrun exploit in detail, and how it was made impossible by heap randomization in Windows XP SP2. But this is slightly off-topic, unless we decide to introduce a better heap allocator option into the GPC RTS.
Do you know of an example of memory-mapping holes being deallocated? In fact, statistically most of the holes will be less than a page in size, won't they? So, IMHO, they cannot be deallocated without some kind of reorganization of the heap memory map, and this again is not possible, as we currently have no means (unlike Java, I suppose) of knowing about all the pointers that refer to a particular part of the heap and that would need to be adjusted to reflect the change in the memory map.
This requires indirect pointers. Then all pointers into a certain memory area could share a base pointer, and IMHO that is neither a big complication nor a big speed penalty compared to the enhancements we get!
For example, all base pointers could fit on a few memory pages that would easily fit in the processor caches, and they would stay there, since they are often used. So this form of indirection does not appear to be costly in terms of raw performance. (Frankly, range checking is costly too, isn't it?)
The cost of modifying the compiler and all the wrappers is something different, I agree.
The idea is still being conceived, so I'd probably better do more homework and study the Boehm garbage collector and similar works.
But the idea of memory compaction and defragmentation still seems very good to me, even though it looks like SF now. Probably the best thing would be to try to implement it as a library, instead of a change to the compiler or RTS, as a first step, right?
I understand the power-of-two heaps idea, but it still seems to me that there is plenty of room for defragmentation.
This is why I would like to propose "registered pointers" (maybe the idea came subconsciously from Java?) and a memory allocation library that would make better use of the heap and detect corruption.
Registered pointers are close in semantics to the pointers with ranges already introduced on the list, but they should also allow defragmenting memory on the fly, compacting free space, and deallocating and returning mapped pages back to the system (unlike current memory allocators, which behave like my country's budget: they only grow and become less efficiently used over time ...).
Is that too difficult to implement?
Probably yes, because it affects everything. The code generation is different, and all library routines must be recompiled this way. Since this is not realistically possible for libc and other default libraries used, this means you need a lot of wrappers. And when such libraries can do de-/allocation themselves, this will also pose some interesting problems.
I see this is a huge problem. This is why a library would perhaps be a better first step towards solving it. Essentially, this means two heaps, but I don't think that is a huge problem, especially now that we are looking into a bright 64-bit address space future :-)
And think about performance. Double indirection for every pointer access doesn't really speed up the program. ;-)
Some of these issues have been discussed here ( http://www.javaworld.com/javaworld/jw-08-1996/jw-08-gc-p3.html ):
* Compacting collectors * Garbage collectors of JVMs will likely have a strategy to combat heap fragmentation. Two strategies commonly used by mark and sweep collectors are /compacting/ and /copying/. Both of these approaches move objects on the fly to reduce heap fragmentation. Compacting collectors slide live objects over free memory space toward one end of the heap. In the process the other end of the heap becomes one large contiguous free area. All references to the moved objects are updated to refer to the new location.
Updating references to moved objects is sometimes made simpler by adding a level of indirection to object references. Instead of referring directly to objects on the heap, object references refer to a table of object handles. The object handles refer to the actual objects on the heap. When an object is moved, only the object handle must be updated with the new location. All references to the object in the executing program will still refer to the updated handle, which did not move. While this approach simplifies the job of heap defragmentation, it adds a performance overhead to every object access.
* Copying collectors * Copying garbage collectors move all live objects to a new area. As the objects are moved to the new area, they are placed side by side, thus eliminating any free spaces that may have separated them in the old area. The old area is then known to be all free space. The advantage of this approach is that objects can be copied as they are discovered by the traversal from the root nodes. There are no separate mark and sweep phases. Objects are copied to the new area on the fly, and forwarding pointers are left in their old locations. The forwarding pointers allow objects encountered later in the traversal that refer to already copied objects to know the new location of the copied objects.
A common copying collector is called /stop and copy/. In this scheme, the heap is divided into two regions. Only one of the two regions is used at any time. Objects are allocated from one of the regions until all the space in that region has been exhausted. At that point program execution is stopped and the heap is traversed. Live objects are copied to the other region as they are encountered by the traversal. When the stop and copy procedure is finished, program execution resumes. Memory will be allocated from the new heap region until it too runs out of space. At that point the program will once again be stopped. The heap will be traversed and live objects will be copied back to the original region. The cost associated with this approach is that twice as much memory is needed for a given amount of heap space because only half of the available memory is used at any time.
(Sorry if this is too long of a paste)
What I had in mind from the beginning is essentially a third strategy: "registered pointers". The new type of pointers would have to be registered, and the heap manager would have to update all registered pointers to reflect a copy-and-move of a heap area.
As this would be done on the fly, and all pointers would point to the same byte of data after the move, the defragmentation would be transparent to any program.
(I am sorry if this has been discussed already.)
To explain the idea more visibly, as we are all probably tired at 4 PM:
- Every pointer would be registered in a pointer table or list (depending on implementation).
- Pointers would be indirect, used through a mechanism that would allow transparent moving of the memory fragment pointed to.
- A garbage collection mechanism would browse through all pointers and defragment memory, much like a file system: used memory would move towards the beginning and free memory towards the end of the memory mapping, so it could later be munmap()-ed and returned to the OS when a certain LOW_WATERMARK or threshold is reached.
- Unused fragments of memory that are not explicitly free()-ed could be deallocated automatically, and an optional warning could be generated.
The latter is possible already, e.g. using the Boehm-Demers-Weiser conservative garbage collector, which can be plugged transparently into any libc-allocation based program (including GPC programs).
OK, thanks for pointing that out. I am trying to study the Boehm papers. Perhaps everything I proposed has already been done :-(
Better heap use would also allow introducing a "heap canary" technique that would eliminate another source of exploits.
According to http://en.wikipedia.org/wiki/Stack_frame, canaries are there to detect overflows in the first place.
It also describes a way to prevent such attacks to the return address on the stack, by detecting them before the return instruction (which may not be 100% foolproof, with longjmp etc., but rather reliable), but I don't see how this directly translates to the heap. Of course, you could check a canary before *every* access to some heap allocated variable, but that would be quite some overhead (perhaps only useful for debugging), and probably still not easy to implement -- if you have, say, a var parameter or a pointer parameter, how does the routine know if it has a canary (allocated with one), or not (allocated from foreign code, or global or local variables not on the heap).
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error alongside the buffer overrun is probably the off-by-one error in a loop.
Apart from that, allocating with a canary seems to be independent of indirect pointers or compiler internals, so you could probably do it by writing your own allocation routines. You'd only have to arrange for the checking to happen at strategic places or such ...
As for (mostly) debugging checks, there's also libefence which you probably know (which also works by overloading the de-/allocation routines without any compiler support).
Of course I know of it, but I haven't tried it yet.
All could be done transparent, without changing existing code
You mean source code? (Otherwise I don't understand what you mean by indirect pointers.) But changing object code is not easy when it necessarily includes *all* libraries used. (Indirect pointers change the interface, so you can't just recompile those libraries you want to have extra checking in.)
Obviously, the source code. I haven't been very clear about the obvious fact that changing pointer semantics and/or size into an indirect pointer, or a pointer with upper and lower boundaries, would inevitably imply at least recompilation of libraries, assuming the indirection were implemented transparently to existing source. And bearing in mind that GPC uses libc extensively, this may not be feasible in terms of the necessary wrappers.
That's the general idea, but I haven't tried implementation yet.
Compatible programs could simply be linked with the new library and gain better memory use, better heap integrity, and possibly longer uptime, without the need to reboot the system because of memory leaks caused by fragmentation.
If that's the case (and I misunderstood your indirect pointers), go ahead and implement it. You can hook malloc/free (see libefence or the BDW collector as examples to start from). In fact, with dynamic libraries, you might not even have to relink programs, if you use LD_PRELOAD. And it would be independent of languages, as long as they use malloc/free (which GPC does internally by default).
There's a proverb here in Croatia about "inventing hot water". To avoid that, I had better do my homework on what Mr. Hans Boehm and other garbage collector writers have done.
I went far from the original question, but this idea has been tempting me for a long time, and the GPC development team seems open enough :-) that I thought of requesting an implementation or putting it on the wish list.
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
The problem is: why are all those fancy heap-protection mechanisms such as libefence not used more widely? Simply because they are not handy, and few people know of them. Having such a mechanism seamlessly distributed with the compiler, or available as a language option, might make people consider using it.
The basic inspiration came from my study of a month-and-a-half-long virus invasion and the recent network and system intrusions I have faced helplessly over the last six months or so as a system administrator.
If buffer overrun protection isn't elegant, seamless and nearly mandatory, programmers might not use it, I'm afraid.
Can we do something in this direction?
Mirsad
PS. I hope this doesn't get to the list in HTML, since this is the first time I am using Thunderbird.
Mirsad Todorovac wrote:
Additionally I am interested whether there is a plan to introduce NX bit support (or emulation), and about StackGuard or similar "canary" protection.
I suppose this means non-executable, for stack memory?
As you correctly concluded.
Does NX bit use depend on compiler support at all, or is it completely OS-supported/dependent? I reckon there ought to be something in the RTS mmap()-ing the stack area with PROT_EXEC disabled, or am I wrong?
The stack is allocated (initially, and grown) by the OS automatically, so it's an OS issue. However, GPC actually is affected by it (more precisely, those GCC backends that implement trampolines on the stack, with those frontends (languages) that have local routines and pointers/references to routines, which includes, of course, Pascal). There's actually code in several backends to disable PROT_EXEC for stack areas where needed, as a work-around.
How do they implement PROT_EXEC removal w/o a hardware NX bit, I wonder? I've heard of emulations, but I never understood how this works. Probably I should do my homework ;-)
Why without? AFAIK, PROT_EXEC is (roughly speaking) the software side of hardware NX.
BTW, according to http://en.wikipedia.org/wiki/NX_Bit:
: Although this sort of mechanism has been around for years in various
: other processor architectures such as Sun's SPARC, Alpha, IBM's
: PowerPC, and even Intel's IA-64 architecture (also known as Itanium
: or Merced processor), the term is actually a name created by AMD for
: use by its AMD64 line of processors, such as the Athlon 64 and
: Opteron. It seems to have now become a common term used to
: generically describe similar technologies in other processors.
: Intel's x86 processors included this feature since 80286 processor,
: but that memory model is treated as obsolete by modern processors
: and operating systems. De facto it could not be used by modern
: programs, and AMD re-implemented the feature for the Flat memory
: model used now.
So if you thought this was a brand-new hardware feature, it really isn't (apart from the name). Just Intel had screwed up for too long ...
I've seen an article that explains a heap overrun exploit in detail, and how it was made impossible by heap randomization in Windows XP SP2. But that is slightly off-topic, unless we decide to introduce a better heap allocator option into the GPC RTS.
Randomization might help to spoil particular attacks, or make them less likely to succeed, but cannot provide perfect protection. BTW, this should also be possible with a plug-in MM replacement.
Do you know of an example of memory-mapping holes being deallocated? In fact, statistically most of the holes will be less than a page in size, won't they?
I don't really have statistics. I suppose when deallocating a list, some pages would be freed completely and be available either for returning to the OS, or at least for reuse within the same process (possibly with other chunk sizes, as needed). I suppose typical MMs do at least the latter.
This requires indirect pointers. Then all pointers to a certain memory area could share a base pointer, and IMHO that is not a big complication, nor a big speed penalty, compared to the enhancements we get!
It is! Any solution that changes the ABI or something like that, and thus requires at least recompilation of all libraries, is a big deal in practical terms (and IMHO should not even be attempted in a single language on its own, unless that language operates in an isolated world, sandbox or whatever).
For example, all base pointers could fit on a few memory pages that would easily fit in the processor caches, and they would stay there, since they are used so often. So this form of indirection does not appear to be costly in terms of raw performance.
I'm skeptical. First, the cache used is not available for other purposes. Second, it takes additional statements which take time to execute and consume cache for the code. And since the change is pervasive, you don't even have the option to disable it in tight inner loops; you can only hope the optimizer is smart enough to move as many indirections as possible outside of the loop.
(Frankly, range checking is costly also, isn't it?)
Yes, but you can avoid it, e.g. by typing your counters and index variables with appropriate subranges, thus moving the necessary checks outside of critical inner loops (without needing any compiler options to turn them off temporarily, and not relying on the optimizer, but guaranteed by standard language features). Chuck Falconer has written about this more than once on this list.
But the idea of memory compaction and defragmentation still seems very good to me, even though it looks like SF now. Probably the best thing would be to try to implement it as a library, instead of a change to the compiler or RTS, as a first step, right?
Yes, I think so.
I understand the power-of-two heaps idea, but it still seems to me that there is plenty of room for defragmentation.
Power of two is probably not the final word on this issue. But you might want to run some actual tests with real-world programs and current memory managers (say, that of glibc) to get significant numbers.
Some of these issues have been discussed here ( http://www.javaworld.com/javaworld/jw-08-1996/jw-08-gc-p3.html ):
[...]
(Sorry if this is too long of a paste)
GC is really a science of its own (literally), and IMHO it's a bit off-topic here.
What I had in mind from the beginning is essentially a third strategy: "registered pointers". The new type of pointers would have to be registered, and the heap manager would have to update all registered pointers to reflect a copy-and-move of a heap area. As this would be done on the fly, and all pointers would point to the same byte of data after the move, the defragmentation would be transparent to any program.
But it's also pervasive, i.e. it affects all libraries (since they may copy pointers). And it would have to take care of pointers on the stack, in registers, etc.
OK, thanks for pointing that out. I am trying to study the Boehm papers. Perhaps everything I proposed has already been done :-(
It's surely been discussed at length, and there are experts in this area, which both of us are not, and this list is not really the place to discuss it ...
Better heap use would also allow introducing a "heap canary" technique that would eliminate another source of exploits.
According to http://en.wikipedia.org/wiki/Stack_frame, canaries are there to detect overflows in the first place.
It also describes a way to prevent such attacks to the return address on the stack, by detecting them before the return instruction (which may not be 100% foolproof, with longjmp etc., but rather reliable), but I don't see how this directly translates to the heap. Of course, you could check a canary before *every* access to some heap allocated variable, but that would be quite some overhead (perhaps only useful for debugging), and probably still not easy to implement -- if you have, say, a var parameter or a pointer parameter, how does the routine know if it has a canary (allocated with one), or not (allocated from foreign code, or global or local variables not on the heap).
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error alongside the buffer overrun is probably the off-by-one error in a loop.
This would serve as a debugging aid (similar to efence), not as attack prevention, as an attack can occur before free().
Obviously, the source code. I haven't been very clear about the obvious fact that changing pointer semantics and/or size into an indirect pointer, or a pointer with upper and lower boundaries, would inevitably imply at least recompilation of libraries, assuming the indirection were implemented transparently to existing source. And bearing in mind that GPC uses libc extensively, this may not be feasible in terms of the necessary wrappers.
Actually it means that the only possibly successful way is probably doing it all in the backend, so it would work the same for all languages. Perhaps you can get such an option into the backend. Then you basically only have to compile two versions of your whole system ... ;-)
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
The problem is: why are all those fancy heap-protection mechanisms such as libefence not used more widely? Simply because they are not handy, and few people know of them. Having such a mechanism seamlessly distributed with the compiler, or available as a language option, might make people consider using it.
IMHO it would be considerably less work to create and distribute an integrated environment containing the existing tools and making them easily available than writing something entirely new just for this reason (i.e., unless it has other advantages).
The basic inspiration came from my study of a month-and-a-half-long virus invasion and the recent network and system intrusions I have faced helplessly over the last six months or so as a system administrator.
If buffer overrun protection isn't elegant, seamless and nearly mandatory, programmers might not use it, I'm afraid.
Can we do something in this direction?
I'm afraid we probably cannot solve The Computer Security Problem within the next two weeks. ;-)
And, BTW, this is also an area of its own, with its experts. This does not mean we should not care here, but one should really first study existing work and state of the art. If implementation of some techniques require compiler support, we can discuss them here, but this is not really the place to design new techniques (which most likely will have been discussed by the experts already).
PS. I hope this doesn't get to the list in HTML, since this is the first time I am using Thunderbird.
No, it didn't.
Frank
On Sat, 28 Jan 2006, Frank Heckenbach wrote:
How do they implement PROT_EXEC removal w/o a hardware NX bit, I wonder? I've heard of emulations, but I never understood how this works. Probably I should do my homework ;-)
Why without? AFAIK, PROT_EXEC is (roughly speaking) the software side of hardware NX.
Could be on some platforms. AFAIR, there was a platform (Linux? it's so dim in my memory) which did not implement PROT_EXEC protection. AFAIK, Linux kernels generally did not until the 2.6 versions. But you must have known this, and this is a backend issue, I suppose.
BTW, according to http://en.wikipedia.org/wiki/NX_Bit:
: Although this sort of mechanism has been around for years in various
: other processor architectures such as Sun's SPARC, Alpha, IBM's
: PowerPC, and even Intel's IA-64 architecture (also known as Itanium
: or Merced processor), the term is actually a name created by AMD for
: use by its AMD64 line of processors, such as the Athlon 64 and
: Opteron. It seems to have now become a common term used to
: generically describe similar technologies in other processors.
: Intel's x86 processors included this feature since 80286 processor,
: but that memory model is treated as obsolete by modern processors
: and operating systems. De facto it could not be used by modern
: programs, and AMD re-implemented the feature for the Flat memory
: model used now.
So if you thought this was a brand-new hardware feature, it really isn't (apart from the name). Just Intel had screwed up for too long ...
Agreed. Some of the things Intel gets away with are beyond my comprehension. Thank Gawd for AMD :-) who brought the NX bit back ...
I've seen an article that explains a heap overrun exploit in detail, and how it was made impossible by heap randomization in Windows XP SP2. But that is slightly off-topic, unless we decide to introduce a better heap allocator option into the GPC RTS.
Randomization might help to spoil particular attacks, or make them less likely to succeed, but cannot provide perfect protection. BTW, this should also be possible with a plug-in MM replacement.
I trust your experience and expertise on that.
Do you know of an example of memory-mapping holes being deallocated? In fact, statistically most of the holes will be less than a page in size, won't they?
I don't really have statistics. I suppose when deallocating a list, some pages would be freed completely and be available either for returning to the OS, or at least for reuse within the same process (possibly with other chunk sizes, as needed). I suppose typical MMs do at least the latter.
Agreed. OTOH, if we run, for example, a database, insertions into and deletions from the heap may be essentially random. I trust a lecture I heard from my college professor, and I think he had some references with extensive simulations at least. IMHO a general-purpose allocator should not rely on "burst" allocations/deallocations, but on a more stochastic pattern of memory manager use.
I suppose GPC uses the default library's malloc/calloc/free, and AFAIK the libc malloc maintainers are also thinking of improving the default malloc in libc, as this would give an immediate improvement even to already-linked programs, if ABI compatibility is preserved.
This requires indirect pointers. Then all pointers to a certain memory area could share a base pointer, and IMHO that is not a big complication, nor a big speed penalty, compared to the enhancements we get!
It is! Any solution that changes the ABI or something like that, and thus requires at least recompilation of all libraries, is a big deal in practical terms (and IMHO should not even be attempted in a single language on its own, unless that language operates in an isolated world, sandbox or whatever).
OK, I see the point. I will have to do some serious study on this. It won't hurt to learn more about GPC internals and how it uses libc.
For example, all base pointers could fit on a few memory pages that would easily fit in the processor caches, and they would stay there, since they are used so often. So this form of indirection does not appear to be costly in terms of raw performance.
I'm skeptical. First, the cache used is not available for other purposes. Second, it takes additional statements which take time to execute and consume cache for the code. And since the change is pervasive, you don't even have the option to disable it in tight inner loops; you can only hope the optimizer is smart enough to move as many indirections as possible outside of the loop.
True.
(Frankly, range checking is costly also, isn't it?)
Yes, but you can avoid it, e.g. by typing your counters and index variables with appropriate subranges, thus moving the necessary checks outside of critical inner loops (without needing any compiler options to turn them off temporarily, and not relying on the optimizer, but guaranteed by standard language features). Chuck Falconer has written about this more than once on this list.
I see the difference.
But the idea of memory compaction and defragmentation still seems very good to me, even though it looks like SF now. Probably the best thing would be to try to implement it as a library, instead of a change to the compiler or RTS, as a first step, right?
Yes, I think so.
So we agree on something.
I understand the power-of-two heaps idea, but it still seems to me that there is plenty of room for defragmentation.
Power of two is probably not the final word on this issue. But you might want to run some actual tests with real-world programs and current memory managers (say, that of glibc) to get significant numbers.
I am putting it on my TO-DO list. It is a very interesting issue in general, for all operating systems that use paging virtual memory. (OTOH, coming back to releasing unused holes: unused pages will probably be swapped out and not reloaded, since they are not used; the real problem is the stochastic allocation/deallocation of relatively small fragments of memory. For example, if the average size of records is from 512 to 1023 bytes, they will all be served from the 2^10 size class, and after allocations/deallocations reach an asymptotically stable state, there will be a mixture of used and unused areas on each physical page of the heap. This means that the allocated physical pages of the heap may eventually approach double the program's actual memory needs. I may look for literature; right now I am speaking from memory of those lectures about Unix processes.)
Some of these issues have been discussed here ( http://www.javaworld.com/javaworld/jw-08-1996/jw-08-gc-p3.html ):
[...]
(Sorry if this is too long of a paste)
GC is really a science of its own (literally), and IMHO it's a bit off-topic here.
OK. I will try to stay on focus.
What I had in mind from the beginning is essentially a third strategy: "registered pointers". The new type of pointers would have to be registered, and the heap manager would have to update all registered pointers to reflect a copy-and-move of a heap area. As this would be done on the fly, and all pointers would point to the same byte of data after the move, the defragmentation would be transparent to any program.
But it's also pervasive, i.e. it affects all libraries (since they may copy pointers). And it would have to take care of pointers on the stack, in registers, etc.
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot go to registers), but having an important system brought down on it's knees by undetected buffer overrun in an application will hurt me more both as a system administrator and as a software developer than the 20% decrease in program speed. IMHO.
OK, thanks for pointing that out. I am trying to study the Boehm papers. Perhaps everything I proposed has already been done :-(
It's surely been discussed at length, and there are experts in this area, which both of us are not, and this list is not really the place to discuss it ...
I have understood your argument. I am trying to stick to those issues that bear on the security of Pascal programs, as far as it can be enforced by the language.
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error alongside the buffer overrun is probably the off-by-one error in a loop.
This would serve as a debugging aid (similar to efence), not as attack prevention, as an attack can occur before free().
True. However, several attack scenarios rely on smashing allocator list pointers and overwriting an arbitrary location in memory. A canary could prevent that, if checked before the pointers that follow it are evaluated. :-)
Yet again, if GPC depends on libc malloc internally, then I guess it is a libc issue, not a GPC issue.
Obviously, the source code. I haven't been very clear about the obvious fact that changing pointer semantics and/or size into an indirect pointer, or a pointer with upper and lower boundaries, would inevitably imply at least recompilation of libraries, assuming the indirection were implemented transparently to existing source. And bearing in mind that GPC uses libc extensively, this may not be feasible in terms of the necessary wrappers.
Actually it means that the only possibly successful way is probably doing it all in the backend, so it would work the same for all languages. Perhaps you can get such an option into the backend. Then you basically only have to compile two versions of your whole system ... ;-)
Well said: StackGuard does exactly that! I could try to apply the StackGuard patch once GPC has accepted the 4.* backends.
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
The problem is: why are all those fancy heap-protection mechanisms such as libefence not used more widely? Simply because they are not handy, and few people know of them. Having such a mechanism seamlessly distributed with the compiler, or available as a language option, might make people consider using it.
IMHO it would be considerably less work to create and distribute an integrated environment containing the existing tools and making them easily available than writing something entirely new just for this reason (i.e., unless it has other advantages).
It could be much less work, agreed. I did not insist on writing everything from scratch myself ;-) no matter how much I enjoy mucking about with internals. I realize, however, the problem of code maturity, which my new code would inevitably lack.
The basic inspiration came from my study of a month-and-a-half-long virus invasion and the recent network and system intrusions I have faced helplessly over the last six months or so as a system administrator.
If buffer overrun protection isn't elegant, seamless and nearly mandatory, programmers might not use it, I'm afraid.
Can we do something in this direction?
I'm afraid we probably cannot solve The Computer Security Problem within the next two weeks. ;-)
;-)
And, BTW, this is also an area of its own, with its experts. This does not mean we should not care here, but one should really first study existing work and state of the art. If implementation of some techniques require compiler support, we can discuss them here, but this is not really the place to design new techniques (which most likely will have been discussed by the experts already).
I guess you are telling me that I need to do more homework before raising similar issues, so I will try to do that next time. However, it is hard to become an expert in a month, so I was relying on your experience ;-) May I be forgiven.
Thank you for your time, and I will now try to do some research.
Mirsad
Mirsad Todorovac wrote:
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot go to registers), but having an important system brought to its knees by an undetected buffer overrun in an application will hurt me more, both as a system administrator and as a software developer, than a 20% decrease in program speed. IMHO.
Back to reality. You are not obliged to build buffer overruns into your software, are you ?
Rightly said: StackGuard does exactly that! I could try to apply the StackGuard patch once GPC accepts the 4.* backends.
You can do that right now already, see http://www.gnu-pascal.de/crystal/gpc/en/mail12959.html.
Regards,
Adriaan van Os
On Sun, 29 Jan 2006, Adriaan van Os wrote:
Mirsad Todorovac wrote:
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot go to registers), but having an important system brought to its knees by an undetected buffer overrun in an application will hurt me more, both as a system administrator and as a software developer, than a 20% decrease in program speed. IMHO.
Back to reality. You are not obliged to build buffer overruns into your software, are you ?
The reality is that I have done so, looking back at the time 10 years ago when I was writing CGI programs. Effectively, I had opened shell access to an intruder. The fact that it did not happen is pure luck.
(But so did Mr. Brian Kernighan, right? By introducing gets(), which cannot be protected from overruns, he was resting the system's security on a wrong assumption. And I deeply respect him and do not think he was stupid or incompetent. Simply, the users were different at that time, and they were behaving.)
I was resting my code on wrong assumptions about safety, extensively using strcpy(), memcpy() and similar functions that leak like a sieve. I assume that novice programmers would do the same (of course, today CGI is mostly replaced by PHP, and Perl has "preempted" C for the purpose, but then again, leaving open the possibility to write leaky CGI programs is a bad security policy. Disabling them completely is something I would not prefer, since we are a university and that would cripple students' opportunity to learn. Even to learn from errors). Then, having installed a CGI program with a security hole without knowing it, I could rely only on the obscurity of the code to keep the attackers away.
Looking at the Microsoft policy of obscurity, which simply does not work, and being brought to our knees by every new kind of virus, I came to think something ought to be done at the system level. Essentially, using Perl or Java would implicitly eliminate 95% of buffer overruns.
From all this talk I will summarize with the fact that programs nowadays are written by a larger community than 30 or 40 years ago, when languages like Pascal were designed. And even experienced programmers are pushed by deadlines to implement more and more features on the fly, without proper planning and testing.
So, Mr. van Os, I would like Pascal to prevent me from shooting myself in the leg with some very stupid mistakes.
Rightly said: StackGuard does exactly that! I could try to apply the StackGuard patch once GPC accepts the 4.* backends.
You can do that right now already, see http://www.gnu-pascal.de/crystal/gpc/en/mail12959.html.
Thank you, I have read the message and I think that is what I need. Part of the solution.
I will have to do some research before I reply to Mr. Heckenbach, since I was relying on a lecture from 15 years ago (about heap fragmentation). (And I hope my network will not be brought down by a new exploit in the meantime.)
Best regards, Mirsad
Mirsad Todorovac wrote:
Adriaan van Os wrote:
Mirsad Todorovac wrote:
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot go to registers), but having an important system brought to its knees by an undetected buffer overrun in an application will hurt me more, both as a system administrator and as a software developer, than a 20% decrease in program speed. IMHO.
Back to reality. You are not obliged to build buffer overruns into your software, are you ?
The reality is that I have done so, looking back at the time 10 years ago when I was writing CGI programs. Effectively, I had opened shell access to an intruder. The fact that it did not happen is pure luck.
(But so did Mr. Brian Kernighan, right? By introducing gets(), which cannot be protected from overruns, he was resting the system's security on a wrong assumption. And I deeply respect him and do not think he was stupid or incompetent. Simply, the users were different at that time, and they were behaving.)
I was resting my code on wrong assumptions about safety, extensively using strcpy(), memcpy() and similar functions that leak like a sieve. I assume that novice programmers would do the same (of course, today CGI is mostly replaced by PHP, and Perl has "preempted" C for the purpose, but then again, leaving open the possibility to write leaky CGI programs is a bad security policy. Disabling them completely is something I would not prefer, since we are a university and that would cripple students' opportunity to learn. Even to learn from errors). Then, having installed a CGI program with a security hole without knowing it, I could rely only on the obscurity of the code to keep the attackers away.
Looking at the Microsoft policy of obscurity, which simply does not work, and being brought to our knees by every new kind of virus, I came to think something ought to be done at the system level. Essentially, using Perl or Java would implicitly eliminate 95% of buffer overruns.
From all this talk I will summarize with the fact that programs nowadays are written by a larger community than 30 or 40 years ago, when languages like Pascal were designed. And even experienced programmers are pushed by deadlines to implement more and more features on the fly, without proper planning and testing.
So, Mr. van Os, I would like Pascal to prevent me from shooting myself in the leg with some very stupid mistakes.
The result will be that you get careless with your gun, because of its built-in safety precautions. And then one day you will accidentally shoot yourself through the head (which is worse), because someone forgot to implement a tiny detail in the gun's safety module ....
Regards,
Adriaan van Os
Mirsad Todorovac wrote:
(But so did Mr. Brian Kernighan, right? By introducing gets(), which cannot be protected from overruns, he was resting the system's security on a wrong assumption. And I deeply respect him and do not think he was stupid or incompetent. Simply, the users were different at that time, and they were behaving.)
I was resting my code on wrong assumptions about safety, extensively using strcpy(), memcpy() and similar functions that leak like a sieve. I assume that novice programmers would do the same
Which emphasizes that C is not a good language for novice programmers. Even more so in potentially security-critical settings ...
These kinds of things are really not so easy to get wrong by accident in Pascal (even though GPC doesn't provide 100% security automatically).
(of course, today CGI is mostly replaced by PHP, and Perl has "preempted" C for the purpose, but then again, leaving open the possibility to write leaky CGI programs is a bad security policy. Disabling them completely is something I would not prefer, since we are a university and that would cripple students' opportunity to learn. Even to learn from errors). Then, having installed a CGI program with a security hole without knowing it, I could rely only on the obscurity of the code to keep the attackers away.
CGI programs written by any ordinary user can certainly be a problem. What you can do is at least make sure they are executed with the user's id, not the web server's, so a mistake would at first compromise only the account of the user who made it. Then you only have to worry about local-user root attacks ...
More strictly, you could run users' CGI programs in a sandboxed environment, as I said. E.g., on our GPC web server, some contributors have SSH accounts. They run in a UML (User Mode Linux) environment, so even if they had their own CGI programs (which they actually don't (yet?) in this case), and those were attackable, and the attacker then gained root privileges by another, local-user exploit, they'd only be root in the UML, which again runs under an unprivileged user ID on the real system. So they'd need yet another exploit against UML to get really elevated privileges. And since UML is made for exactly this purpose, I'd think it's audited for security by experts, and problems get fixed rather soon.
I think "security in depth" is the buzzword here. Make an attacker need several independent exploits in order to achieve anything.
Looking at Microsoft policy of obscurity that simply does not work and we are brought down to our knees with every new kind of virus I came to think something ought to be done on system level. Essentially, using Perl or Java would eliminate 95% of buffer overruns implicite.
And probably replace them with other kinds of failures.
And, as Adriaan said, the remaining 5% are probably the most dangerous ones.
Frank
Mirsad Todorovac wrote:
Why without? AFAIK, PROT_EXEC is (roughly speaking) the software side of hardware NX.
Could be, on some platforms. AFAIR, there was a platform (Linux? it's so dim in my memory) which did not implement PROT_EXEC protection. AFAIK, Linux kernels generally did not until the 2.6 versions.
According to Wikipedia, since 2.6.8.
But you must have known this, and it is a backend issue, I suppose.
Yes, it's an OS and backend issue.
I am putting it on my TO-DO list. It is a very interesting issue in general, for all operating systems that use paged virtual memory. (OTOH, coming back to releasing unused holes: unused pages will probably be swapped out and not reloaded again, since they are not used - the problem is the stochastic allocation/deallocation of relatively small fragments of memory. For example, if the average record size is from 512 to 1023 bytes, a heap with 2^10-byte granularity will waste a lot, but after allocations/deallocations reach an asymptotic stable state, there will be a normal distribution of used and unused areas on each physical memory page. This means that the allocated physical pages of the heap may eventually double the program's memory needs. I may look up the literature; right now I am speaking from memory of those lectures about Unix processes.)
I'm really no expert here, but what assumptions are these simulations based on? Since deallocated fragments are reused, the maximum allocated space for a given chunk size is the maximum number of active allocations at any time (rounded up to the page size). Why should they be twice the program's memory needs?
Perhaps this model somehow assumes random selection of newly allocated fragments? Otherwise it seems obvious to me that the distribution is not the same for all pages, i.e. the lower pages (tried first for reuse) should generally be more densely populated, since deallocated fragments on them are reused very soon.
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot go to registers),
I'm not sure this is necessary, this seems too drastic.
but having an important system brought to its knees by an undetected buffer overrun in an application will hurt me more, both as a system administrator and as a software developer, than a 20% decrease in program speed. IMHO.
OTOH, I'm not sure the less drastic measures will cost only 20%. As registers are basically "L0 cache", ignoring them for all pointers might suffer an enormous penalty for pointer-intensive code. But again, we're talking about hot air, without any real data.
I understand your concerns as an admin, but really I don't think such drastic measures are the solution in the long run. They are no real replacement for proper engineering techniques.
But if you're ready to go that far, you might consider running the whole thing in a sandbox or virtual machine with very limited privileges. (Of course, this shifts the security problem to the sandbox/VM code, but that's one program instead of N programs.)
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error alongside a buffer overrun is probably an off-by-one error in a loop.
This would serve as a debugging aid (similar to efence), not as attack prevention, as an attack can occur before free().
True. However, several attack scenarios rely on smashing allocator list pointers and overwriting an arbitrary location in memory. A canary could prevent that, if checked before evaluating the pointers that follow it. :-)
So you're back to checking every access, not only free() ...
And, BTW, this is also an area of its own, with its experts. This does not mean we should not care here, but one should really first study existing work and state of the art. If implementation of some techniques require compiler support, we can discuss them here, but this is not really the place to design new techniques (which most likely will have been discussed by the experts already).
I guess you want to tell me I need to do more homework before raising similar issues, so I will try to do it next time. However, it is hard to become expert in a month, so I was relying on your experience ;-)
And also that there are probably not too many real security experts here. I know a bit about security, but developing new concepts (as you seem to plan) might better be done with the experts.
Adriaan van Os wrote:
Mirsad Todorovac wrote:
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot go to registers), but having an important system brought to its knees by an undetected buffer overrun in an application will hurt me more, both as a system administrator and as a software developer, than a 20% decrease in program speed. IMHO.
Back to reality. You are not obliged to build buffer overruns into your software, are you ?
;-)
Of course, programmers do make mistakes.
I don't know which particular overrun this was, but chances are it would have been prevented by range checking (at least many of the common overruns would have been). IOW, as long as many programmers don't even use the rather simple techniques available (either by choosing environments that don't offer them, or by stepping around them intentionally), any more invasive measures will probably have little effect either. This doesn't mean one shouldn't consider them, but for practical purposes, in the short to medium term, they will likely not make any noticeable difference. Which again means: if your main concern is the security of an actual, currently running system, you'll probably have better luck with more conventional security measures.
Frank
Mirsad Todorovac wrote:
In fact, this came to my mind from a few considerations too: I was thinking of a new breed of pointers that would allow memory-map defragmentation. As said somewhere in the theoretical books, after millions of seemingly random allocations/deallocations, memory looks completely fragmented, with evenly distributed holes and used spaces. As you must have noted, this means that the actual system memory used is at least twice the malloc()-allocated memory, since the holes cannot be defragmented or compacted.
This is why I would like to propose "registered pointers" (maybe the idea came subconsciously from Java?) and a memory-allocation library with better heap use and corruption detection.
Registered pointers are close in semantics to the pointers with ranges already introduced on the list, but they should also allow defragmentation of memory on the fly, compaction of free space, and deallocation and return of mapped pages back to the system (unlike current memory allocators, which behave like my country's budget: they only grow and become less efficiently used with time ...).
Is that too difficult to implement?
(I am sorry if this has been discussed already.)
To explain the idea more visibly, as we are all probably tired at 4 PM:
- Every pointer would be registered in a pointer table or list (depending on the implementation). Pointers would be indirect, used through a mechanism that allows transparent moving of the memory fragment pointed to.
- A garbage-collection mechanism would walk all registered pointers and defragment memory, much like a file system: moving used memory towards the beginning and free memory towards the end of the memory mapping, so the latter could later be munmap()-ed and returned to the OS when a certain LOW_WATERMARK or threshold is reached.
- Unused fragments of memory that are not explicitly free()-d could be deallocated automatically, and an optional warning could be generated.
- The safe garbage-collecting memory allocator could, and probably should, be mapped in a separate address range from the "normal" heap mapping that permits direct use and hacking of pointers.
- Better heap use would allow the introduction of a "heap canary" technique that would eliminate another source of exploits.
- All of this could be done transparently, without changing existing code.
- Better heap integrity checking would be introduced into existing programs, detecting memory leaks and reporting them as warnings.
That's the general idea, but I haven't tried an implementation yet.
Compatible programs could simply be linked with the new library and gain the advantage of better memory use, better heap integrity, and possibly longer uptime, without the need to reboot the system because of memory leaks and fragmentation.
I went far from the original question, but this idea has been tempting me for a long time, and the GPC development team seems open enough :-) that I thought of requesting an implementation or putting it on the wish list.
See if <http://developer.apple.com/documentation/Performance/Conceptual/ManagingMemory/Articles/FindingLeaks.html> and <http://developer.apple.com/documentation/Performance/Conceptual/ManagingMemory/Articles/MallocDebug.html> already do part of what you have in mind (assuming you are running GPC on Mac OS X).
Regards,
Adriaan van Os