Hi, all,
(Well, my former message had a typo in its title, as you must have noticed - which shows what quirks can occur when one is eager to speed things up ;-)
There's more room for speed/efficiency improvements.
The second issue (I thank my friend Dave, who pointed out a similar issue years ago) is the reading of input files.
When done with the open(2)/read(2) interface, it is correct, but mapping the file into memory with mmap(2) is only slightly different and much more efficient, because the kernel does the work for you.
I can't give you a direct estimate of how much a switch to the mmap(2) interface would speed up GPC, but I remember an application (oops, I forgot to add it to my CV) where we had to read the complete registry of Croatian companies in order to do a search. In that simple case with one big file, mmap(2)-ing it instead of open(2)/read(2)-ing it showed a _50%_ drop in 'system time'. 'User time' also dropped, because the interface routines didn't have to copy records into memory allocated with malloc - they were already mmap(2)-ed in, and I only needed to re-link pointers to them.
Of course, several routines had to be modified to be read-only, to avoid copy-on-write overhead on the input file, and to remove dependencies on a trailing '\0' byte. At the time, that seemed an acceptable price.
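For illustration, here is a minimal sketch of the technique I mean (plain POSIX C, nothing taken from GPC; the routine and its job are made up): map the whole input file read-only and scan it in place, instead of read(2)-ing it into a malloc'd buffer.

  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* Count newlines by scanning the mapping in place; no malloc'd copy
     of the file data is ever made. */
  long count_lines_mmap (const char *path)
  {
    struct stat st;
    long n = 0;
    int fd = open (path, O_RDONLY);
    if (fd < 0)
      return -1;
    if (fstat (fd, &st) < 0)
      { close (fd); return -1; }
    /* PROT_READ + MAP_PRIVATE: strictly read-only, so no copy-on-write
       pages are ever touched; note there is no trailing '\0' to rely on. */
    char *buf = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED)
      {
        close (fd);
        return -1;                /* here one would fall back to read(2) */
      }
    for (off_t i = 0; i < st.st_size; i++)
      if (buf[i] == '\n')
        n++;
    munmap (buf, st.st_size);
    close (fd);
    return n;
  }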
I can't estimate the amount of work required to implement an mmap(2) interface for reading files (on platforms where it is available) seamlessly. But judging from 'Internals', since the parser calls the lexer, mostly only the lexer would have to be modified - and it would need two versions (one for platforms with mmap and one for others, selected either with a -DUSE_MMAP directive when compiling GPC, or even at runtime).
The change would, however, significantly decrease copying overhead when dealing with large files. It could also be applied to generating GPI files, but I'm not yet certain about that area.
Please give your valuable opinions - thanks, Mirsad
-- This message has been made up using recycled ideas and language constructs. No plant or animal has been injured in the process of making this message.
Mirsad Todorovac wrote:
I was puzzled by the 'Highlights' section of the GPC manual that says the GNU Pascal compiler is substantially slower. I know of course that it uses the proven GNU backend, so the GPC team has little influence on making *that* part run faster.
Exactly.
I know that reliability is more important than speed, but a question occurred to me. While I was still at college, we had a course called "Language Processors". It covered subjects like LL grammars, attribute grammars etc., which are the mathematical background for compilers.
In the labs, we had a choice whether to use a hand-coded parser or a 'finite state automaton' (FSA) driven parser. I chose the latter, which caused me a lot of complications; but the impression of all the students was that the FSA parser was substantially faster.
Why is this? An FSA parser is a very small piece of code driven by a table that represents its wisdom and logic, and which serves to accept or reject the syntax.
I realized GPC is using a 'bison' parser as its front end - and 'bison' (the yacc successor) AFAIK uses code-driven parsing generated from an attributed LALR(1) grammar, doesn't it?
I don't think so. But to make sure we're not in confusion about terms:
As far as I've learned it, two basic types of parsers are top-down and bottom-up. The former starts (at runtime) with the highest building blocks and goes down to smaller blocks. This is typically (in fact, it's the only way I'm aware of right now) implemented using recursive routines which do some look-ahead as needed to decide which routines to call for their sub-elements.
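To illustrate (a toy example, nothing to do with GPC): in a recursive-descent parser each nonterminal becomes one routine, and a single token of look-ahead decides which routine to call next. For the tiny grammar expr -> term { '+' term }, term -> digit | '(' expr ')':

  #include <stdio.h>
  #include <ctype.h>

  static const char *p;                 /* current look-ahead position */

  static int expr (void);

  static int term (void)                /* term -> digit | '(' expr ')' */
  {
    if (isdigit ((unsigned char) *p))
      return *p++ - '0';
    if (*p == '(')
      {
        int v;
        p++;
        v = expr ();
        if (*p == ')')
          p++;
        return v;
      }
    return -1;                          /* syntax error (ignored here) */
  }

  static int expr (void)                /* expr -> term { '+' term } */
  {
    int v = term ();
    while (*p == '+')
      {
        p++;
        v += term ();
      }
    return v;
  }

  int main (void)
  {
    p = "(1+2)+3";
    printf ("%d\n", expr ());           /* prints 6 */
    return 0;
  }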
Bottom-up parsers read token by token and build up higher elements when possible. In cases of doubt whether to read ("shift") another token or to build ("reduce") to a nonterminal, they use look-ahead (limited to one token in LALR(1)) and possibly conflict resolution rules (ideally there should be few or none necessary in a good grammar).
Bison is an example of the latter. It creates code and tables for a stack machine, i.e., an FSA plus a dynamic token stack (an FSA without a stack can parse regular expressions; with a stack it can handle context-free grammars). The large amount of C code you see in GPC's parse.y is not actually part of the parser; it's the "semantic actions". (In fact I think that most of it really belongs in subroutines in other files, but that's another issue.) You can see the generated stack machine with its tables in parse.c.
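Schematically, the generated machine works roughly like this toy recognizer (hand-built tables for the tiny grammar S -> '(' S ')' | 'x'; this has nothing to do with GPC's real tables in parse.c):

  #include <stdio.h>

  enum { T_LPAR, T_RPAR, T_X, T_EOF };          /* terminals */

  /* ACTION[state][terminal]: kind 's' = shift to state arg,
     'r' = reduce by rule arg, 'a' = accept, 0 = syntax error. */
  typedef struct { char kind; int arg; } act_t;

  static const act_t action[6][4] = {
    /* state 0 */ { {'s',2}, {0,0},   {'s',3}, {0,0}   },
    /* state 1 */ { {0,0},   {0,0},   {0,0},   {'a',0} },
    /* state 2 */ { {'s',2}, {0,0},   {'s',3}, {0,0}   },
    /* state 3 */ { {0,0},   {'r',2}, {0,0},   {'r',2} },
    /* state 4 */ { {0,0},   {'s',5}, {0,0},   {0,0}   },
    /* state 5 */ { {0,0},   {'r',1}, {0,0},   {'r',1} },
  };
  static const int go_to[6] = { 1, -1, 4, -1, -1, -1 };  /* GOTO[state] for S */
  static const int rhs_len[3] = { 0, 3, 1 };  /* rule 1: S->(S), rule 2: S->x */

  static int classify (char c)
  {
    return c == '(' ? T_LPAR : c == ')' ? T_RPAR : c == 'x' ? T_X : T_EOF;
  }

  int accepts (const char *s)
  {
    int stack[256], top = 0;
    stack[0] = 0;                               /* start state */
    for (;;)
      {
        act_t a = action[stack[top]][classify (*s)];
        if (a.kind == 's')                      /* shift: push state, consume */
          { stack[++top] = a.arg; s++; }
        else if (a.kind == 'r')                 /* reduce: pop rhs, push GOTO */
          { top -= rhs_len[a.arg]; stack[top + 1] = go_to[stack[top]]; top++; }
        else
          return a.kind == 'a';                 /* accept or error */
      }
  }

  int main (void)
  {
    printf ("%d %d\n", accepts ("((x))"), accepts ("(x"));   /* prints "1 0" */
    return 0;
  }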
FWIW, I strongly prefer bottom-up myself, mostly because of bad experiences with top-down parsers. In particular, since they're usually hand-written, and humans tend to make mistakes and overlook things, they often contain unresolved ambiguities (while bison finds ambiguities automatically). E.g., Borland Pascal apparently uses a hand-written top-down parser (their code is closed, so we can't know for sure, but there are some indications), and they added some syntactic "extensions" to the language which introduced some strong ambiguities, apparently without noting the problems themselves. I've also heard that Stroustrup later regretted having used a TD parser when developing C++, otherwise the language would have contained fewer ambiguities. And bison has saved us in GPC several times ...
What is your estimate: is there a chance of a leap in speed if bison were switched to generate an FSA syntax checker? It would then generate the part of the compiler that handles the same grammar, but by a different method, a finite state automaton (FSA). FSA parsers are known to be more compact, but harder to design.
Even if it were not already one (provided we mean the same thing), I wouldn't think so, simply because the parser (and lexer, preprocessor, compiler driver and assembler FTM) don't take much of the total time at all. To get a rough impression, you might want to try `--syntax-only' (but this actually does some more, in particular the "semantic actions", so the real fraction of preprocessing+lexing+parsing time is even smaller).
Most of the time is consumed by the code generator (especially when optimizing, but I think also when not).
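To get actual numbers, something like the following should give a rough impression (foo.pas standing for any larger test unit of yours):

  time gpc --syntax-only foo.pas    # preprocess + lex + parse (+ semantic actions)
  time gpc -c foo.pas               # full compilation, for comparison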
Besides that, the grammar can be separated from the generated code, so (in some ideal case) there would be no need to recompile the driver part of the compiler when changing or debugging the grammar - only the compiler table.
You mean the recompilation of parts of GPC, not of Pascal programs, don't you? Well, I'm not 100% sure, but I think the driver part in parse.c is relatively small and quickly compiled. What takes more time is the large `case' sequences generated by bison (I think GCC and most other compilers are not particularly well adapted to such kind of code), and these do change when we change the grammar. But as I said above, I'd prefer to move most of the code, and this should make compilation of parse.c a little faster. (But quite frankly, the compilation time is not that long on modern machines, so this argument is not very relevant to me. I'd want to do it mostly for other reasons, such as getting a cleaner parser that can possibly even be shared by other projects that want to parse Pascal code, e.g., syntax sensitive editors.)
My estimate is that this change could help reduce compile time by 50% to 70%;
You mean the parse time alone, or really the total compile time? If the latter, I'd like to know what you base your estimate on.
and decrease compiler binary size (parser part) by 30-60%,
As I've said on other occasions, binary size (on the order of a few MB for such a program) is a complete non-issue to me. Sorry. ;-)
switching syntax (by compiler directive) would then be a switch of table,
I don't think you can easily switch tables in the middle of parsing. The state you're in might not even have a direct counterpart in the other table.
But I also think that's not a major issue. Syntactic differences between dialects are the (non-)availability of some keywords (which is handled on the lexer level, rather easily) and some constructs which are valid only in some dialects. Those are parsed as normal, and then a warning or an error is given, depending on the current dialect. This additional dialect check in the actions doesn't take any significant amount of time.
This change wouldn't influence the RTL generator and back-end, nor the lexer front-end (but there's room for finite-state automaton strategy improvements there as well).
GPC's current lexer is hand-made and ugly (IMHO). I'll replace it by a flex generated (FSA) lexer sometime (I've written much of it already, but some things are left to do, and since the current lexer works, it's not a top priority). But, as I said, it's not for reasons of speed because the lexer is rather insignificant, anyway.
Now, these are only reflections; I haven't dug that deep into 'bison' or GPC internals - but please tell me whether this line of research makes sense or not.
I fear not. But there are other places to improve ...
There's more room for speed/efficiency improvements.
The second issue (I thank my friend Dave, who pointed out a similar issue years ago) is the reading of input files.
When done with the open(2)/read(2) interface, it is correct, but mapping the file into memory with mmap(2) is only slightly different and much more efficient, because the kernel does the work for you.
I can't give you a direct estimate of how much a switch to the mmap(2) interface would speed up GPC, but I remember an application (oops, I forgot to add it to my CV) where we had to read the complete registry of Croatian companies in order to do a search. In that simple case with one big file, mmap(2)-ing it instead of open(2)/read(2)-ing it showed a _50%_ drop in 'system time'. 'User time' also dropped, because the interface routines didn't have to copy records into memory allocated with malloc - they were already mmap(2)-ed in, and I only needed to re-link pointers to them.
In general I agree that mmap can be faster. But I'd expect a 50% drop only in very special circumstances, such as when you'd otherwise read very small blocks (or even single bytes), or seek a lot. If no seeking is done, and you're reading blocks of several KB (which most buffered file libraries do by default), I usually only see a very small difference.
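For comparison, the baseline I mean is simply reading in large blocks, which buffered file libraries do behind your back anyway - a sketch (again not GPC code, the routine is made up) matching the mmap version above:

  #include <fcntl.h>
  #include <unistd.h>

  /* Same job as the mmap sketch earlier, but with plain read(2) in
     64 KB blocks - roughly what a buffered file library does internally. */
  long count_lines_read (const char *path)
  {
    char buf[65536];
    long n = 0;
    ssize_t got;
    int fd = open (path, O_RDONLY);
    if (fd < 0)
      return -1;
    while ((got = read (fd, buf, sizeof buf)) > 0)
      for (ssize_t i = 0; i < got; i++)
        if (buf[i] == '\n')
          n++;
    close (fd);
    return n;
  }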
I can't estimate the amount of work required to implement an mmap(2) interface for reading files (on platforms where it is available) seamlessly. But judging from 'Internals', since the parser calls the lexer, mostly only the lexer would have to be modified - and it would need two versions (one for platforms with mmap and one for others, selected either with a -DUSE_MMAP directive when compiling GPC, or even at runtime).
(If anything, it must be at runtime, since mmap cannot be assumed to succeed. See chapter `Writing C', section `Mmap' in the GNU C coding standards (standards.{texi,info}).)
The change would, however, significantly decrease copying overhead when dealing with large files. It could also be applied to generating GPI files, but I'm not yet certain about that area.
Indeed, there are several areas:
- GPI files. This would have been a good place to use mmap because they're mostly read in very small parts (a few bytes), with a lot of seeking in between. "Unfortunately", we realized this problem some years ago and solved it by adding a memory buffer which is read in from the file in (a) large block(s); a rough sketch of that pattern follows after this list. Now we (or you ;-) could add mmap there (module.c: mopen_read()), but I think it would make the code more complicated (since you'd have to leave the current code there in case mmap fails) whereas the benefit might not even be measurable (you basically only save some copying around which is quite fast compared to the other things that are done while reading GPI files).
When writing GPI files, there's (almost) no seeking involved, so the normal buffering of C files does its job there.
- Lexer. Well, it doesn't seek(*), so again the buffering will do its job. But maybe when I've switched to the flex lexer, I'll try mmap just for curiosity (not much point trying it with the current code which is to be replaced).
(*) Actually, some years ago it did seek, and I thought it was a major performance issue. We eliminated the seek, and the speed gain was negligible ...
- Runtime. There is MemoryMap in the GPC unit, and I'm using it in some of my Pascal code where it gives real benefits in fact. (Just to say that I'm not opposed to mmap in general. ;-)
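To make the GPI buffering pattern mentioned above more concrete, here is a rough sketch (the names and layout are made up; this is not the actual code behind mopen_read()): slurp the whole file into one block, then serve the many small reads and seeks from memory.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  typedef struct
  {
    unsigned char *buf;
    size_t size, pos;
  } membuf;

  /* Read the whole file in one go; all later access is in-memory. */
  static int membuf_open (membuf *m, const char *path)
  {
    FILE *f = fopen (path, "rb");
    if (!f)
      return 0;
    fseek (f, 0, SEEK_END);
    m->size = ftell (f);
    rewind (f);
    m->buf = malloc (m->size ? m->size : 1);
    if (!m->buf || fread (m->buf, 1, m->size, f) != m->size)
      { fclose (f); free (m->buf); return 0; }
    fclose (f);
    m->pos = 0;
    return 1;
  }

  /* The frequent few-byte reads and seeks now cost only a memcpy
     or an assignment, instead of a system call each. */
  static size_t membuf_read (membuf *m, void *dest, size_t n)
  {
    if (n > m->size - m->pos)
      n = m->size - m->pos;
    memcpy (dest, m->buf + m->pos, n);
    m->pos += n;
    return n;
  }

  static void membuf_seek (membuf *m, size_t pos)
  {
    m->pos = pos <= m->size ? pos : m->size;
  }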
About other speed issues: Some of the performance problems with (large) GPI files are due to bad code which has O(n^2) behaviour where O(n) or at least O(n log n) would be possible. I've eliminated some of them recently, but as long as one remains, it's still O(n^2), of course. (I had a short conversation about this with Peter recently; if you like I can send you a copy - it's in German so I won't post it here ;-).
That's for GPI files. There might be other O(n^2) cases elsewhere, though I don't expect too many of them (at least in the Pascal front-end; I haven't looked much through the back-end code, but I hope they don't do such things there ;-).
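Just to illustrate the kind of accidental O(n^2) I mean (a generic example, not GPC code): checking n items with a nested linear scan, versus sorting once in O(n log n) and then only comparing neighbours.

  #include <stdlib.h>

  /* Nested linear scan: n*(n-1)/2 comparisons, i.e. O(n^2). */
  static int has_dup_quadratic (const int *a, size_t n)
  {
    for (size_t i = 0; i < n; i++)
      for (size_t j = i + 1; j < n; j++)
        if (a[i] == a[j])
          return 1;
    return 0;
  }

  static int cmp_int (const void *x, const void *y)
  {
    int a = *(const int *) x, b = *(const int *) y;
    return (a > b) - (a < b);
  }

  /* Sort once (O(n log n)), then one linear pass over neighbours. */
  static int has_dup_sorted (int *a, size_t n)
  {
    qsort (a, n, sizeof *a, cmp_int);
    for (size_t i = 1; i < n; i++)
      if (a[i] == a[i - 1])
        return 1;
    return 0;
  }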
Frank