Adriaan van Os wrote:
Frank Heckenbach wrote:
Peter N Lewis wrote:
It seems there is an off-by-one error with the "warning: unused variable" warning.
There are many off by one [token] problems with messages. I think I know how to fix them now, but it will require a new Bison version. Even worse, it seems to require a Bison bugfix first, and I don't know how long this will take. So it might get even somewhat worse in the next GPC release, but hopefully correct some time later ...
It may be interesting to note that GCC is shifting to a hand-crafted recursive-descent parser.
"Mark Mitchell of CodeSourcery has contributed a new, hand-crafted recursive-descent C++ parser sponsored by the Los Alamos National Laboratory. The new parser is more standard conforming and fixes many bugs (about 100 in our bug database alone) from the old YACC-derived parser."
See http://gcc.gnu.org/ml/gcc/2000-10/msg00573.html.
I remember friends telling me that hand-crafted recursive-descent parsers were "old-fashioned", parser generators were the "real thing" ... Times may change ...
I don't think so. IMHO, various people/companies have had bad experiences with hand-crafted RD parsers. (E.g. Borland, as can be seen from some ambiguities in the BP language. Also, I heard that Stroustrup later said C++ would have had fewer ambiguities if he had used a generated parser during development.)
Indeed, there are some parse conflicts and even a few ambiguities in the Pascal dialects we support that are very hard or even impossible to handle in LALR(1). E.g., the dialect-specific keywords (if we don't want to require them to be enabled/disabled explicitly, cf. the mails about that here some months ago), or BP's use of `=' for typed constant initializers (which we currently solve by lexer tricks), or expressions as lower bounds of subrange types (which is really hard, and doesn't work yet).
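The subrange case can be illustrated with a small sketch. It is written in Python purely for illustration and is not GPC code: the helper `classify_type_denoter`, the string tokens and the operator set are all made up for this example. It assumes the conflict arises because a type denoter starting with `(' may be either an enumerated type or a subrange whose lower bound begins with a parenthesized expression, and that the two can only be told apart by the token after the matching `)', which may be arbitrarily far away.

```python
# Illustration only: why `type T = ( ...` is hard with a fixed lookahead.
# Tokens are plain strings; classify_type_denoter is a hypothetical helper.

# Tokens that, right after the matching ')', indicate that the parenthesis
# was part of an expression (assumed, deliberately incomplete set).
EXPRESSION_CONTINUATION = {"..", "+", "-", "*", "/", "div", "mod"}


def classify_type_denoter(tokens):
    """Classify a type denoter that starts with '('.

    Returns "enumerated" for something like (red, green, blue) and
    "subrange" for something like (a) .. 10. The decision is made only
    after scanning to the matching ')', i.e. with unbounded lookahead.
    """
    if not tokens or tokens[0] != "(":
        raise ValueError("expected '('")
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth == 0:
                following = tokens[i + 1] if i + 1 < len(tokens) else None
                if following in EXPRESSION_CONTINUATION:
                    return "subrange"
                return "enumerated"
    raise ValueError("unbalanced parentheses")


# type Color = (red, green, blue);
print(classify_type_denoter(["(", "red", ",", "green", ",", "blue", ")", ";"]))
# type Small = (a) .. 10;   starts with the same three tokens as `(a);`
print(classify_type_denoter(["(", "a", ")", "..", "10", ";"]))
```

A hand-written parser can scan ahead like this, but nothing then tells its author which other rules the trick silently collides with. An LALR(1) parser has to commit after seeing a single token beyond `a'.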
However, LALR(1) is not the last word in parser generator technology. Recent versions of Bison support so-called GLR parsers (also known as Tomita parsers) which seem capable of resolving all of these conflicts. I'm already experimenting with it, and it looks quite good so far. I may not use it in the next GPC release, but that's due to the location problems (cf. my reply to Peter's mail), which are mostly unrelated to GLR.
I'm not sure exactly what motivated the g++ people to switch to a hand-crafted parser -- well, if 2000-10 in the URL means the date, that was quite some time before Bison supported GLR, ok. And perhaps C++ is too ambiguous even for GLR to work well, I don't know (-; though the GLR example in the Bison manual is in fact about C++).
But I think, especially for GPC, where we're adding new syntax rules (to please various dialects) quite often, it would be foolish to use a hand-crafted parser because we'd have to manually check for any possible conflicts and ambiguities -- and that after each change. To me, that's the main advantage of generated parsers: leaving efficiency etc. aside (which doesn't seem to play a major role anyway), the parser generator detects conflicts and therefore warns us when we try to add syntax that could lead to ambiguities, cf. the examples above.
Frank