It seems there is an off-by-one error with the "warning: unused variable" warning. Probably the line number is not saved when the variable's identifier token is read; instead, the line following the entire variable declaration is used. Yep, declaring a variable with an anonymous, multi-line record type supports this theory:
program testpas11;

procedure test;
var
  a: Integer;
  b: record
    x,y: Integer;
  end;
  c: Integer;
begin
  WriteLn( 'Hello' );
end;

begin
end.
% gpc -Wall -c testpas11.pas
testpas11.pas: In procedure `test':
testpas11.pas:10: warning: unused variable `c'
testpas11.pas:9: warning: unused variable `b'
testpas11.pas:6: warning: unused variable `a'
 1  program testpas11;
 2
 3  procedure test;
 4  var
 5    a: Integer;
 6    b: record
 7      x,y: Integer;
 8    end;
 9    c: Integer;
10  begin
11    WriteLn( 'Hello' );
12  end;
13
14  begin
15  end.
It would be nicer if GPC recorded the identifier's line number when the token is first read. I'm not sure how hard that would be, though (it might even be easy). Normally GPC is pretty good with line numbers...
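As a toy sketch of the idea (an illustration only, not GPC's actual internals, which are C code; the real fix also involves Bison token locations, as Frank explains below): stamp each token with the line on which it was scanned, and copy that stamp into the symbol entry, instead of consulting the global line counter after the whole declaration has been parsed.

program linestamp;

{ Toy illustration only -- not GPC's actual internals. }

type
  TokenKind = (IdentTok, OtherTok);
  Token = record
    Kind: TokenKind;
    Line: Integer      { stamped by the scanner, at scan time }
  end;
  Symbol = record
    DeclLine: Integer  { where the identifier itself appeared }
  end;

var
  CurrentLine: Integer;

function ScanToken(Kind: TokenKind): Token;
var
  T: Token;
begin
  T.Kind := Kind;
  T.Line := CurrentLine;   { remember the line now, not later }
  ScanToken := T
end;

procedure DeclareVariable(Ident: Token; var Sym: Symbol);
begin
  { Use the token's stamp; by the time the declaration has been
    parsed completely, CurrentLine already points past its end. }
  Sym.DeclLine := Ident.Line
end;

var
  T: Token;
  S: Symbol;
begin
  CurrentLine := 6;           { the scanner sees `b' on line 6 ... }
  T := ScanToken(IdentTok);
  CurrentLine := 9;           { ... the parser finishes the record on line 9 }
  DeclareVariable(T, S);
  WriteLn('declared on line ', S.DeclLine)   { prints 6, not 9 }
end.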
Enjoy,
Peter.
Peter N Lewis wrote:
It seems there is an off-by-one error with the "warning: unused variable" warning.
There are many off-by-one [token] problems with messages. I think I know how to fix them now, but it will require a new Bison version. Even worse, it seems to require a Bison bugfix first, and I don't know how long that will take. So it might even get somewhat worse in the next GPC release, but hopefully it will be correct some time later ...
Frank
Frank Heckenbach wrote:
Peter N Lewis wrote:
It seems there is an off-by-one error with the "warning: unused variable" warning.
There are many off-by-one [token] problems with messages. I think I know how to fix them now, but it will require a new Bison version. Even worse, it seems to require a Bison bugfix first, and I don't know how long that will take. So it might even get somewhat worse in the next GPC release, but hopefully it will be correct some time later ...
It may be interesting to note that GCC is shifting to a hand-crafted recursive-descent parser.
"Mark Mitchell of CodeSourcery has contributed a new, hand-crafted recursive-descent C++ parser sponsored by the Los Alamos National Laboratory. The new parser is more standard conforming and fixes many bugs (about 100 in our bug database alone) from the old YACC-derived parser."
See http://gcc.gnu.org/ml/gcc/2000-10/msg00573.html.
I remember friends telling me that hand-crafted recursive-descent parsers were "old-fashioned" and parser generators were the "real thing" ... Times may change ...
Regards,
Adriaan van Os
On Wednesday, July 16, 2003, at 02:23 PM, Adriaan van Os wrote:
"Mark Mitchell of CodeSourcery has contributed a new, hand-crafted recursive-descent C++ parser sponsored by the Los Alamos National Laboratory. The new parser is more standard conforming and fixes many bugs (about 100 in our bug database alone) from the old YACC-derived parser."
See http://gcc.gnu.org/ml/gcc/2000-10/msg00573.html.
I remember friends telling me that hand-crafted recursive-descent parsers were "old-fashioned" and parser generators were the "real thing" ... Times may change ...
YACC, is that not a species of Cow?
Adriaan van Os wrote:
Frank Heckenbach wrote:
Peter N Lewis wrote:
It seems there is an off-by-one error with the "warning: unused variable" warning.
There are many off-by-one [token] problems with messages. I think I know how to fix them now, but it will require a new Bison version. Even worse, it seems to require a Bison bugfix first, and I don't know how long that will take. So it might even get somewhat worse in the next GPC release, but hopefully it will be correct some time later ...
It may be interesting to note that GCC is shifting to a hand-crafted recursive-descent parser.
"Mark Mitchell of CodeSourcery has contributed a new, hand-crafted recursive-descent C++ parser sponsored by the Los Alamos National Laboratory. The new parser is more standard conforming and fixes many bugs (about 100 in our bug database alone) from the old YACC-derived parser."
See http://gcc.gnu.org/ml/gcc/2000-10/msg00573.html.
I remember friends telling me that hand-crafted recursive-descent parsers were "old-fashioned" and parser generators were the "real thing" ... Times may change ...
I don't think so. IMHO, various people/companies have had bad experiences with hand-crafted RD parsers. (E.g. Borland, as can be seen from some ambiguities in the BP language. Also, I heard that Stroustrup later said C++ would have had fewer ambiguities if he had used a generated parser during development.)
Indeed, there are some parse conflicts and even a few ambiguities in the Pascal dialects we support that are very hard, or even impossible, to handle in LALR(1). E.g., the dialect-specific keywords (if we don't want to require them to be enabled/disabled explicitly, cf. the mails about that here some months ago), BP's use of `=' for typed constant initializers (which we currently solve by lexer tricks), or expressions as lower bounds of subrange types (which is really hard, and doesn't work yet).
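To make the BP case concrete, a minimal example (an illustration only, not from the GPC sources): in Borland Pascal, `=' introduces both plain constants and the initializers of "typed constants" (which are really initialized variables), so the grammar has to serve two rather different constructs with the same token:

program bpconsts;

{ Minimal illustration of BP's two uses of `=' in a const section. }

const
  Max = 100;            { plain constant: `=' binds a value }
  Count: Integer = 0;   { BP "typed constant": `=' is an initializer,
                          and Count is actually a variable }

begin
  Count := Count + 1;   { legal in BP: typed constants are writable }
  WriteLn(Count, ' of ', Max)
end.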
However, LALR(1) is not the last word in parser generator technology. Recent versions of Bison support so-called GLR parsers (also known as Tomita parsers) which seem capable of resolving all of these conflicts. I'm already experimenting with it, and it looks quite good so far (though I may not use it in the next GPC release, but that's due to the location problems (cf. my reply to Peter's mail) which are mostly unrelated to GLR).
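As a concrete example of such a conflict (an illustration, not taken from the GPC grammar): with expressions allowed as subrange bounds, a parser that has just seen the `(' after `Range =' below cannot know whether it opens an enumerated type or a parenthesized constant expression. The answer lies several tokens ahead, past the closing parenthesis, out of reach of LALR(1)'s single token of lookahead, whereas a GLR parser can simply pursue both readings until one of them fails.

program subranges;

{ Illustration only; per Frank's remark above, expression bounds
  did not actually work in GPC at the time. }

const
  a = 1;
  b = 10;

type
  Color = (Red, Green, Blue);   { `(' opens an enumerated type here ... }
  Range = (a) .. (b);           { ... but a parenthesized expression here;
                                  only the `..' after the `)' settles it }

begin
end.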
I'm not sure exactly what motivated the g++ people to switch to a hand-crafted parser -- well, if 2000-10 means the date, it was quite some time before Bison supported GLR, ok. And perhaps C++ is too ambiguous even for GLR to work well, I don't know (-; though the GLR example in the Bison manual is in fact about C++).
But I think, especially for GPC, where we're adding new syntax rules (to please various dialects) quite often, it would be foolish to use a hand-crafted parser, because we'd have to check manually for any possible conflicts and ambiguities -- and that after each change. (To me, that's the main advantage of generated parsers. Leaving efficiency etc. aside (which doesn't seem to play a major role, anyway), the parser generator will detect conflicts and therefore warn us when we're trying to add a syntax which could lead to ambiguities, cf. the examples above.)
Frank
However, LALR(1) is not the last word in parser generator technology. Recent versions of Bison support so-called GLR parsers (also known as Tomita parsers) which seem capable of resolving all of these conflicts.
(To me, that's the main advantage of generated parsers. Leaving efficiency etc. aside (which doesn't seem to play a major role, anyway),
Are there going to be performance penalties? Currently, GPC is many times slower than CW (probably by a factor of 10 to 50; I haven't measured it), so it certainly would be undesirable to slow it down!
That said, I imagine not much work has gone into finding where the bottlenecks are - has anyone done much in the way of profiling?

Peter.
Peter N Lewis wrote:
However, LALR(1) is not the last word in parser generator technology. Recent versions of Bison support so-called GLR parsers (also known as Tomita parsers) which seem capable of resolving all of these conflicts.
(To me, that's the main advantage of generated parsers. Leaving efficiency etc. aside (which doesn't seem to play a major role, anyway),
Are there going to be performance penalties? Currently, GPC is many times slower than CW (probably by a factor of 10 to 50; I haven't measured it), so it certainly would be undesirable to slow it down!
Did anyone suggest to slow it down?
That said, I imagine not much work has gone into finding where the bottlenecks are - has anyone done much in the way of profiling?
Have you? Otherwise, starting such a discussion is somewhere between pointless and FUD ...
FTR, (a) IME most time is spent in code generation, even without optimizing, (b) AFAIK GLR parsers behave the same as LALR(1) in the cases the latter can handle, so the difference is only in the other cases, between not being able to parse it at all and parsing it with perhaps a little more effort, and I don't see how RD parsers could do it with essentially less effort.
Frank
(To me, that's the main advantage of generated parsers. Leaving efficiency etc. aside (which doesn't seem to play a major role, anyway),
Are there going to be performance penalties? Currently, GPC is many times slower than CW (probably by a factor of 10 to 50; I haven't measured it), so it certainly would be undesirable to slow it down!
Did anyone suggest to slow it down?
"leaving efficiency aside" suggests it may be slow (at least to me).
That said, I imagine not much work has gone into finding where the bottlenecks are - has anyone done much in the way of profiling?
Have you? Otherwise, starting such a discussion is somewhere between pointless and FUD ...
It was just a question - has anyone done much in the way of profiling? If the answer is no, then perhaps one day I'll look at doing it. If the answer is yes, then there is presumably some available data - that's why I asked. If no one has ever profiled it, then it's quite possible that a quick profile could double the speed (at least, that is my experience with unprofiled code; a 2-10 times speedup is not uncommon).
FTR, (a) IME most time is spent in code generation, even without optimizing,
I suspect this is correct because the Mac OS combined API file parses very quickly (in fact several times faster than CodeWarrior!). And that does not generate any code, just the .gpi file (except the init routine of course).
(b) AFAIK GLR parsers behave the same as LALR(1) in the cases the latter can handle, so the difference is only in the other cases, between not being able to parse it at all and parsing it with perhaps a little more effort, and I don't see how RD parsers could do it with essentially less effort.
Fair enough.

Peter.
Peter N Lewis wrote:
(To me, that's the main advantage of generated parsers. Leaving efficiency etc. aside (which doesn't seem to play a major role, anyway),
Are there going to be performance penalties? Currently, GPC is many times slower than CW (probably by a factor of 10 to 50; I haven't measured it), so it certainly would be undesirable to slow it down!
Did anyone suggest to slow it down?
"leaving efficiency aside" suggests it may be slow (at least to me).
For me it means that I do (did) not want to discuss it here (because it probably is irrelevant, and because probably nobody has any real facts -- for details about parsing algorithms and some efficiency considerations, you might want to read http://www.cs.vu.nl/pub/dick/PTAPG/).
My problem is that people often start discussions about efficiency without any real basis, replacing facts with opinions on what "ought to" be faster, and (almost) always without any actual [meaningful] measurements. (The only time I remember anyone doing a systematic benchmark at all was Waldek, some months ago, about packed types -- though it was only for a special case which probably wasn't representative.) Maybe I take this a little personally because I've often had to deal with crappy code written for the sake of efficiency (in GPC and in other projects), which almost always turned out to be not much faster (sometimes even slower) and/or irrelevant for overall efficiency, and which caused a lot of extra work for those who wrote the code and for those (usually me) who removed it and replaced it with saner code.
That said, I imagine not much work has gone into finding where the bottlenecks are - has anyone done much in the way of profiling?
Have you? Otherwise, starting such a discussion is somewhere between pointless and FUD ...
It was just a question - has anyone done much in the way of profiling? If the answer is no, then perhaps one day I'll look at doing it. If the answer is yes, then there is presumably some available data - that's why I asked. If no one has ever profiled it, then it's quite possible that a quick profile could double the speed (at least, that is my experience with unprofiled code; a 2-10 times speedup is not uncommon).
As I said, it's probably the code generation that takes most of the time. This is a backend issue, so you might want to ask on the GCC list. (I suppose some have done profiling there, but I don't know who, where or what ...)
FTR, (a) IME most time is spent in code generation, even without optimizing,
I suspect this is correct because the Mac OS combined API file parses very quickly (in fact several times faster than CodeWarrior!). And that does not generate any code, just the .gpi file (except the init routine of course).
Good to know, anyway. This means that my recent (2002-12) optimizations in the GPI area seem to work as intended. (There were some actual bottlenecks (O(n^2) instead of O(n)) which I could actually measure ...)
Frank
Adriaan van Os wrote:
Frank Heckenbach wrote:
Peter N Lewis wrote:
It seems there is an off-by-one error with the "warning: unused variable" warning.
There are many off-by-one [token] problems with messages. I think I know how to fix them now, but it will require a new Bison version. Even worse, it seems to require a Bison bugfix first, and I don't know how long that will take. So it might even get somewhat worse in the next GPC release, but hopefully it will be correct some time later ...
It may be interesting to note that GCC is shifting to a hand-crafted recursive-descent parser.
"Mark Mitchell of CodeSourcery has contributed a new, hand-crafted recursive-descent C++ parser sponsored by the Los Alamos National Laboratory. The new parser is more standard conforming and fixes many bugs (about 100 in our bug database alone) from the old YACC-derived parser."
See http://gcc.gnu.org/ml/gcc/2000-10/msg00573.html.
I remember friends telling me that hand-crafted recursive-descent parsers were "old-fashioned" and parser generators were the "real thing" ... Times may change ...
The "new thing" is actually recursive-descent parser generators, since they have the debugging benefits of recursive-descent parsers with the maintenance benefits of autogenerated parsers. See for instance ANTLR and JavaCC. But nobody's written one (yet) which can be used for applications coded in C, such as the C++ compiler.
C++ also turns out to be a horrible language to parse; it requires arbitrary (unbounded) lookahead and so can't be implemented straightforwardly in most parser generators at all. The new parser uses the technique of 'tentative parsing': parse ahead speculatively, then decide whether to keep the tentative parse or to try parsing a different way -- and it nests this layers deep. There's no parser-generating tool out there that can do *that* cleanly. :-/
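The technique can be sketched in a few lines (a toy model, not GCC's actual code): checkpoint the input position, try to parse the input one way, and if that fails, roll back and try it another way -- applied recursively, which is what makes it nest layers deep.

program tentativeparse;

{ Toy model of tentative parsing -- not GCC's actual code.  The
  classic C++ ambiguity `a*b;' (pointer declaration of `b', or
  multiplication?) stands in for any construct that cannot be
  decided up front. }

const
  SrcLen = 4;
  Src: array [1 .. SrcLen] of Char = 'a*b;';

var
  Pos: Integer;

function Accept(c: Char): Boolean;
begin
  Accept := False;
  if Pos <= SrcLen then
    if Src[Pos] = c then
    begin
      Pos := Pos + 1;
      Accept := True
    end
end;

{ First interpretation: `a' names a type, making this a pointer
  declaration.  We simulate name lookup deciding that `a' is not
  a type, so this trial parse fails partway through. }
function TryDeclaration: Boolean;
var
  Consumed: Boolean;
begin
  Consumed := Accept('a');   { consume some input ... }
  TryDeclaration := False    { ... then fail: `a' is not a type }
end;

{ Second interpretation: the expression statement `a * b;'. }
function TryExpression: Boolean;
begin
  TryExpression := False;
  if Accept('a') then
    if Accept('*') then
      if Accept('b') then
        if Accept(';') then
          TryExpression := True
end;

var
  Saved: Integer;
begin
  Pos := 1;
  Saved := Pos;                    { checkpoint before the tentative parse }
  if TryDeclaration then
    WriteLn('parsed as a declaration')
  else
  begin
    Pos := Saved;                  { roll back ... }
    if TryExpression then          { ... and reparse the other way }
      WriteLn('parsed as an expression')
  end
end.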
Certainly parser generators are easier to maintain than hand-coded parsers -- for languages which they can parse. C++ happens not to be one of those languages.