Frank Heckenbach wrote:
So why do you claim you need a debugger for understanding?
There is a difference between understanding how to feed input into a tool and understanding what the output coming out of the tool is doing.
There are a number of reasons why it is generally a good idea for contributors to be able to figure out what the code is doing, not just how it is used. One example where this matters in a compiler is implementing error handling and recovery, so that meaningful error messages can be generated and phantom errors are avoided, i.e. errors that aren't really there but only appear because a previous error threw the parser off.
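To illustrate what I mean by phantom errors, here is a minimal sketch of panic-mode recovery in a hand-written RD parser. It is written in Python only for brevity. The toy grammar (assignments terminated by ";"), the token kinds and all names in it are made up for this example and have nothing to do with GPC's actual grammar or code.

```python
import re

# Toy grammar (hypothetical, for illustration only):
#   program   = { statement ";" }
#   statement = IDENT ":=" expr
#   expr      = term { "+" term }
#   term      = IDENT | NUMBER
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(:=|[+;]))")


def tokenize(text):
    """Split source text into (kind, value) pairs and append an EOF token."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"bad character at offset {pos}")
        number, ident, symbol = m.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif ident is not None:
            tokens.append(("IDENT", ident))
        else:
            tokens.append((symbol, symbol))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        if self.peek() != kind:
            raise ParseError(
                f"token {self.pos}: expected {kind}, found {self.peek()}")
        self.pos += 1

    def program(self):
        while self.peek() != "EOF":
            try:
                self.statement()
                self.expect(";")
            except ParseError as err:
                # Report once, then resynchronise instead of carrying on
                # from a confused state and reporting follow-up errors.
                self.errors.append(str(err))
                self.synchronise()
        return self.errors

    def synchronise(self):
        # Panic mode: skip to just past the next ";" so the following
        # statement is parsed from a known-good position.
        while self.peek() not in (";", "EOF"):
            self.pos += 1
        if self.peek() == ";":
            self.pos += 1

    def statement(self):
        self.expect("IDENT")
        self.expect(":=")
        self.expr()

    def expr(self):
        self.term()
        while self.peek() == "+":
            self.pos += 1
            self.term()

    def term(self):
        if self.peek() in ("IDENT", "NUMBER"):
            self.pos += 1
        else:
            raise ParseError(
                f"token {self.pos}: expected operand, found {self.peek()}")
```

Feeding it "a := 1 + ; b := 2; c := + 3; d := 4;" reports one error for each of the two broken statements and nothing for the two correct ones. The point is that you can only write and tune the synchronise step if you understand how the parser gets from one state to the next.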
The reality is that you have used this tool for 10 years or longer and the base from which to possibly recruit one or more new maintainers is unlikely to have this experience.
Except, as I wrote initially, the grammar is exactly one of those areas where we wouldn't need to spend much work, because it exists and works already. Even if the semantic actions have to be rewritten, and even if the Bison parser code has to be translated to Pascal at some point, the grammar rules and their logic are not affected.
You will find that I was pretty much the only one in this entire discussion who made a plea to not change anything but find new maintainers to keep GNU Pascal going as a GCC front end, then let the new maintainers decide if they want to make any changes to the implementation or not.
In fact considering the FSF's guidelines on the use of the GNU moniker, it is quite possible that the FSF might withdraw their authorisation to call the project GNU Pascal if it abandoned GCC in favour of LLVM.
Nevertheless, despite my plea for keeping GNU Pascal the way it is so that there continues to be a Pascal front end for GCC, some people here did want to explore other routes, so I responded to their questions.
The two things that came up the most were 1) can the compiler be written in Pascal and 2) can the compiler target LLVM without having to link directly to any API that when changed may break it.
My comments were specifically targeted at these questions and I was also specifically taking into account that any such undertaking would require the recruitment of new contributors, probably from a pool of people who don't have much if any experience with compiler construction.
I think you will find that GIVEN THE AFOREMENTIONED CONSTRAINTS, my recommendations are measured and appropriate.
I do understand, however, that your comments are geared towards a different scenario that doesn't necessarily involve new recruits, as you seem to be mostly interested in simply adding a C++ target to the existing compiler and leaving the GCC back end alone until perhaps some day a new maintainer shows up who might want to update it. Since you are already familiar with your own code, if you are yourself doing the work of adding a C++ back end, there is of course no imminent need for rewrites.
Yet, my comments were not targeted at that scenario. Instead I was responding to Kevan's LLVM scenario questions.
Moreover, in recent years the trend has been to move away from yacc/bison and towards RD and PEG. There is an entire generation of newer tools that build RD parsers, both conventional and memoising, ANTLR being one example.
How powerful are they? GPC with its mix of dialects needs even more than LALR(1), i.e. Bison's GLR parser.
ANTLR does LL(*), which means infinite (as in arbitrary) lookahead. I'd be surprised if GPC's grammar could not be expressed in LL(*).
Note that Clang uses a handwritten RD parser even for the C++ implementation which I believe is based on LL(*) too. The often recited statement that LL is not powerful enough to implement "real languages out there" is just a myth.
The benefits of RD parsers are real. People smarter than you and me will confirm that. Niklaus Wirth is a strong proponent of hand coded RD parsers. Moessenboeck changed his seminal work (COCO) from LALR to LL. Tom Pittman has also been on record in favour of RD.
And there are strong proponents amongst today's generation of highly acclaimed scholars, for example Terence Parr and Chris Lattner. Chris Lattner has just received an ACM award for his work on Clang and LLVM. So you can make fun of me for my preference for hand written RD all day long, but I doubt you have enough clout to make fun of Chris Lattner, who is also an outspoken proponent of hand written RD parsers.
People who write RD parsers by hand generally calculate FIRST and FOLLOW sets and prove that their grammar contains no ambiguities. It seems you have run into somebody who didn't do that, but that doesn't mean that this is how it's done.
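For anyone who hasn't done this before, the calculation can be sketched in a few dozen lines. The following is the textbook fixed-point computation of FIRST and FOLLOW plus a check for LL(1) prediction conflicts, written in Python for brevity. The grammar representation (a dict mapping each nonterminal to a list of productions, the empty list standing for the empty production) and all function names are my own assumptions for this example. A grammar that passes the check is LL(1) and therefore unambiguous; a grammar that fails it is not necessarily ambiguous, but the conflict tells you exactly where to look.

```python
EPS = "ε"   # marker for "can derive the empty string"
END = "$"   # end-of-input marker


def first_of_sequence(symbols, first, grammar):
    """FIRST of a symbol string; contains EPS if the whole string can vanish."""
    result = set()
    for sym in symbols:
        if sym not in grammar:              # terminal symbol
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                          # iterate until nothing grows
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                new = first_of_sequence(prod, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue
                    rest = first_of_sequence(prod[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:         # tail can vanish: inherit FOLLOW
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def ll1_conflicts(grammar, start):
    """Return (nonterminal, token) pairs predicted by more than one production."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, first, start)
    conflicts = []
    for nt, productions in grammar.items():
        seen = set()
        for prod in productions:
            predict = first_of_sequence(prod, first, grammar)
            if EPS in predict:
                predict = (predict - {EPS}) | follow[nt]
            for tok in predict:
                if tok in seen:
                    conflicts.append((nt, tok))
                seen.add(tok)
    return conflicts


# The classic dangling-else shape, reduced to the bare minimum.
dangling_else = {
    "stmt": [["if", "expr", "then", "stmt", "else_part"], ["other"]],
    "else_part": [["else", "stmt"], []],
    "expr": [["cond"]],
}

# A conflict-free expression grammar for comparison.
expressions = {
    "expr": [["term", "expr_tail"]],
    "expr_tail": [["+", "term", "expr_tail"], []],
    "term": [["id"], ["(", "expr", ")"]],
}
```

Calling ll1_conflicts(dangling_else, "stmt") flags the pair ("else_part", "else"), which is the dangling else, whereas ll1_conflicts(expressions, "expr") returns an empty list.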
Borland apparently didn't prove their grammars (because their grammar did contain ambiguities).
Well, shame on them.
A year ago we had an Indian or Bangladeshi neighbour who always parked his bicycle in my spot which got me into trouble with the landlord because I had to bring my bicycle into the building. It doesn't mean all Indians/Bangladeshis do this. It was just this one guy and he was wrong.
I still don't quite see the point. With RD, you specify the grammar in a formal way and write the parser manually, and it's verified automatically. With Bison you specify the grammar formally, and the parser is generated and verified automatically.
If we had wanted to use a generator, we'd have either used COCO/R or ANTLR, both of which generate human readable RD parsers in several output languages. COCO can generate output in Pascal, Modula-2 and Oberon and others. ANTLR can generate output in Java, C, C++, Python, Oberon and others.
However, our compiler is meant to be an entirely self-contained bootstrap kit. We wanted to make it as easy as possible for anyone to bootstrap the compiler anywhere without having to worry what libraries or tools might be there and whether or not they are up to date. Our bootstrap compiler has no dependencies other than a C compiler and stdio/stddef.
The former seems more work to me.
Appearances can be deceiving. The time spent coding the parser is only a fraction of the work you have to do on things a generator won't do for you anyway.
In any event, the summary of my recommendations was and still is this:
1) Continue GPC as a GCC front end, don't change it, try to find a new generation of maintainers willing to update to newer versions of GCC as GCC progresses.
2) If you really must start a new Pascal project targeting LLVM, use an RD parser written in Pascal and generate LLVM pseudo-assembly, don't call it GPC then, use a name that avoids confusion and is likely to give you an advantage finding new contributors (riding atop the LLVM buzz).
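To make "generate LLVM pseudo-assembly" in 2) a little more concrete: the idea is that the compiler writes LLVM's textual IR to a file as plain text and never links against the LLVM libraries. Here is a minimal sketch, in Python for brevity, that emits IR for a constant integer expression. The AST shape (nested tuples), the class and function names and the %t temporaries are all assumptions of mine for this example. A real compiler would of course be written in Pascal and would have to handle types, control flow and a great deal more.

```python
# Map source operators to LLVM integer instructions.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


class Emitter:
    def __init__(self):
        self.lines = []
        self.counter = 0

    def emit_expr(self, node):
        """Return the IR operand (a constant or a %name) holding node's value."""
        if isinstance(node, int):
            return str(node)
        op, left, right = node
        lhs = self.emit_expr(left)
        rhs = self.emit_expr(right)
        self.counter += 1
        name = f"%t{self.counter}"          # fresh named temporary
        self.lines.append(f"  {name} = {OPCODES[op]} i32 {lhs}, {rhs}")
        return name


def emit_main(ast):
    """Wrap the expression in a main function that returns its value."""
    emitter = Emitter()
    result = emitter.emit_expr(ast)
    body = "\n".join(emitter.lines + [f"  ret i32 {result}"])
    return f"define i32 @main() {{\nentry:\n{body}\n}}\n"


# 2 + 3 * 4 as a nested tuple
ir_text = emit_main(("+", 2, ("*", 3, 4)))
```

For the expression above, ir_text contains a mul instruction, an add instruction and a ret. The text can be saved to a .ll file and handed to the LLVM command line tools, so the only interface between the compiler and LLVM is the textual IR format.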