Re: Quo vadis, GPC?

3 Aug 2010


      Frank Heckenbach wrote:
...
So why do you claim you need a debugger for understanding?
There is a difference between understanding how to feed input into a
tool and understanding what the output coming out of the tool is
doing.
There are a number of reasons why it is generally a good idea for
contributors to be able to figure out what the code is doing, not only
knowing how it is used. One example where this is important in a
compiler is implementing error-handling and recovery so that
meaningful error messages can be generated and also to avoid phantom
errors that aren't really there but only appear as a result of a
previous error that threw the parser off.
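To make the point about phantom errors concrete, here is a minimal sketch of panic-mode recovery in a hand-written RD parser. It is written in Python purely for brevity, and the toy grammar (statements of the form `IDENT := NUMBER ;`) and all names in it are my own assumptions for illustration, not anything taken from GPC. After reporting an error, the parser skips ahead to the next `;` so that one malformed statement yields one message instead of a cascade.

```python
import re

# Toy lexer: ':=' and ';' operators, identifiers, numbers, any other single character.
TOKEN = re.compile(r"\s*(:=|;|[A-Za-z_]\w*|\d+|\S)")

def tokenize(text):
    """Split the input into a flat list of token strings."""
    return TOKEN.findall(text)

class Parser:
    """Hypothetical parser for: program = { IDENT ':=' NUMBER ';' }"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate, what):
        tok = self.peek()
        if tok is None or not predicate(tok):
            raise SyntaxError(f"token {self.pos}: expected {what}, found {tok!r}")
        self.pos += 1
        return tok

    def statement(self):
        self.expect(str.isidentifier, "identifier")
        self.expect(lambda t: t == ":=", "':='")
        self.expect(str.isdigit, "number")
        self.expect(lambda t: t == ";", "';'")

    def synchronize(self):
        # Panic mode: discard tokens up to and including the next ';',
        # which is a safe point to resume parsing statements.
        while self.peek() is not None and self.peek() != ";":
            self.pos += 1
        if self.peek() == ";":
            self.pos += 1

    def program(self):
        while self.peek() is not None:
            try:
                self.statement()
            except SyntaxError as err:
                self.errors.append(str(err))
                self.synchronize()
        return self.errors

errors = Parser(tokenize("a := 1; b := := 2; c := 3;")).program()
print(errors)  # one message, for the second statement only
```

Choosing good synchronisation points requires knowing what the parser is doing internally, which is exactly why being able to follow the parser's code matters.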
...
...
The reality is that you have used this tool for 10 years or longer and
the base from which to possibly recruit one or more new maintainers is
unlikely to have this experience.
Except, as I wrote initially, the grammar is exactly one of those
areas where we wouldn't need to spend much work, because it exists
and works already. Even if the semantic actions have to be
rewritten, and even if the Bison parser code has to be translated to
Pascal at some point, the grammar rules and their logic are not
affected.
You will find that I was pretty much the only one in this entire
discussion who made a plea to not change anything but find new
maintainers to keep GNU Pascal going as a GCC front end, then let the
new maintainers decide if they want to make any changes to the
implementation or not.
In fact considering the FSF's guidelines on the use of the GNU
moniker, it is quite possible that the FSF might withdraw their
authorisation to call the project GNU Pascal if it abandoned GCC in
favour of LLVM.
Nevertheless, despite my plea for keeping GNU Pascal the way it is so
that there continues to be a Pascal front end for GCC, some people
here did want to explore other routes, so I responded to their
questions.
The two things that came up the most were 1) can the compiler be
written in Pascal and 2) can the compiler target LLVM without having
to link directly to any API that when changed may break it.
My comments were specifically targeted at these questions and I was
also specifically taking into account that any such undertaking would
require the recruitment of new contributors, probably from a pool of
people who don't have much if any experience with compiler
construction.
I think you will find that GIVEN THE AFOREMENTIONED CONSTRAINTS, my
recommendations are measured and appropriate.
I do understand, however, that your comments are geared towards a
different scenario that doesn't necessarily involve new recruits, as
you seem to be mostly interested in simply adding a C++ target to the
existing compiler and leaving the GCC back end alone until perhaps some
day a new maintainer shows up who might want to update it. Since you
are already familiar with your own code, if you are yourself doing the
work of adding a C++ back end, there is of course no immediate need for
rewrites.
Yet, my comments were not targeted at that scenario. Instead I was
responding to Kevan's LLVM scenario questions.
...
...
Moreover, in recent years the trend has been to move away from
yacc/bison and move towards RD and PEG. There is an entire generation
of newer tools that build RD parsers, both conventional and memoising,
such as ANTLR for example.
How powerful are they? GPC with its mix of dialects needs even more
than LALR(1), i.e. Bison's GLR parser.
ANTLR does LL(*), which means arbitrary (unbounded) lookahead. I'd be
surprised if GPC's grammar could not be expressed in LL(*).
Note that Clang uses a handwritten RD parser even for the C++
implementation which I believe is based on LL(*) too. The often
recited statement that LL is not powerful enough to implement "real
languages out there" is just a myth.
The benefits of RD parsers are real. People smarter than you and me
will confirm that. Niklaus Wirth is a strong proponent of hand coded
RD parsers. Moessenboeck changed his seminal work (COCO) from LALR to
LL. Tom Pittman has also been on record in favour of RD.
And there are strong proponents amongst today's generation of highly
acclaimed scholars, for example Terence Parr and Chris Lattner. Chris
Lattner has just received an ACM award for his work on Clang and LLVM.
So you can make fun of me for my preference of hand written RD all day
long, but I doubt you have enough clout to make fun of Chris Lattner
who is also an outspoken proponent of hand written RD parsers.
...
...
People who write RD parsers by hand generally calculate FIRST and
FOLLOW sets and prove that their grammar contains no ambiguities. It
seems you have run into somebody who didn't do that but that doesn't
mean that this is how it's done.
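For readers who have not done this before, the following is a rough sketch of the check, in Python for brevity. It computes FIRST and FOLLOW sets by fixed-point iteration and reports LL(1) conflicts, meaning alternatives of the same nonterminal that can start with the same token. A grammar without such conflicts is LL(1) and therefore unambiguous. The grammar encoding (a dict of nonterminals to lists of alternatives) and the two sample grammars are assumptions made for this illustration only.

```python
EPS = ""  # marker for the empty string

def first_of_sequence(symbols, first):
    """FIRST set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal starts with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence can derive the empty string
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until nothing grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, first, start, end_marker="$"):
    follow = {nt: set() for nt in grammar}
    follow[start].add(end_marker)
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def ll1_conflicts(grammar, start):
    """Return (nonterminal, token) pairs where two alternatives collide."""
    first = first_sets(grammar)
    follow = follow_sets(grammar, first, start)
    conflicts = []
    for nt, alternatives in grammar.items():
        seen = set()
        for alt in alternatives:
            alt_first = first_of_sequence(alt, first)
            predict = alt_first - {EPS}
            if EPS in alt_first:
                predict |= follow[nt]
            conflicts.extend((nt, tok) for tok in predict & seen)
            seen |= predict
    return conflicts

# An empty list [] stands for the empty alternative.
EXPR_GRAMMAR = {
    "expr": [["term", "expr_tail"]],
    "expr_tail": [["+", "term", "expr_tail"], []],
    "term": [["num"], ["(", "expr", ")"]],
}
IF_GRAMMAR = {  # the classic dangling else
    "stmt": [["if", "cond", "then", "stmt", "else_part"], ["other"]],
    "else_part": [["else", "stmt"], []],
}

print(ll1_conflicts(EXPR_GRAMMAR, "expr"))  # no conflicts
print(ll1_conflicts(IF_GRAMMAR, "stmt"))    # conflict on 'else'
```

The second grammar shows the kind of problem this check catches before any parser code is written: with `else` in both the FIRST set of one alternative and the FOLLOW set of the empty one, the parser cannot decide which to take.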
Borland apparently didn't prove their grammars (because their
grammar did contain ambiguities).
Well, shame on them.
A year ago we had an Indian or Bangladeshi neighbour who always parked
his bicycle in my spot which got me into trouble with the landlord
because I had to bring my bicycle into the building. It doesn't mean
all Indians/Bangladeshis do this. It was just this one guy and he was
wrong.
...
I still don't quite see the point. With RD, you specify the grammar
in a formal way and write the parser manually, and it's verified
automatically. With Bison you specify the grammar formally, and the
parser is generated and verified automatically.
If we had wanted to use a generator, we'd have either used COCO/R or
ANTLR, both of which generate human readable RD parsers in several
output languages. COCO can generate output in Pascal, Modula-2 and
Oberon and others. ANTLR can generate output in Java, C, C++, Python,
Oberon and others.
However, our compiler is meant to be an entirely self contained
bootstrap kit. We wanted to make it as easy as possible for anyone to
bootstrap the compiler anywhere without having to worry what libraries
or tools might be there and whether or not they are up to date. Our
bootstrap compiler has no dependencies other than a C compiler and
stdio/stddef.
...
The former seems more work to me.
Appearances can be deceiving. The time spent coding the parser is only
a fraction of the work you have to do on things a generator won't do
for you anyway.
In any event, the summary of my recommendations was and still is this:
1) Continue GPC as a GCC front end. Don't change it; try to find a new
generation of maintainers willing to update to newer versions of GCC
as GCC progresses.
2) If you really must start a new Pascal project targeting LLVM, use
an RD parser written in Pascal and generate LLVM pseudo-assembly. Don't
call it GPC then; use a name that avoids confusion and is likely
to give you an advantage in finding new contributors (riding atop the
LLVM buzz).
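
To give an idea of what option 2 amounts to in practice, here is a minimal sketch of a hand-written RD parser that emits textual LLVM IR as plain strings, so nothing links against any LLVM API. It is in Python only to keep it short (the recommendation above is Pascal), it handles nothing but integer expressions with `+`, `-`, `*` and parentheses, and the class and function names are hypothetical.

```python
import re

def tokenize(text):
    # Toy lexer: numbers and single-character operators; anything else is ignored.
    return re.findall(r"\d+|[-+*()]", text)

class IRGen:
    """Recursive descent parser that emits one IR instruction per operator."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.lines = []
        self.counter = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def emit(self, opcode, lhs, rhs):
        # Each result gets a fresh name, as SSA form requires.
        self.counter += 1
        name = f"%t{self.counter}"
        self.lines.append(f"  {name} = {opcode} i32 {lhs}, {rhs}")
        return name

    def expr(self):  # expr = term { ("+" | "-") term }
        value = self.term()
        while self.peek() in ("+", "-"):
            opcode = "add" if self.advance() == "+" else "sub"
            value = self.emit(opcode, value, self.term())
        return value

    def term(self):  # term = factor { "*" factor }
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value = self.emit("mul", value, self.factor())
        return value

    def factor(self):  # factor = NUMBER | "(" expr ")"
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return tok
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def compile_to_llvm(source):
    gen = IRGen(tokenize(source))
    result = gen.expr()
    if gen.peek() is not None:
        raise SyntaxError(f"unexpected token {gen.peek()!r}")
    body = "\n".join(gen.lines + [f"  ret i32 {result}"])
    return f"define i32 @main() {{\nentry:\n{body}\n}}\n"

print(compile_to_llvm("2 + 3 * 4"))
```

The printed text can be saved to a `.ll` file and handed to the LLVM command line tools such as `lli` or `llc`. Because the only interface is the textual format, the compiler itself does not have to be rebuilt when the LLVM libraries change.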
