Re: Quo vadis, GPC?

2 Aug 2010


Kevan Hashemi hashemi@brandeis.edu wrote:
...
I don't want to take it out of context, that's why I'd like to read the
forum e-mails myself, to see who is pushing for self-inflicted mutilation of
their own code.
Well, the gcc mailing list has a huge amount of traffic and I do not
remember the subject of the thread, nor even when it took place. All
I can tell you is that it was a very large discussion with strong
opinions on both sides. The only thing I could find is this message:
http://gcc.gnu.org/ml/gcc/2008-01/msg00349.html
but it contains only a one-line status note. You'd have to search
further back in time to find the actual thread.
As you can see from this status message, Richard Stallman blocked the
introduction of plugins at the time because of concerns that they would
allow the GPL to be circumvented.
...
So, I've been looking through the LLVM pages. I'm new to this
compiler-creation business, but it looks to me like the following steps are
required to produce a self-compiling GPC on LLVM.
We first have to write a partial-syntax Pascal lexer and parser to create
the abstract syntax tree (AST) that is required by the LLVM libraries. We
can't use Bison to create the lexer and parser because Bison's output is a
C-program. We have to write the lexer and parser from scratch in Pascal.
We compile the partial-syntax lexer and parser with the existing GCC-based
GPC. We link our AST structure to the LLVM libraries. The result is a
program that takes text input and produces LLVM Intermediate Representation
(IR) code. The LLVM system appears to handle optimization and
platform-dependent assembly from there.
Having tested the minimal Pascal compiler, we enhance its lexer and parser
until it supports all the syntax that is used in its own definition. We are
still compiling our Pascal with the GCC version of GPC.
Now we compile the lexer and parser with itself. Our lexer-parser is
entirely written in Pascal, but there are C-like parts where we link to the
LLVM libraries.
From here, we continue to enhance the compiler in steps. Each step requires
defining a new feature without the use of the new feature. Frank says he
does not like that, but I'm fine with it.
I will continue looking into the matter, in particular I'd like to learn
more about the state of LLVM's optimisers, and hear from some existing users
about how well it all works on different platforms. Also, I'd like to know
how settled the library interface is, because it's here that we had problems
with GCC, if I understand correctly.
So, please correct my mistakes. I may have misunderstood the process
entirely.
A few things:
There is no reason why you couldn't use a parser generator as a base
for a compiler that targets LLVM.
As I had mentioned earlier, there are two ways to target LLVM. One is
a C++ API for which C bindings exist as well. The other is to generate
LLVM pseudo-assembly (text files).
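
To give a rough idea of the second route, here is a minimal sketch of a
program that writes LLVM pseudo-assembly as plain text. It is in Python
only to keep it short; a Pascal front end would do the same with
ordinary string output. The function name emit_add_function is made up
for this illustration, and the IR is just a function that adds two
32-bit integers.

```python
def emit_add_function(name: str) -> str:
    """Return textual LLVM IR for a function that adds two i32 arguments.

    The name `emit_add_function` is hypothetical; it only illustrates
    that a front end can produce LLVM input as ordinary text.
    """
    lines = [
        f"define i32 @{name}(i32 %a, i32 %b) {{",
        "entry:",
        "  %sum = add i32 %a, %b",
        "  ret i32 %sum",
        "}",
    ]
    return "\n".join(lines) + "\n"


def write_module(path: str, name: str) -> None:
    """Write the generated IR to a text file (conventionally *.ll)."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(emit_add_function(name))
```

Calling write_module("add.ll", "add") should leave a text file that the
LLVM tools (for example llc) can take from there. The attraction of this
route is that the front end needs no linkage to the LLVM libraries at
all.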
However, if you want your compiler front end to be written in Pascal,
I would recommend against Bison. The most natural way to write a
Pascal compiler would be to write a recursive descent (RD) parser by
hand. RD parsers have the advantage that it is possible to understand
what they do by reading the source code: for each rule in the grammar
there is a corresponding Pascal function or procedure. Table-driven
parsers keep their logic in data tables, so it is very difficult to
figure out what is happening by reading the code; you really need a
debugger to get an understanding.
A project hoping that newcomers to compiler construction will join the
team is probably better off with an RD parser for the reasons
mentioned above. RD parsers are also generally more efficient and
produce better error messages.
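
To illustrate the "one procedure per grammar rule" point, here is a
minimal sketch of an RD parser for a tiny Pascal-like expression grammar
(numbers, +, -, *, div and parentheses). It is written in Python purely
for brevity; the grammar, the class name Parser and the tuple-shaped
tree are assumptions made for this example, not anything taken from GPC
or LLVM.

```python
import re

# Tokens: unsigned integers, the keyword "div", and single-character symbols.
TOKEN_PATTERN = re.compile(r"\d+|div\b|[-+*()]")


def tokenize(text):
    """Split the input into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at column {pos + 1}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; the tree is built from nested tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def peek(self):
        return self.tokens[self.index] if self.index < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.index += 1
        return token

    def expect(self, expected):
        token = self.advance()
        if token != expected:
            raise SyntaxError(f"expected {expected!r} but found {token!r}")

    # expression = term { ("+" | "-") term }
    def expression(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            operator = self.advance()
            node = (operator, node, self.term())
        return node

    # term = factor { ("*" | "div") factor }
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "div"):
            operator = self.advance()
            node = (operator, node, self.factor())
        return node

    # factor = number | "(" expression ")"
    def factor(self):
        token = self.advance()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = self.expression()
            self.expect(")")
            return node
        raise SyntaxError(f"expected a number or '(' but found {token!r}")


def parse(text):
    """Parse a complete expression and reject trailing tokens."""
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, parse("1 + 2 * 3") yields ("+", 1, ("*", 2, 3)). Each
method sits directly under the rule it implements, and each error is
raised at the point where the parser knows what it was expecting, which
is where the readable error messages come from.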
I would recommend a book by Per Brinch Hansen titled "Brinch Hansen on
Pascal Compilers". It handles RD parsing of Pascal and includes source
code and commentary for a Pascal-subset compiler.
Now, if you do want to develop a Pascal compiler targeting LLVM, I
would suggest finding a new name for the product. Not only would it
confuse most people if a GNU Pascal targeted LLVM instead of GCC,
but a different name would likely allow you to take advantage
of LLVM's popularity when recruiting new developers to the project.
Amongst LLVM developers/users, some have on occasion talked about the
possibility of an Algol-family counterpart to Clang. Although thus far
nobody has taken this idea anywhere, the nicknames people have given
it half-seriously and half-jokingly (Alang, with A for Algol or Ada,
or Wlang with W for Wirth and others) would seem to indicate that
there is enough love for such an idea to take advantage of.
If I were to start a Pascal compiler project targeting LLVM, I would
have "Plang" high on my list of desirable names for the project, as it
would likely help find new friends within the LLVM crowd ;-)
GNU Pascal, on the other hand, might put potential contributors off,
as they would probably think it's a GCC front end. You'd always have to
explain "Yes, we're GNU but we don't do GCC, we do LLVM". That sort of
thing is usually unhelpful.
