Kevan Hashemi hashemi@brandeis.edu wrote:
I don't want to take it out of context; that's why I'd like to read the forum e-mails myself, to see who is pushing for self-inflicted mutilation of their own code.
Well, the gcc mailing list has a huge amount of traffic, and I do not remember what the subject of the thread was or even when it took place. All I can tell you is that it was a very large discussion with strong opinions on both sides. The only thing I could find is this one:
http://gcc.gnu.org/ml/gcc/2008-01/msg00349.html
but that is only a one-line paragraph on status. You'd have to search further back in time to find the actual thread.
As you can see from this status message, Richard Stallman blocked the introduction of plugins at the time because of concerns that it would allow the GPL to be circumvented.
So, I've been looking through the LLVM pages. I'm new to this compiler-creation business, but it looks to me like the following steps are required to produce a self-compiling GPC on LLVM.
We first have to write a partial-syntax Pascal lexer and parser to create the abstract syntax tree (AST) that is required by the LLVM libraries. We can't use Bison to create the lexer and parser because Bison's output is a C program. We have to write the lexer and parser from scratch in Pascal.
We compile the partial-syntax lexer and parser with the existing GCC-based GPC, and we link the code that walks our AST against the LLVM libraries. The result is a program that takes Pascal source text as input and produces LLVM Intermediate Representation (IR) code. The LLVM system appears to handle optimization and platform-dependent assembly from there.
Having tested the minimal Pascal compiler, we enhance its lexer and parser until it supports all the syntax that is used in its own definition. We are still compiling our Pascal with the GCC version of GPC.
Now we compile the lexer and parser with itself. Our lexer-parser is entirely written in Pascal, but there are C-like parts where we link to the LLVM libraries.
From here, we continue to enhance the compiler in steps. Each step requires defining a new feature without the use of the new feature. Frank says he does not like that, but I'm fine with it.
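As a rough illustration of the first step, a hand-written lexer does not have to be large. The following is a minimal sketch. It is written in Python only for brevity; the plan above is to write it in Pascal. It handles only integers, identifiers, a hypothetical keyword set, operators and { } comments. The Token type and the tokenize function are illustrative names and are not part of GPC or LLVM.

```python
from dataclasses import dataclass

# Hypothetical keyword set for a tiny Pascal subset (not the full language).
KEYWORDS = {"program", "var", "begin", "end", "if", "then", "else",
            "while", "do", "div", "mod"}
TWO_CHAR_SYMBOLS = {":=", "<=", ">=", "<>"}
ONE_CHAR_SYMBOLS = set("+-*/()=<>;:,.")


@dataclass
class Token:
    kind: str   # "keyword", "ident", "number", "symbol" or "eof"
    text: str
    line: int   # kept so that a parser can report errors by line number


def tokenize(src):
    """Split Pascal source text into Tokens, ending with an "eof" token."""
    tokens = []
    i, line = 0, 1
    while i < len(src):
        c = src[i]
        if c == "\n":
            line += 1
            i += 1
        elif c.isspace():
            i += 1
        elif c == "{":  # Pascal comment { ... }; (* ... *) is not handled here
            end = src.find("}", i)
            if end < 0:
                raise SyntaxError(f"line {line}: unterminated comment")
            line += src.count("\n", i, end)
            i = end + 1
        elif c.isalpha() or c == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            word = src[i:j]
            if word.lower() in KEYWORDS:  # Pascal keywords are case-insensitive
                tokens.append(Token("keyword", word.lower(), line))
            else:
                tokens.append(Token("ident", word, line))
            i = j
        elif c.isdigit():  # unsigned integers only; no reals or strings
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(Token("number", src[i:j], line))
            i = j
        elif src[i:i + 2] in TWO_CHAR_SYMBOLS:
            tokens.append(Token("symbol", src[i:i + 2], line))
            i += 2
        elif c in ONE_CHAR_SYMBOLS:
            tokens.append(Token("symbol", c, line))
            i += 1
        else:
            raise SyntaxError(f"line {line}: unexpected character {c!r}")
    tokens.append(Token("eof", "", line))
    return tokens
```

For example, tokenize("x := 3 + y; { c }\nBEGIN end") yields the texts x, :=, 3, +, y, ;, begin, end, followed by the eof token. The keyword tokens are recorded on line 2.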
I will continue looking into the matter. In particular, I'd like to learn more about the state of LLVM's optimisers and hear from some existing users about how well it all works on different platforms. I'd also like to know how settled the library interface is because, if I understand correctly, that is where we had problems with GCC.
So, please correct my mistakes. I may have misunderstood the process entirely.
A few things...
There is no reason why you couldn't use a parser generator as a base for a compiler that targets LLVM.
As I mentioned earlier, there are two ways to target LLVM. One is a C++ API, for which C bindings exist as well. The other is to generate LLVM's textual pseudo-assembly (.ll text files).
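To illustrate the second route, here is a minimal sketch that walks a tiny expression AST and prints an LLVM function as text. It is in Python for brevity. The AST classes, the names IREmitter and emit_function, and the decision to treat every Pascal variable as an i32 function parameter are all hypothetical simplifications. A real front end would also need stack slots for variables, control flow, types and so on.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str   # simplification: every variable is an i32 function parameter


@dataclass
class BinOp:
    op: str     # "+", "-", "*" or "div"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

# Pascal "div" truncates toward zero, as LLVM's signed division does.
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul", "div": "sdiv"}


class IREmitter:
    def __init__(self):
        self.lines = []
        self.counter = 0

    def fresh(self):
        """Return a new named SSA value: %t1, %t2, ..."""
        self.counter += 1
        return f"%t{self.counter}"

    def expr(self, e):
        """Emit instructions for e and return the operand that holds its value."""
        if isinstance(e, Num):
            return str(e.value)
        if isinstance(e, Var):
            return f"%{e.name}"
        lhs = self.expr(e.left)
        rhs = self.expr(e.right)
        result = self.fresh()
        self.lines.append(f"  {result} = {LLVM_OPS[e.op]} i32 {lhs}, {rhs}")
        return result


def emit_function(name, params, body):
    """Return the text of an LLVM function that computes body and returns it."""
    em = IREmitter()
    result = em.expr(body)
    args = ", ".join(f"i32 %{p}" for p in params)
    return "\n".join([f"define i32 @{name}({args}) {{", "entry:",
                      *em.lines, f"  ret i32 {result}", "}"]) + "\n"
```

For (a + b) * 2, emit_function("f", ["a", "b"], ...) produces an add followed by a mul in a single entry block. The output could be saved as f.ll and passed to the LLVM tools, for example llc. One attraction of this route is that the front end needs no LLVM bindings at all, only the ability to write text.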
However, if you want your compiler front end to be written in Pascal, I would recommend against Bison. The most natural way to write a Pascal compiler is to write a recursive-descent (RD) parser by hand. RD parsers have the advantage that you can understand what they do by reading the source code: for each rule in the grammar there is a corresponding Pascal function or procedure. Table-driven parsers keep their logic in data tables, so it is very difficult to figure out what is happening by reading the code; you really need a debugger to get an understanding.
A project that wants newcomers to compiler construction to join the team is probably better off with an RD parser for the reasons mentioned above. RD parsers are also generally more efficient and produce better error messages.
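As a sketch of what "one function per grammar rule" means in practice, here is a recursive-descent parser for a small expression grammar. It is in Python rather than Pascal purely for brevity. The grammar, the tuple-based tree and the names parse, Parser, expression, term and factor are hypothetical. A real Pascal parser would need many more rules.

```python
import re


def tokenize(src):
    """Tiny tokenizer: integers, identifiers (lower-cased) and one-char symbols."""
    tokens = []
    for number, word, symbol in re.findall(r"(\d+)|([A-Za-z_]\w*)|(\S)", src):
        if number:
            tokens.append(("number", int(number)))
        elif word:
            tokens.append(("ident", word.lower()))  # Pascal is case-insensitive
        else:
            tokens.append(("symbol", symbol))
    tokens.append(("eof", None))
    return tokens


def describe(tok):
    return "end of input" if tok[0] == "eof" else repr(tok[1])


class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok[1]!r} but found {describe(self.peek())}")
        return self.advance()

    # expression = term { ("+" | "-") term }
    def expression(self):
        node = self.term()
        while self.peek() in (("symbol", "+"), ("symbol", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    # term = factor { ("*" | "div") factor }
    def term(self):
        node = self.factor()
        while self.peek() in (("symbol", "*"), ("ident", "div")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    # factor = number | identifier | "(" expression ")"
    def factor(self):
        kind, value = self.peek()
        if kind == "number" or (kind == "ident" and value != "div"):
            self.advance()
            return value
        if (kind, value) == ("symbol", "("):
            self.advance()
            node = self.expression()
            self.expect(("symbol", ")"))
            return node
        raise SyntaxError(
            f"expected a number, identifier or '(' but found {describe(self.peek())}")


def parse(src):
    parser = Parser(src)
    tree = parser.expression()
    if parser.peek()[0] != "eof":
        raise SyntaxError(f"unexpected {describe(parser.peek())} after expression")
    return tree
```

Here parse("(a + 2) * B div 3") returns ("div", ("*", ("+", "a", 2), "b"), 3). Parsing "(a + 2)" with the closing parenthesis missing reports "expected ')' but found end of input". Because each rule is a single method, the error raised at any point can say exactly what that rule expected. This is one reason hand-written RD parsers tend to give clearer messages.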
I would recommend a book by Per Brinch Hansen titled "Brinch Hansen on Pascal Compilers". It covers RD parsing of Pascal and includes the source code, with commentary, of a compiler for a Pascal subset.
Now, if you do want to develop a Pascal compiler targeting LLVM, I would suggest finding a new name for the product. Not only would most people find it confusing if a GNU Pascal targeted LLVM instead of GCC, but a different name would likely let you take advantage of LLVM's popularity when recruiting new developers to the project.
Amongst LLVM developers/users, some have on occasion talked about the possibility of an Algol-family counterpart to Clang. Although thus far nobody has taken this idea anywhere, the nicknames people have given it half-seriously and half-jokingly (Alang, with A for Algol or Ada, or Wlang with W for Wirth and others) would seem to indicate that there is enough love for such an idea to take advantage of.
If I were to start a Pascal compiler project targeting LLVM, I would have "Plang" high on my list of desirable names for the project, as it would likely help find new friends within the LLVM crowd ;-)
GNU Pascal, on the other hand, might drive potential contributors away, as they would probably think it's a GCC front end. You'd always have to explain "Yes, we're GNU, but we don't do GCC, we do LLVM". That sort of thing is usually unhelpful.