Hi,
GPC currently uses a stand-alone preprocessor (gpcpp) that is derived from the C preprocessor and shares many of its features.
I plan to integrate it into the compiler which would follow gcc's example (not so important, though), maybe make it a little faster (not very much, I guess, since preprocessing in general doesn't take very long), solve some problems (e.g., the `--needed-option' ugliness, mostly on MIPS) and allow for further improvements.
In the same process, I'd like to change some of the more obscure aspects, thereby hopefully making the code much simpler and even more powerful. This will not affect the BPish features of the preprocessor (`{$ifdef}', `{$ifopt}' and `{$i ...}' (include)), and the Pascal standards don't have a preprocessor, anyway, so I think there are no compatibility concerns (except for backward-compatibility to existing GPC versions). I think most of the features discussed below are known to few people and not used much.
If you want to know about the dirty details of the C preprocessor, please read the info files `cpp' and `cppinternals'. Note that the C preprocessor does a lot of micro-optimizations. I think in Pascal we can get away without them since the preprocessor (in particular macros and complicated includes) is *much* less often used here than in C. (And again, I think preprocessing is so fast compared to compilation that I doubt these optimizations have much effect at all.)
The C preprocessor takes some effort to prevent recursive macro calls, so, e.g., in `#define foo foo(x)', the macro foo will not call itself. This might be useful sometimes (e.g., when overriding a previously defined thing), but at the same time it limits the expressiveness of macros. (In the above example recursion would be pointless since it would be infinite, but using parameters etc., useful cases can be constructed.)
So I suggest the following: Allow recursive macro calls in general, but provide a way to suppress them explicitly, e.g. by prefixing the name with `\' (similar to how shells can prevent alias expansion). This could also work outside of macros:
{$define foo bar}
foo   { yields `bar' }
\foo  { yields `foo' }
\bar  { yields `bar' }
The latter should give a warning (or an error?). This leaves it standard-compliant. Since the standard doesn't have macros, each occurrence of `\' would be of the latter kind and therefore wrong (according to the standard, which doesn't have `\' at all).
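The proposed rule (recursive expansion by default, with `\' suppressing expansion of a single name) can be modelled with a toy expander. This is only a sketch under simplifying assumptions: tokens are plain strings, macros take no parameters, the function name `expand' and the depth limit are invented for illustration, and the suggested warning for `\bar' is not modelled.

```python
def expand(tokens, macros, depth=0, max_depth=50):
    # Expand parameterless macros in a list of token strings.
    # A token starting with a backslash is emitted without the
    # backslash and is never expanded. Macros may mention themselves;
    # max_depth is only a guard against runaway recursion.
    if depth > max_depth:
        raise RecursionError("macro expansion too deep")
    out = []
    for tok in tokens:
        if tok.startswith("\\"):
            out.append(tok[1:])  # escaped name: taken literally
        elif tok in macros:
            out.extend(expand(macros[tok], macros, depth + 1, max_depth))
        else:
            out.append(tok)
    return out


# {$define foo bar}
macros = {"foo": ["bar"]}
result = expand(["foo", "\\foo", "\\bar"], macros)
print(result)

# Overriding a previously defined thing: the body refers to the old
# meaning of foo by escaping it, so no recursion takes place.
override = expand(["foo"], {"foo": ["\\foo", "(", "x", ")"]})
print(override)
```

Without the escape, a body that mentions its own name unconditionally would recurse until the guard triggers, which corresponds to the "pointless" infinite case mentioned above.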
Another obvious improvement would be to allow compiler directives within macros which is not possible in C. E.g., `{$define foo {$define bar baz}}' (with `--nested-comments') or `{$define foo (*$define bar baz*)}' (with `--mixed-comments') would define a macro that defines another macro when used.
So, e.g., the following macro could compute a factorial (of a constant) at compile time:
{$define FACTORIAL(N)
  {$if N < 0}
    {$error Negative argument of FACTORIAL}
  {$elif N <= 1}
    1
  {$else}
    (N * FACTORIAL (N - 1))
  {$endif}
}
Or an automatic counter (similar to enum types -- those should be preferred when possible, of course, but in some situations it's not so easy):
{$define DEF_REC(X, S, N)
  {$if N <= 0}
    {$define X S}
  {$else}
    DEF_REC (X, S + 1, N - 1)
  {$endif}
}
{$define COUNTER 0}
{$define DEFCOUNTED(X)
  {$redefine \COUNTER COUNTER + 1}
  DEF_REC(X, 0, COUNTER)
}
DEFCOUNTED (a)  { => 1 }
DEFCOUNTED (b)  { => 2 }
DEFCOUNTED (c)  { => 3 }
Two other difficult areas in C (especially because they have different syntax in traditional (K&R) and ANSI C) are token concatenation and stringification.
K&R: foo/**/bar "foo"
This is not suitable for Pascal: a comment is equivalent to whitespace there, so the first form cannot concatenate tokens. As for the second form (where, if `foo' is a macro argument, the string would contain the contents of foo rather than the 3 letters `f', `o', `o'), Pascal strings are strictly literal (whether they are delimited by "" as in C or by '' as in Pascal is a side issue here).
ANSI: foo##bar #foo
These forms look quite artificial to me (and the latter one is also close to, though maybe not directly conflicting with, the BP-style char constant syntax).
I think it would be better (also easier to handle) to have some built-in "magic macros" with these effects (which is what in C is typically defined in a header to overcome the K&R vs. ANSI differences). The macros could be given sufficiently long names (e.g., `MACRO_CONCAT' and `MACRO_STRINGIFY') to avoid most conflicts or, if deemed necessary, be activated only within macro bodies (outside of them they're not very interesting, anyway). Since macros are disabled altogether in standard and BP modes, they have no effect there, either.
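What the two magic macros might do can be sketched on token spellings. The names `MACRO_CONCAT' and `MACRO_STRINGIFY' come from the proposal; everything else here is an assumption made for illustration, in particular the single-space joining and the doubling of apostrophes to get a valid Pascal string literal.

```python
def macro_concat(*tokens):
    # Glue token spellings into one token, like ANSI C's a##b.
    return "".join(tokens)


def macro_stringify(tokens):
    # Turn a token sequence into a Pascal string literal.
    # Tokens are joined with single spaces (an assumption) and
    # embedded apostrophes are doubled, as Pascal requires.
    text = " ".join(tokens)
    return "'" + text.replace("'", "''") + "'"


print(macro_concat("foo", "bar"))
print(macro_stringify(["x", "+", "1"]))
```

Because both are ordinary macro names, they need no new lexical syntax and cannot collide with the BP-style `#' char constants.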
Similarly, one could have, e.g., `MACRO_EVAL' within a macro definition to substitute its argument when the macro is defined rather than when it is used:
{$define FOO 1}
{$define BAR FOO}
{$define BAZ MACRO_EVAL(FOO)}
{$redefine FOO 2}
BAR  { => 2 }
BAZ  { => 1 }
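The difference between BAR and BAZ is the difference between expanding at use time and expanding at definition time. A minimal model, assuming parameterless macros and representing `MACRO_EVAL(...)' in a body as a hypothetical `("EVAL", [...])' tuple (the class and method names are made up for this sketch):

```python
class Macros:
    def __init__(self):
        self.table = {}

    def expand(self, tokens):
        # Use-time expansion: look every name up in the current table.
        out = []
        for tok in tokens:
            if tok in self.table:
                out.extend(self.expand(self.table[tok]))
            else:
                out.append(tok)
        return out

    def define(self, name, body):
        # Store the body; ("EVAL", tokens) items are expanded right now,
        # so later redefinitions no longer affect them.
        stored = []
        for item in body:
            if isinstance(item, tuple) and item[0] == "EVAL":
                stored.extend(self.expand(item[1]))
            else:
                stored.append(item)
        self.table[name] = stored


m = Macros()
m.define("FOO", ["1"])
m.define("BAR", ["FOO"])
m.define("BAZ", [("EVAL", ["FOO"])])
m.define("FOO", ["2"])  # plays the role of {$redefine FOO 2}
bar = m.expand(["BAR"])
baz = m.expand(["BAZ"])
print(bar, baz)
```

BAR stores the name FOO and therefore follows the redefinition, while BAZ stored the value FOO had when BAZ was defined.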
This would, e.g., simplify the automatic counter from above:
{$define COUNTER 0}
{$define DEFCOUNTED(X) {$redefine \COUNTER COUNTER + 1} {$define X MACRO_EVAL(COUNTER)} }
Again, I think this behaviour will actually be simpler to implement than the C-compatible behaviour (and much simpler than the current implementation, which is overly complex for the sake of questionable optimizations and consists of rather old code) ...
Then, I don't like the `{$include <...>}' syntax (mostly because `<' and `>' have very unusual meanings there). In C, there might be a point in distinguishing between system and application headers (though I haven't seen many cases where it was really important, even in C), whereas in Pascal you don't have system headers (and typically few, if any, application headers). So one include mechanism should be enough, namely `{$include STRING}' (where STRING can be any regular string constant). (And, for compatibility's sake, the BP form `{$I foo}' which appends `.pas' automatically if necessary.)
Going a step further, I'm also thinking about dropping the C syntax for compiler directives (`#foo'), in favour of the (also already existing) BP-compatible syntax (`{$foo}'). I suppose there will be some objections, but I thought I'd just ask, anyway ...
The special meaning of `\'-newline also seems to be needed only for `#' directives (since `{$}' directives can be multi-line naturally), so this could also be dropped then.
And, finally, perhaps also dropping C operators in conditional expressions (`{$if foo && bar}' -> `{$if foo and bar}') ...
Frank
Frank Heckenbach wrote:
GPC currently uses a stand-alone preprocessor (gpcpp) that is derived from the C preprocessor and shares many of its features.
I plan to integrate it into the compiler which would follow gcc's example (not so important, though), maybe make it a little faster (not very much, I guess, since preprocessing in general doesn't take very long), solve some problems (e.g., the `--needed-option' ugliness, mostly on MIPS) and allow for further improvements.
... snip ...
Since preprocessing has nothing to do with Pascal per se, I would recommend keeping any such firmly separated. I am not familiar with the operation of m4, but between that and what you already evidently have, I suspect any complex games could be performed.
You can always go further by integrating things with the symbol table, etc., but is it worth it? The resultant input language is neither fish nor fowl. My own view is that there are already too many exceptions and extensions in gpc, and once something is added it is hard to remove, because there is always a group that uses everything.
CBFalconer wrote:
Frank Heckenbach wrote:
GPC currently uses a stand-alone preprocessor (gpcpp) that is derived from the C preprocessor and shares many of its features.
I plan to integrate it into the compiler which would follow gcc's example (not so important, though), maybe make it a little faster (not very much, I guess, since preprocessing in general doesn't take very long), solve some problems (e.g., the `--needed-option' ugliness, mostly on MIPS) and allow for further improvements.
... snip ...
Since preprocessing has nothing to do with Pascal per se, I would recommend keeping any such firmly separated. I am not familiar with the operation of m4, but between that and what you already evidently have, I suspect any complex games could be performed.
I've used m4 sometimes. My main problem with it is that its philosophy seems to be, do everything that I don't explicitly tell you not to do; i.e. I often ended up quoting things, often multiple times, to *prevent* some unwanted expansions (and I wasn't always sure how many levels of quoting I needed in some situations until I experimented ;-).
Also, it's purely text based (not token oriented), so the following example:
define(Count, `for i := 1 to $1 do $2 end')
begin Count(10, WriteLn (i))end;
would yield:
begin for i := 1 to 10 do WriteLn (i) endend;
                                         ^ space missing
That's a somewhat silly example, of course, but it shows that m4 doesn't really fit the Pascal (or C) lexing.
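The effect can be reproduced without m4 itself. The following toy model contrasts pasting the expansion into the text with emitting it as a token sequence; the helper name `text_substitute' and the whitespace-based "tokenizer" are simplifications invented for this sketch, not a description of how m4 or gpcpp work internally.

```python
def text_substitute(template, args):
    # Paste arguments into the template as raw text ($1, $2, ...).
    for i, arg in enumerate(args, start=1):
        template = template.replace(f"${i}", arg)
    return template


body = "for i := 1 to $1 do $2 end"
expansion = text_substitute(body, ["10", "WriteLn (i)"])

# Text-based: the expansion is glued directly to whatever follows it.
text_result = "begin " + expansion + "end;"

# Token-based: the expansion is a sequence of tokens, and tokens stay
# separate when written out again.
token_result = " ".join(["begin"] + expansion.split() + ["end", ";"])

print(text_result)
print(token_result)
```

In the text-based result the two `end' keywords fuse into the single identifier `endend'; in the token-based result they remain two tokens.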
You can always go further by integrating things with the symbol table, etc., but is it worth it? The resultant input language is neither fish nor fowl.
I don't think so. It should still be possible to see the preprocessed source (with `-E'), and of course, no-one is forced to use the preprocessor.
My own view is that there are already too many exceptions and extensions in gpc, and once something is added it is hard to remove, because there is always a group that uses everything.
That's already the case with the current preprocessor. Actually, I'm trying to make it simpler, not more complex.
Frank
Frank Heckenbach wrote:
CBFalconer wrote:
Frank Heckenbach wrote:
GPC currently uses a stand-alone preprocessor (gpcpp) that is derived from the C preprocessor and shares many of its features.
I plan to integrate it into the compiler which would follow gcc's example (not so important, though), maybe make it a little faster (not very much, I guess, since preprocessing in general doesn't take very long), solve some problems (e.g., the `--needed-option' ugliness, mostly on MIPS) and allow for further improvements.
... snip ...
You can always go further by integrating things with the symbol table, etc., but is it worth it? The resultant input language is neither fish nor fowl.
I don't think so. It should still be possible to see the preprocessed source (with `-E'), and of course, no-one is forced to use the preprocessor.
My own view is that there are already too many exceptions and extensions in gpc, and once something is added it is hard to remove, because there is always a group that uses everything.
That's already the case with the current preprocessor. Actually, I'm trying to make it simpler, not more complex.
Don't misunderstand, I wasn't arguing for using m4, just pointing out that it exists. Actually, on further idle thought (none of this is especially well thought out) maybe the integrated preprocessor is the way to implement the Delphi constructs. The philosophy of the triple system, GNU, 10206, 7185 seems to be firmly established. By banishing most Delphi and TP extensions to such a preprocessor it might be possible to keep the main thing cleaner.
CBFalconer wrote:
Don't misunderstand, I wasn't arguing for using m4, just pointing out that it exists. Actually, on further idle thought (none of this is especially well thought out) maybe the integrated preprocessor is the way to implement the Delphi constructs. The philosophy of the triple system, GNU, 10206, 7185 seems to be firmly established. By banishing most Delphi and TP extensions to such a preprocessor it might be possible to keep the main thing cleaner.
Might be nice if it worked, but I suppose for most features it won't since they require more syntactic information. E.g., one difference between BP objects and Delphi classes is that the latter are implicitly references. The way to implement them on top of objects (if no other problems arise) would be via pointers, adding `^' automatically. But there's no way to find the places where to add them with typical "preprocessor" information. If you think of complex expressions, involving function parameters/results, record/object fields, the preprocessor would have to know about all their types etc.
I think that's easier to do within the parser. And I don't think it will be a big problem (apart from the effort involved to implement it, of course), and, of course, we can reject it based on dialect options.
For other features, it might be possible with the preprocessor. E.g., it shouldn't be too hard to construct a preprocessor to ignore `()'. But then again, it would ignore them everywhere, so also `MyVariable ()' would work which is probably a bad idea (even worse than `MyFunction ()').
Another point is that preprocessor features don't work across units/modules (in the current separate implementation that would be almost impossible to achieve, and it's also thought cleaner not to do so), while most Delphi features would have to ...
Frank