Hi,
GPC currently uses a stand-alone preprocessor (gpcpp) that is derived from the C preprocessor and shares many of its features.
I plan to integrate it into the compiler which would follow gcc's example (not so important, though), maybe make it a little faster (not very much, I guess, since preprocessing in general doesn't take very long), solve some problems (e.g., the `--needed-option' ugliness, mostly on MIPS) and allow for further improvements.
In the same process, I'd like to change some of the more obscure aspects, thereby hopefully making the code much simpler and even more powerful. This will not affect the BPish features of the preprocessor (`{$ifdef}', `{$ifopt}' and `{$i ...}' (include)), and the Pascal standards don't have a preprocessor, anyway, so I think there are no compatibility concerns (except for backward compatibility with existing GPC versions). I think most of the features discussed below are known to few people and not used much.
If you want to know about the dirty details of the C preprocessor, please read the info files `cpp' and `cppinternals'. Note that the C preprocessor does a lot of micro-optimizations. I think in Pascal we can get away without them since the preprocessor (in particular macros and complicated includes) is used *much* less often here than in C. (And again, I think preprocessing is so fast compared to compilation that I doubt these optimizations have much effect at all.)
The C preprocessor takes some effort to prevent recursive macro calls, so, e.g., in `#define foo foo(x)', the macro foo will not call itself. This might be useful sometimes (e.g., when overriding a previously defined thing), but at the same time it limits the expressiveness of macros. (In the above example recursion would be pointless since it would be infinite, but using parameters etc., useful cases can be constructed.)
So I suggest the following: Allow recursive macro calls in general, but provide a way to suppress them explicitly, e.g. by prefixing the name with `\' (similar to how shells can prevent alias expansion). This could also work outside of macros:
{$define foo bar}
foo  { yields `bar' }
\foo { yields `foo' }
\bar { yields `bar' }
The latter should give a warning (or an error?). This leaves it standard-compliant. Since the standard doesn't have macros, each occurrence of `\' would be of the latter kind and therefore wrong (according to the standard, which doesn't have `\' at all).
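The suggested rule can be modelled with a toy expander. The following Python sketch is not GPC code and rests on several assumptions: it handles only parameterless macros, the names `expand` and `max_depth` are invented for illustration, and the depth limit merely stands in for whatever protection a real implementation would need against infinite recursion.

```python
import re

# An identifier with an optional leading backslash, a whitespace run,
# or any other single character.
TOKEN = re.compile(r"\\?[A-Za-z_][A-Za-z0-9_]*|\s+|.", re.S)


def expand(text, macros, warnings=None, depth=0, max_depth=50):
    """Expand parameterless macros; a leading backslash suppresses expansion."""
    if depth > max_depth:
        raise RecursionError("macro expansion nested too deeply")
    out = []
    for tok in TOKEN.findall(text):
        if len(tok) > 1 and tok[0] == "\\":
            name = tok[1:]
            # `\name' where name is not a macro: the case that should warn.
            if name not in macros and warnings is not None:
                warnings.append("`\\%s' does not name a macro" % name)
            out.append(name)
        elif tok in macros:
            # Recursion is allowed: the replacement text is expanded again.
            out.append(expand(macros[tok], macros, warnings, depth + 1, max_depth))
        else:
            out.append(tok)
    return "".join(out)
```

In this model, with foo defined as bar, the input `foo \foo \bar` comes out as `bar foo bar`, and a macro defined as `{$define foo \foo(x)}` expands to `foo(x)` without calling itself, which covers the "overriding a previously defined thing" case.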
Another obvious improvement would be to allow compiler directives within macros which is not possible in C. E.g., `{$define foo {$define bar baz}}' (with `--nested-comments') or `{$define foo (*$define bar baz*)}' (with `--mixed-comments') would define a macro that defines another macro when used.
So, e.g., the following macro could compute a factorial (of a constant) at compile time:
{$define FACTORIAL(N)
  {$if N < 0}
    {$error Negative argument of FACTORIAL}
  {$elif N <= 1}
    1
  {$else}
    (N * FACTORIAL (N - 1))
  {$endif}
}
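One thing to watch in a recursive macro like this: if arguments are substituted textually, as in C (an assumption, since nothing here says how the integrated preprocessor would pass arguments), the inner call receives `N - 1' as unevaluated text, and the unparenthesized `N *' then binds wrongly. The Python sketch below uses a hypothetical helper `factorial_text` (not GPC code) to build the text FACTORIAL would expand to, with and without parentheses around each use of N.

```python
def factorial_text(arg, parenthesize):
    """Text that FACTORIAL(arg) would expand to under purely textual substitution."""
    # Stand-in for the {$if} directive evaluating its condition on the argument text.
    value = eval(arg, {"__builtins__": {}})
    if value < 0:
        raise ValueError("Negative argument of FACTORIAL")
    if value <= 1:
        return "1"
    # Each use of N in the macro body, optionally written as (N).
    n = "(" + arg + ")" if parenthesize else arg
    inner = factorial_text(n + " - 1", parenthesize)
    return "(" + n + " * " + inner + ")"


unsafe = factorial_text("4", False)  # "(4 * (4 - 1 * (4 - 1 - 1 * 1)))"
safe = factorial_text("4", True)
print(eval(unsafe), eval(safe))      # prints "8 24"
```

Under that assumption, writing the else branch as `((N) * FACTORIAL ((N) - 1))' would avoid the problem.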
Or an automatic counter (similar to enum types -- those should be preferred when possible, of course, but in some situations it's not so easy):
{$define DEF_REC(X, S, N)
  {$if N <= 0}
    {$define X S}
  {$else}
    DEF_REC (X, S + 1, N - 1)
  {$endif}
}
{$define COUNTER 0}
{$define DEFCOUNTED(X)
  {$redefine \COUNTER COUNTER + 1}
  DEF_REC(X, 0, COUNTER)
}
DEFCOUNTED (a) { => 1 }
DEFCOUNTED (b) { => 2 }
DEFCOUNTED (c) { => 3 }
Two other difficult areas in C (especially because they have different syntax in traditional (K&R) and ANSI C) are token concatenation and stringification.
K&R: foo/**/bar "foo"
This is clearly not suitable for Pascal: a comment in Pascal is equivalent to whitespace, so the first form cannot join tokens. As for the second form (where, if `foo' is a macro argument, it would yield a string containing the contents of foo, not the 3 letters `f', `o', `o'), strings in Pascal are strictly literal (whether they are delimited by "" as in C or '' as in Pascal is a side issue here).
ANSI: foo##bar #foo
These forms look quite artificial to me (and the latter one is also close, though maybe not directly conflicting, to the BP-style char constant syntax).
I think it would be better (also easier to handle) to have some built-in "magic macros" with these effects (which is what in C is typically defined in a header to overcome the K&R vs. ANSI differences). The macros could be given sufficiently long names (e.g., `MACRO_CONCAT' and `MACRO_STRINGIFY') to avoid most conflicts or, if deemed necessary, be activated only within macro bodies (outside of them they're not very interesting, anyway). Since macros are disabled altogether in standard and BP modes, they have no effect there, either.
Similarly, one could have, e.g., `MACRO_EVAL' within a macro definition to substitute the value its argument has at the time of the definition rather than at the time of use:
{$define FOO 1}
{$define BAR FOO}
{$define BAZ MACRO_EVAL(FOO)}
{$redefine FOO 2}
BAR { => 2 }
BAZ { => 1 }
This would, e.g., simplify the automatic counter from above:
{$define COUNTER 0}
{$define DEFCOUNTED(X) {$redefine \COUNTER COUNTER + 1} {$define X MACRO_EVAL(COUNTER)} }
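The difference between the two kinds of definition can be modelled in a few lines of Python. This is only a toy model, not GPC code: the class `MacroTable` and its methods are invented for illustration, macros take no parameters, and `MACRO_EVAL` is assumed to expand its argument once, at definition time, leaving the result as unevaluated text.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
EVAL_CALL = re.compile(r"MACRO_EVAL\s*\(([^()]*)\)")


class MacroTable:
    def __init__(self):
        self.bodies = {}

    def expand(self, text, depth=0):
        """Expand macros at the point of use, rescanning each replacement."""
        if depth > 50:
            raise RecursionError("macro expansion nested too deeply")

        def replace(match):
            name = match.group(0)
            if name in self.bodies:
                return self.expand(self.bodies[name], depth + 1)
            return name

        return IDENT.sub(replace, text)

    def define(self, name, body):
        """Store body; any MACRO_EVAL(...) inside it is expanded right now."""
        self.bodies[name] = EVAL_CALL.sub(lambda m: self.expand(m.group(1)), body)

    def defcounted(self, name):
        """Model of DEFCOUNTED(X): bump COUNTER, then snapshot it into X."""
        self.define("COUNTER", self.expand("COUNTER") + " + 1")
        self.define(name, "MACRO_EVAL(COUNTER)")
```

Replaying the FOO/BAR/BAZ example through this model, BAR expands to 2 and BAZ to 1. For the counter, the model leaves a as the text `0 + 1' and b as `0 + 1 + 1'; whether the real `MACRO_EVAL' would also fold such expressions into a single number is left open here.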
Again, I think this behaviour will actually be simpler to implement than the C-compatible behaviour (and much simpler than the current implementation, which is overly complex for questionable optimizations and consists of rather old code) ...
Then, I don't like the `{$include <...>}' syntax (mostly because `<' and `>' have very unusual meanings there). In C, there might be a point in distinguishing between system and application headers (though I haven't seen many cases where it was really important, even in C), whereas in Pascal you don't have system headers (and typically few, if any, application headers). So one include mechanism should be enough, namely `{$include STRING}' (where STRING can be any regular string constant). (And, for compatibility's sake, the BP form `{$I foo}' which appends `.pas' automatically if necessary.)
Going a step further, I'm also thinking about dropping the C syntax for compiler directives (`#foo'), in favour of the (also already existing) BP-compatible syntax (`{$foo}'). I suppose there will be some objections, but I thought I'd just ask, anyway ...
The special meaning of `\'-newline also seems to be needed only for `#' directives (since `{$}' directives can be multi-line naturally), so this could also be dropped then.
And, finally, perhaps also dropping C operators in conditional expressions (`{$if foo && bar}' -> `{$if foo and bar}') ...
Frank