Well, it's an option. I could write some simple programs in Pascal and see what assembly code is being generated.
But I think there should be a formal document for this step. I mean, when somebody implements a compiler, he/she must follow several steps: lexing, parsing, generation of the symbol table, and object code generation. Well, I am searching for a scheme with the correspondences between the structures being parsed and the code generated.
E.g.:
program: program id other body; --> initialize memory
body: begin instructions end ';' --> generate instructions 'enter', 'leave'
...
I'm not interested in the sources of GPC, because GPC is designed as a front-end, and I'm searching for the back-end...
It is still not clear what you really want. I will assume that you want to understand how a compiler works (and maybe write a simple one). Gcc is an optimizing compiler working in many passes, and there is really NO simple scheme -- the generated code depends in a highly "non-additive" way on the source; in particular, in the last stages the generated code is rearranged to allow more instructions to execute in parallel, and some sequences of instructions are replaced by better ones. As far as I know, you can disable most of the optimizations (using the -O0 flag), but some optimizations are still performed. In fact, turning optimizations on/off changes which procedures are used in some stages to generate code, so the translation scheme really depends on the exact switches you gave to gcc.

In theory one should precisely describe the expected effect of the various transformations and only then begin to code the compiler. In practice tiny little details matter most, and once you have spelled out exactly every little detail, you realize that you really have a computer program. It is wasteful to code the same computation twice (and there is little hope that two different programs will perform the same computation anyway), so the only formal description of what gcc is doing is the source code.

If you want an informal overview of how gcc works, then besides the gcc documentation you may look at: http://cobolforgcc.sourceforge.net/cobol_14.html where you can find probably the best (however incomplete) description of the interface between the front end and the back end. If you want to know the exact rules used to produce i386 instructions from gcc-internal data, you may look at gcc/config/i386/i386.md (but that file is cryptic; I would not dare to modify it).
I think that one general remark is in place: the compiler performs the translation in many phases; even if each phase is very simple (not the case for gcc), the final effect may appear complex -- in other words, if you try to describe the process as a single step, the description becomes very complex.
If you want a very simplified description of the whole process, here it goes:
first stage -- build data structures in the compiler: collect type, variable and procedure declarations, and store procedure bodies as trees representing sequences of instructions, loops, conditionals, procedure calls and assignments. The main program is treated as the body of a fictional procedure.
the first stage is almost independent of the target.
second stage -- allocate variables: in Pascal you basically compute how much space the variables will take
in the second stage you have to know how big the basic types are, and possibly the alignment rules -- on i386 you may use no alignment, but you get better performance (and compatibility with other compilers) if you allocate 2-byte variables at even addresses and 4-byte variables at addresses divisible by 4 (on the newest processors you should also align 8-byte variables)
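As a minimal sketch of this allocation stage (in Python; the function names and the "alignment equals size" rule are my own illustration, matching the i386 guidelines above, not any real compiler's code):

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def allocate(variables):
    """variables: list of (name, size_in_bytes) pairs.
    Returns per-variable frame offsets and the total frame size,
    aligning each variable to its own size (2-byte on even offsets,
    4-byte on multiples of 4, as described above)."""
    offsets = {}
    offset = 0
    for name, size in variables:
        offset = align_up(offset, size)
        offsets[name] = offset
        offset += size
    return offsets, offset

# A 1-byte char, then a 4-byte integer, then a 2-byte integer:
print(allocate([("c", 1), ("i", 4), ("s", 2)]))
# the char sits at 0, the integer is bumped to offset 4, the
# 2-byte variable lands at 8, for a 10-byte frame
```

Note how the padding between "c" and "i" is wasted space: real compilers often sort variables by alignment to avoid such holes.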
now we can generate code: the bulk of the code is expressions; they are trees with simple (binary or unary) operators or function calls in the nodes. We may assume that the simple operators work on integers and that each operation corresponds to a single machine instruction -- other operators are replaced by function calls. One big part of the work when translating expressions is register allocation. In a simple scheme you allocate a temporary variable for each tree node and just fetch values from memory before each operation and store the result after it. So x := x + y; becomes

    movl x,%eax
    movl y,%ebx
    addl %ebx,%eax
    movl %eax,x

if x and y are global variables. For local variables you use their offset inside the stack frame, like: movl -8(%ebp),%eax. For boolean expressions like x := y > z; the code looks like:

    movl y,%eax
    movl z,%ebx
    cmpl %ebx,%eax
    jg l1
    movl $0,%eax
    jmp l2
l1: movl $1,%eax
l2: movl %eax,x
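The tree-walking part of this scheme can be sketched in a few lines of Python. This is my own illustration (not gcc's code), and it uses a slight variation of the scheme above: instead of naming a temporary for each node, intermediate results are spilled to the hardware stack with pushl/popl:

```python
# Naive code generator: fetch operands before each operation,
# leave every result in %eax, spill intermediates to the stack.

def gen_expr(node, out):
    """node is ('var', name), ('const', n) or (op, left, right)."""
    kind = node[0]
    if kind == 'var':
        out.append(f"\tmovl {node[1]},%eax")
    elif kind == 'const':
        out.append(f"\tmovl ${node[1]},%eax")
    else:
        op = {'+': 'addl', '-': 'subl'}[kind]
        gen_expr(node[2], out)        # right operand into %eax
        out.append("\tpushl %eax")    # spill it
        gen_expr(node[1], out)        # left operand into %eax
        out.append("\tpopl %ebx")     # right operand back into %ebx
        out.append(f"\t{op} %ebx,%eax")

def gen_assign(dest, expr):
    out = []
    gen_expr(expr, out)
    out.append(f"\tmovl %eax,{dest}")
    return out

# x := x + y;
print("\n".join(gen_assign("x", ('+', ('var', 'x'), ('var', 'y')))))
```

The output matches the shape of the hand-written example above, just with the second operand going through the stack rather than straight into %ebx.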
for a conditional instruction if cond then I1 else I2; the code looks like:

    Computation of cond
    movl cond, %eax   # cond is a temporary holding the value of the condition
    cmp $0, %eax
    jz l1
    Translation of I1
    jmp l2
l1: Translation of I2
l2:

The capitalized phrases above mean that you need to expand the corresponding fragment.
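The same if/then/else scheme as a Python sketch (my own illustration; the already-translated fragments are passed in as lists of instruction strings, and the label numbering is my own convention):

```python
# Each argument is an already-translated fragment (a list of lines);
# cond_code must leave the condition value in %eax.

label_counter = 0

def new_label():
    global label_counter
    label_counter += 1
    return f"l{label_counter}"

def gen_if(cond_code, then_code, else_code):
    l_else = new_label()
    l_end = new_label()
    return (cond_code
            + ["\tcmp $0,%eax", f"\tjz {l_else}"]
            + then_code
            + [f"\tjmp {l_end}", f"{l_else}:"]
            + else_code
            + [f"{l_end}:"])

print("\n".join(gen_if(["\tmovl cond,%eax"],
                       ["\tmovl $1,%eax"],   # then-branch
                       ["\tmovl $2,%eax"]))) # else-branch
```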
procedure call: foo(expr1, expr2, expr3) -- all parameters by value:

    Compute expr3
    push expr3
    Compute expr2
    push expr2
    Compute expr1
    push expr1
    call foo
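A sketch of this call sequence in Python (my own illustration; each argument fragment is assumed to leave its value in %eax, and I have added the cdecl-style caller cleanup of the argument bytes, which the scheme above leaves implicit):

```python
def gen_call(name, arg_codes):
    """arg_codes: one already-translated fragment per argument,
    in source order; each fragment leaves its value in %eax."""
    out = []
    for code in reversed(arg_codes):   # push last argument first
        out.extend(code)
        out.append("\tpushl %eax")
    out.append(f"\tcall {name}")
    if arg_codes:                      # caller removes the arguments
        out.append(f"\taddl ${4 * len(arg_codes)},%esp")
    return out

# foo(a, b):
print("\n".join(gen_call("foo", [["\tmovl a,%eax"], ["\tmovl b,%eax"]])))
```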
procedure body:

    Prolog
    Expand body instructions
    Epilog
The Prolog and Epilog depend on the exact calling convention; with the scheme above only %ebx and %esp need saving, so a simple enter and leave are enough, but gcc wants more registers preserved, so one has to push them on the stack in the Prolog and pop them in the Epilog.
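A sketch of such a prolog/epilog pair (my own illustration in Python, assuming the i386 convention where %ebx, %esi and %edi are callee-saved; a real compiler would only save the registers it actually clobbers):

```python
CALLEE_SAVED = ["%ebx", "%esi", "%edi"]   # assumed callee-saved set

def gen_prolog(frame_size):
    out = ["\tpushl %ebp", "\tmovl %esp,%ebp"]
    if frame_size:
        out.append(f"\tsubl ${frame_size},%esp")   # room for locals
    out += [f"\tpushl {r}" for r in CALLEE_SAVED]  # preserve registers
    return out

def gen_epilog():
    out = [f"\tpopl {r}" for r in reversed(CALLEE_SAVED)]
    out += ["\tleave", "\tret"]   # leave = movl %ebp,%esp; popl %ebp
    return out

print("\n".join(gen_prolog(16) + ["\t... body ..."] + gen_epilog()))
```

The pops must mirror the pushes in reverse order, and leave undoes both the local allocation and the saved %ebp in one instruction.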
Pointer dereference: x := y^;

    movl y,%eax
    movl (%eax),%eax
    movl %eax,x
Arrays and other complex data are effectively represented by pointers (addresses) -- given the address of the whole object, the compiler computes the address of a component -- you may reduce the whole of Pascal to simple fragments like the above (well, they are similar, but formally you need more such fragments) and address computations.
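For a concrete instance of such an address computation, here is the standard formula for a one-dimensional Pascal array a : array[lo..hi] of T (my own worked example, not taken from any particular compiler):

```python
def element_address(base, lo, index, elem_size):
    """Address of a[index] for a : array[lo..hi] of T,
    where the array data starts at `base` and sizeof(T) == elem_size."""
    return base + (index - lo) * elem_size

# array[5..10] of 4-byte integers starting at address 1000:
assert element_address(1000, 5, 5, 4) == 1000   # first element
assert element_address(1000, 5, 7, 4) == 1008   # two elements further
```

On i386 the multiply-and-add often folds into a single addressing mode such as movl (%ebx,%ecx,4),%eax once index - lo is in a register.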
If you are trying to understand compilers, you should probably look at something simpler than the whole of Pascal -- you may find on the net examples of compilers for some subsets of Pascal. The scheme I presented above can handle full Pascal, but given in full detail it would be long, boring and (I believe) not easier to understand due to the amount of detail -- and the resulting compiler would generate really lousy code (IMHO gcc gives you 10-20 times faster code).