Frank Heckenbach wrote:
> Gale Paeper wrote:
> > That's interesting. With --no-inline and -O3, the trash procedure in both recursion.pas and nonloc2goto.pas gets tail call optimized with no stack frame at all, just like the C analogue compiled with -O3 (without --no-inline).
>
> Perhaps without `--no-inline' the compiler first inlines the call from the main program (or outer routine), and then it doesn't do tail recursion because it doesn't recurse into the current routine (which is now the one containing the call, e.g. the main program). Just a wild guess, I haven't debugged this further. If that's so, the behaviour is certainly not optimal in this case. Perhaps with these assumptions, you'll be able to reproduce it in C code, which would be easier for reporting it to the backend people.
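To make that guess concrete, here is a rough C sketch of the shape the code would take after the hypothesized inlining step. The names caller_after_inlining and trash_calls are made up for illustration, and this does not show what GCC actually does; it only shows that once the first activation of trash is inlined, the remaining call to trash comes from a different routine rather than from trash itself.

```c
/* Hypothetical illustration only; names are invented for this sketch. */
static int trash_calls; /* counts activations of trash, for checking */

static void trash(int n)
{
    trash_calls++;
    if (n > 0)
        trash(n - 1);      /* self-call in tail position */
}

/* What a caller of trash(i) might look like if the first call were
   inlined: trash's body is copied here, so the call below is no longer
   a self-recursive call of the routine that contains it. Whether the
   backend then still eliminates the recursion inside trash is exactly
   the open question. */
void caller_after_inlining(int i)
{
    trash_calls++;         /* inlined copy of the first activation */
    if (i > 0)
        trash(i - 1);
}
```

Calling caller_after_inlining(5) performs six activations in total: the inlined one plus trash(4) down to trash(0).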
There seems to be something special about a non-nested Pascal procedure with non-global (static) linkage that prevents recreating the same "--no-inline and -O3 versus -O3" tail call optimization effect in C.
For several variations of the following C program:
#include <stdlib.h>
#include <stdio.h>

void trash(int n);
int indirect();

int main(void)
{
    if (indirect() != 0) {
        printf("Not zero");
    } else {
        printf("Zero");
    }
    return 0;
}

int indirect()
{
    trash(10000);
    return rand();
}

void trash(int n)
{
    if (n > 0) {
        trash(n - 1);
    }
}
everything I tried ended up with trash being tail call optimized. The options '-finline-functions' and '--no-inline' had no discernible effect.
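For readers less familiar with the term, "tail call optimized" here means the self-call is turned into a jump, so trash runs in a single stack frame. A minimal sketch of the equivalence, assuming a hypothetical counting variant (trash_steps) so there is something to check; the loop version is a hand-written approximation of the transformation, not actual compiler output.

```c
/* Hypothetical counting variant of trash: the recursive call is the
   last thing done, so it is in tail position. */
static int trash_steps(int n, int steps)
{
    if (n > 0)
        return trash_steps(n - 1, steps + 1);  /* tail call */
    return steps;
}

/* Roughly what tail call elimination turns it into: the self-call
   becomes an update of the parameters and a jump back to the top,
   so recursion depth no longer consumes stack. */
static int trash_steps_loop(int n, int steps)
{
    while (n > 0) {
        n = n - 1;
        steps = steps + 1;
    }
    return steps;
}
```

Both functions return 10000 for an initial n of 10000 and steps of 0; only the recursive one needs the optimizer to avoid 10000 stack frames.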
The only case I could find that prevented tail call optimization in C used GCC's nested functions extension for C, i.e.:
#include <stdlib.h>
#include <stdio.h>

int indirect();

int main(void)
{
    void trash(int n)
    {
        if (n > 0) {
            trash(n - 1);
        }
    }

    trash(10000);
    if (indirect() != 0) {
        printf("Not zero");
    } else {
        printf("Zero");
    }
    return 0;
}

int indirect()
{
    return rand();
}
With '-O3', with or without the '--no-inline' option, I couldn't get tail call optimization of the nested C trash function.
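Since the non-nested C trash was tail call optimized in every test, one workaround worth trying (untested here) is lambda lifting: move the nested function out to file scope and pass any outer locals it uses as explicit parameters. The nested trash above uses none of main's variables, so for it lifting just means moving it out of main. The sketch below shows the general form with a hypothetical local, floor_value, that a nested version would have read from main's frame.

```c
/* Lifted version of a nested recursive function. floor_value is a
   hypothetical outer local that is now passed explicitly instead of
   being reached through the enclosing function's frame. */
static int trash_lifted(int n, int floor_value)
{
    if (n > floor_value)
        return trash_lifted(n - 1, floor_value);  /* still a tail call */
    return n;
}
```

For example, trash_lifted(10000, 3) recurses down to 3 and returns it.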
As a further data point, I looked at the behavior of the trash procedure in a unit rather than in the main program:
unit TrashTest;

interface

procedure trash(n: integer);

implementation

procedure trash(n: integer);
begin
  if n > 0 then trash(n - 1)
end;

end.
In this configuration, the trash procedure gets tail call optimized by '-O3' without needing the '--no-inline' option.
Changing it so that trash is a nested procedure:
unit TrashTest;

interface

procedure outer(i: integer);

implementation

procedure outer(i: integer);

  procedure trash(n: integer);
  begin
    if n > 0 then trash(n - 1)
  end;

begin
  trash(i);
end;

end.
prevented the trash procedure from being tail call optimized even with the '--no-inline' option.
However, this variation of the unit:
unit TrashTest;

interface

procedure outer(i: integer);

implementation

procedure trash(n: integer);
begin
  if n > 0 then trash(n - 1)
end;

procedure outer(i: integer);
begin
  trash(i);
end;

end.
ends up duplicating the optimization behavior of the main program trash: with just '-O3' there is no tail call optimization, but with '-O3 --no-inline' there is.
Looking at the generated assembly code, it appears that the backend isn't treating the main program trash procedure as a nested routine. Both C and Pascal nested routines get a ".0" label suffix, which non-nested routines don't get. Since the main program trash doesn't get the ".0" suffix, the backend appears to be handling it as a non-nested routine.
So it appears that for Pascal the behavior depends on whether a non-nested procedure has global or non-global linkage (in C terminology, external or static linkage).
Given that, I tried giving the non-nested C trash static linkage. No effect: it was tail call optimized with '-O3' regardless of whether '-finline-functions' or '--no-inline' was used.
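For reference, a C counterpart of the last TrashTest unit (a static trash called from an external outer) might look like the sketch below. The counter trash_calls is a hypothetical addition so the number of activations can be checked; it is not in the Pascal version, and since it adds a memory side effect, the plain version without it is the one to compare when reading the assembly.

```c
/* C analogue of the unit with a non-nested, non-exported trash.
   trash_calls is a hypothetical counter used only for checking. */
static int trash_calls;

static void trash(int n)     /* static = internal linkage */
{
    trash_calls++;
    if (n > 0)
        trash(n - 1);        /* self-call in tail position */
}

void outer(int i)            /* external linkage, like the unit's interface */
{
    trash(i);
}
```

Calling outer(100) results in 101 activations of trash (n = 100 down to 0).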
To summarize: for Pascal, the "--no-inline and -O3 versus -O3" tail call optimization effect seems to depend on the routine being treated as a non-nested routine with non-global/static linkage; for C, however, the effect doesn't occur with a non-nested routine with non-global/static linkage.
Frank, any ideas about what might be going on here and, if it is a backend problem, how to duplicate the effect in C?
Gale Paeper gpaepeer@empirenet.com