I've built GPC 3.4.6 and 4.3.6 on Debian Stretch (64-bit). Running benchmarks on a Perlin noise routine, I get:

  3.4.6    5.6 secs
  4.3.6   11.0 secs !!
The difference seems to be mainly down to a failure of 4.3.6 to inline. The function calls themselves may not take much time, but I believe the real problem is that without the subroutines inlined, the main Noise function is no longer a leaf function. It therefore comes under register pressure, because values that are live across each call have to be kept on the stack.
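The Lerp helper itself is not quoted in this post. As a minimal sketch, assuming it is the usual linear interpolation a + t*(b - a) (which would match the subsd/mulsd/addsd pattern in the 3.4.6 output), it might look like the following. The program name, the parameter names and the Real parameter type are assumptions for illustration only; the attribute syntax is the one used by the InlineDemo example from the info file.

```pascal
program LerpSketch;

{ Hypothetical reconstruction of the helper: linear interpolation
  between a and b by the factor t. The Real type is an assumption. }
function Lerp (t, a, b: Real): Real; attribute (inline);
begin
  Lerp := a + t * (b - a)
end;

begin
  { Halfway between 2.0 and 4.0 }
  WriteLn (Lerp (0.5, 2.0, 4.0) : 0 : 2)
end.
```

A helper this small costs one subtract, one multiply and one add when expanded in place, so whether it is inlined decides whether the caller stays a leaf function.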
  ...
  a := Lerp (sx, u10, v10);
  b := Lerp (sx, u11, v11);
  d := Lerp (sy, a, b);

  Noise3 := Lerp (sz, c, d);
End;
With 3.4.6, this code produces:

  { Hard to know where it starts without line numbers ! }
  addsd %xmm9, %xmm7
  addsd %xmm3, %xmm2
  subsd %xmm0, %xmm1
  subsd %xmm7, %xmm2
  mulsd %xmm14, %xmm1
  mulsd %xmm14, %xmm2
  addsd %xmm0, %xmm1
  addsd %xmm7, %xmm2
  subsd %xmm1, %xmm2
  mulsd %xmm2, %xmm12
  movsd %xmm12, %xmm0
  addsd %xmm1, %xmm0
  ret

which is pretty lean & mean.
However, with 4.3.6 I get:

  movsd 112(%rsp), %xmm2
  movsd 104(%rsp), %xmm1
  movsd 48(%rsp), %xmm0
  call _p__M0_S5_Lerp
  movsd 128(%rsp), %xmm2
  movsd %xmm0, 152(%rsp)
  movsd 120(%rsp), %xmm1
  movsd 48(%rsp), %xmm0
  call _p__M0_S5_Lerp
  movsd 152(%rsp), %xmm1
  movapd %xmm0, %xmm2
  movsd 56(%rsp), %xmm0
  call _p__M0_S5_Lerp
  movsd 40(%rsp), %xmm1
  movapd %xmm0, %xmm2
  movsd 64(%rsp), %xmm0
  call _p__M0_S5_Lerp
  addq $240, %rsp
  popq %rbx
  popq %rbp
  popq %r12
  popq %r13
  popq %r14
  popq %r15
  ret

(Stack spills tend to be expensive on [my] AMD processor, as the level 0/1 cache isn't that fast.)
Maybe I have broken my build of 4.3.6. Can anyone else confirm the status of inlining on Linux x86 with 4.x.x compilers? The simple example from the info file

  program InlineDemo;

  function Max (x, y: Integer): Integer; attribute (inline);
  begin
    if x > y then
      Max := x
    else
      Max := y
  end;

  begin
    WriteLn (Max (42, 17), ' ', Max (-4, -2))
  end.

also does not work for me with 4.3.6. It still produces a call instruction:

  call _p__M0_S0_Max
I recall also that inlining did not work with the official 4.1 Debian package. I was thinking of reporting this as a (Debian) bug a while back, but GPC was then removed from the archive, which made that moot.
Going forward, I'm wondering which GCC version to base my builds on. 4.3.6 potentially supports a few more architectures (ARMel, PowerPC, SH4) and supports the -m32 switch, but a 100% slowdown on the CPU-intensive stuff I use the compiler for is too much of a penalty for me.
From further tinkering around, I notice that 3.4.6 often inlines even when not asked to do so, whereas 4.3.6 very rarely, if ever, inlines. Neither compiler seems to obey the inline attribute!
Does anyone have any thoughts on this? Hoping, of course, that it's an easy-to-fix, typo-type bug...
Regards, Peter B