I've built GPC against GCC 3.4.6 and 4.3.6 on Debian Stretch (64-bit).
Running benchmarks on a Perlin Noise routine, I get:
3.4.6 5.6 secs
4.3.6 11.0 secs !!
The difference seems to be mainly down to 4.3.6 failing to inline.
The function calls themselves may not take much time, but I believe the
real problem is that, without the subroutines inlined, the main Noise
function is no longer a leaf function. It has to keep its live values
across each call, so it comes under register pressure and spills to the
stack.
...
a := Lerp (sx, u10, v10);
b := Lerp (sx, u11, v11);
d := Lerp (sy, a, b);
Noise3 := Lerp(sz, c, d);
End;
With 3.4.6 this code produces:
{ Hard to know where it starts without line numbers ! }
addsd %xmm9, %xmm7
addsd %xmm3, %xmm2
subsd %xmm0, %xmm1
subsd %xmm7, %xmm2
mulsd %xmm14, %xmm1
mulsd %xmm14, %xmm2
addsd %xmm0, %xmm1
addsd %xmm7, %xmm2
subsd %xmm1, %xmm2
mulsd %xmm2, %xmm12
movsd %xmm12, %xmm0
addsd %xmm1, %xmm0
ret
which is pretty lean & mean.
However, with 4.3.6 I get:
movsd 112(%rsp), %xmm2
movsd 104(%rsp), %xmm1
movsd 48(%rsp), %xmm0
call _p__M0_S5_Lerp
movsd 128(%rsp), %xmm2
movsd %xmm0, 152(%rsp)
movsd 120(%rsp), %xmm1
movsd 48(%rsp), %xmm0
call _p__M0_S5_Lerp
movsd 152(%rsp), %xmm1
movapd %xmm0, %xmm2
movsd 56(%rsp), %xmm0
call _p__M0_S5_Lerp
movsd 40(%rsp), %xmm1
movapd %xmm0, %xmm2
movsd 64(%rsp), %xmm0
call _p__M0_S5_Lerp
addq $240, %rsp
popq %rbx
popq %rbp
popq %r12
popq %r13
popq %r14
popq %r15
ret
(Stack spills tend to be expensive on [my] AMD processor, as the level
0/1 cache isn't that fast.)
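For anyone wanting to reproduce this, a Lerp along the following lines should do. It is a minimal sketch, not the exact routine from my noise code: the program name LerpSketch and the Real parameter type are assumptions, and the formula a + t * (b - a) is simply what the sub/mul/add triples in the 3.4.6 listing suggest.

```pascal
program LerpSketch;

{ Assumed definition: linear interpolation from a to b by factor t. }
{ The attribute (inline) syntax follows the InlineDemo example from }
{ the GPC info file.                                                }
function Lerp (t, a, b: Real): Real; attribute (inline);
begin
  Lerp := a + t * (b - a)
end;

var
  r: Real;

begin
  { Halfway between 2.0 and 4.0 }
  r := Lerp (0.5, 2.0, 4.0);
  WriteLn (r : 0 : 2)
end.
```

When a function like this is inlined, each call collapses to one subsd, one mulsd and one addsd, with no traffic through the stack.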
Maybe I have broken my build of 4.3.6. Can anyone else confirm the
status of inlining on Linux x86 with 4.x.x compilers? The simple
example from the info file
program InlineDemo;

function Max (x, y: Integer): Integer; attribute (inline);
begin
  if x > y then
    Max := x
  else
    Max := y
end;

begin
  WriteLn (Max (42, 17), ' ', Max (-4, -2))
end.
also does not work for me with 4.3.6. It still produces a call instruction:
call _p__M0_S0_Max
I recall also that inlining did not work with the official 4.1 Debian
package. I was thinking of reporting this as a (Debian) bug a while
back, but GPC was then removed from the archive, which made that moot.
Going forward, I'm wondering which gcc version to base my builds on.
4.3.6 potentially supports a few more architectures,
ARMel
PowerPC
SH4
and supports the -m32 switch, but a 100% slowdown on the CPU-intensive
stuff I use the compiler for is too much of a penalty for me.
From further tinkering around, I notice that 3.4.6 often inlines even
when not asked to do so, whereas 4.3.6 very rarely if ever inlines.
Neither compiler seems to obey the inline attribute!
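A small test case for the "inlines even when not asked" behaviour might look like the sketch below. The program and function names are made up for illustration, and the compile command in the comment assumes gpc passes -O3 and -S through the usual gcc driver.

```pascal
program AutoInlineCheck;

{ Hypothetical test function, deliberately WITHOUT attribute (inline). }
function Twice (x: Integer): Integer;
begin
  Twice := 2 * x
end;

begin
  { Assumed check: compile with  gpc -O3 -S autoinlinecheck.pas      }
  { and look in the resulting .s file for a call to a symbol ending  }
  { in Twice. No such call would mean the compiler inlined it on its }
  { own initiative.                                                  }
  WriteLn (Twice (21))
end.
```

Adding attribute (inline) to Twice and comparing the two assembly files would show whether the attribute makes any difference at all.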
Anyone any thoughts on this?
Hoping, of course, that it's an easy-to-fix typo-type bug...
Regards,
Peter B