I've built GPC 3.4.6 and 4.3.6 on Debian Stretch (64-bit). Running benchmarks on a Perlin Noise routine, I get:
  3.4.6   5.6 secs
  4.3.6  11.0 secs !!
The difference seems to be mainly down to a failure of 4.3.6 to inline. The function calls themselves may not take much time, but I believe the real problem is that, without the subroutines inlined, the main Noise function is no longer a leaf function and is therefore under register pressure.
...
  a := Lerp (sx, u10, v10);
  b := Lerp (sx, u11, v11);
  d := Lerp (sy, a, b);
  Noise3 := Lerp(sz, c, d);
End;
With 3.4.6, this code produces:
{ Hard to know where it starts without line numbers ! }
        addsd %xmm9, %xmm7
        addsd %xmm3, %xmm2
        subsd %xmm0, %xmm1
        subsd %xmm7, %xmm2
        mulsd %xmm14, %xmm1
        mulsd %xmm14, %xmm2
        addsd %xmm0, %xmm1
        addsd %xmm7, %xmm2
        subsd %xmm1, %xmm2
        mulsd %xmm2, %xmm12
        movsd %xmm12, %xmm0
        addsd %xmm1, %xmm0
        ret
which is pretty lean & mean.
However, with 4.3.6 I get
        movsd 112(%rsp), %xmm2
        movsd 104(%rsp), %xmm1
        movsd 48(%rsp), %xmm0
        call _p__M0_S5_Lerp
        movsd 128(%rsp), %xmm2
        movsd %xmm0, 152(%rsp)
        movsd 120(%rsp), %xmm1
        movsd 48(%rsp), %xmm0
        call _p__M0_S5_Lerp
        movsd 152(%rsp), %xmm1
        movapd %xmm0, %xmm2
        movsd 56(%rsp), %xmm0
        call _p__M0_S5_Lerp
        movsd 40(%rsp), %xmm1
        movapd %xmm0, %xmm2
        movsd 64(%rsp), %xmm0
        call _p__M0_S5_Lerp
        addq $240, %rsp
        popq %rbx
        popq %rbp
        popq %r12
        popq %r13
        popq %r14
        popq %r15
        ret
(Stack spills tend to be expensive on [my] AMD processor, as the level 0/1 cache isn't that fast.)
Maybe I have broken my build of 4.3.6. Can anyone else confirm the status of inlining on Linux x86 with 4.x.x compilers? The simple example from the info file:
program InlineDemo;

function Max (x, y: Integer): Integer; attribute (inline);
begin
  if x > y then
    Max := x
  else
    Max := y
end;

begin
  WriteLn (Max (42, 17), ' ', Max (-4, -2))
end.

also does not work for me with 4.3.6. It still produces a call instruction:
        call _p__M0_S0_Max
I recall also that inlining did not work with the official 4.1 Debian package. I was thinking of reporting this as a (Debian) bug a while back, but GPC was then removed from the archive, which made that moot.
Going forward, I'm wondering which gcc version to base my builds on. 4.3.6 potentially supports a few more architectures (ARMel, PowerPC, SH4) and supports the -m32 switch, but a 100% slowdown on the CPU-intensive stuff I use the compiler for is too much of a penalty for me.
From further tinkering around, I notice that 3.4.6 often inlines even when not asked to do so, whereas 4.3.6 very rarely, if ever, inlines. Neither compiler seems to obey the inline attribute!
Anyone any thoughts on this? Hoping, of course, that it's an easy-to-fix, typo-type bug...
Regards, Peter B
On 10 Jan 2017 at 15:17, Peter wrote:
[...]
> program InlineDemo;
>
> function Max (x, y: Integer): Integer; attribute (inline);
> begin
>   if x > y then
>     Max := x
>   else
>     Max := y
> end;
>
> begin
>   WriteLn (Max (42, 17), ' ', Max (-4, -2))
> end.
>
> Also does not work for me with 4.3.6. It still produces a call instruction.
Works with (32-bit) 4.1.2 (gpc version 20070904, based on gcc-4.1.2) running on 64-bit Linux Mint 17.
> Going forward, I'm wondering which gcc version to base my builds on. 4.3.6 supports potentially a few more architectures, ARMel PowerPC SH4
I have stuck with 4.1.2 (32-bit) for years.
Best regards, The Chief
--------
Prof. Abimbola A. Olowofoyeku (The African Chief)
web: http://www.greatchief.plus.com/
On 10/01/17 16:05, Prof Abimbola Olowofoyeku wrote:
snip
> Works with (32-bit) 4.1.2 (gpc version 20070904, based on gcc-4.1.2) running on 64-bit Linux Mint 17.
snip
Thanks, that's interesting. I tried 32-bit from 4.3.6 via the -m32 switch, but still not working here.
I would be very interested to see results from any 20070904 64-bit compiler based on any gcc-4.x.x. Hopefully someone out there could run this test?
  gpc -S -O3 InlineDemo.pas
  grep call.*Max InlineDemo.s
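For anyone repeating this test across several builds, the check above can be wrapped in a small shell function. This is only a sketch: the name count_calls is hypothetical (it is not part of gpc or of this thread), and it does nothing more than count the lines that the grep above would print.

```shell
#!/bin/sh
# count_calls SYMBOL_PATTERN ASM_FILE
# Prints the number of lines in ASM_FILE that contain a `call` whose
# target matches SYMBOL_PATTERN (a basic regular expression, as in grep).
# (count_calls is a hypothetical helper name used for illustration.)
count_calls() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so `|| true` keeps the function usable under set -e.
    grep -c "call.*$1" "$2" || true
}

# Hypothetical usage, assuming gpc is installed and InlineDemo.pas exists:
#   gpc -S -O3 InlineDemo.pas
#   count_calls Max InlineDemo.s
```

A result of 0 would mean that no call to Max survived in the generated assembly; any other number means that many call sites were not inlined.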
On 11/01/2017 at 13:47, Peter wrote:
[...]
I successfully compiled gpc-4.3.5 from Waldek's git repository in 2013 on an old Fedora 6 system, for both 32 and 64 bits. The checks are given in the attached files.
Running your suggested test gave the attached inlinedemo.s, and grep gives:
  call _p_M0_S0_Max
  call _p_M0_S0_Max
In fact, for my work I still stick with the gcc-3.4.6 based gpc, because I see no advantage in the gcc-4.3.5 based gpc, which still shows a (very) few more errors in the tests.
Hope this helps
Maurice
On 11/01/2017 at 13:47, Peter wrote:
[...]
OOPS. I have used the inlinedemo.pas contained in the docdemos directory of gpc, instead of the InlineDemo.pas you provided (notice the different capitalization). The only difference is the {$R-} directive contained in your version. But this changes nothing, except the first line in InlineDemo.s, as you can see in the attached file.
Hope this helps
Maurice
That does help, thanks. Max being called shows that the expected inlining is not working here either, so it eliminates my build, or the 'experimental' GCC 4.3.6, as being the culprit.
Regards, Peter
peter wrote:
> I've built GPC 3.4.6 and 4.3.6 on Debian Stretch (64bit) Running benchmarks on a Perlin Noise routine I get
> 3.4.6 5.6 secs
> 4.3.6 11.0 secs !!
> The difference seems to be mainly down to a failure of 4.3.6 to inline.
> <snip>
> Maybe I have broken my build of 4.3.6. Can anyone else confirm the status of inlining on linux x86 with 4.x.x compilers? The simple example from the info file
>
> program InlineDemo;
>
> function Max (x, y: Integer): Integer; attribute (inline);
> begin
>   if x > y then
>     Max := x
>   else
>     Max := y
> end;
>
> begin
>   WriteLn (Max (42, 17), ' ', Max (-4, -2))
> end.
>
> Also does not work for me with 4.3.6. It still produces a call instruction.
>         call _p__M0_S0_Max
> From further tinkering around, I notice that 3.4.6 often inlines even when not asked to do so, whereas 4.3.6 very rarely if ever inlines. Neither compiler seems to obey the inline attribute!
> Anyone any thoughts on this? Hoping of course that it's an easy to fix typo type bug...
I will look into this. One thing to remember is that the gcc backend treats 'inline' only as a hint, and apparently with the newer backend the gcc developers claim that the compiler knows better than the programmer whether inlining is good. There is a way to request inlining regardless of the compiler's opinion about the benefits, but this may still fail, because the compiler can inline only when certain problematic constructs are avoided. For example, nonlocal gotos disable inlining, even though for such routines inlining could be very profitable by converting nonlocal jumps to local ones. Also, I am afraid that gpc currently does not implement the needed directive.
If you use subranges or arrays, you can try disabling range checking. The gcc backend is quite good at removing redundant checks, but because of the checking code, routines may appear bigger and the inliner may decide that inlining is too costly.
On 10/01/17 17:17, Waldek Hebisch wrote:
[...]
Hi,
I had disabled range checks in the Noise routine, and disabling them in the demo doesn't fix inlining. (Compiling with -O3.)
I realise now that the patches only go up to 4.3.5, and also that the bottom of the assembler output is
  .ident "GCC: pkgversion_string ???experimental 20110215"
whereas for 3.4.6 I get
  .ident "GCC: (GNU) 3.4.6"
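When comparing several builds, it can help to pull that .ident line out of each generated file to see which back end produced it. A minimal sketch follows; show_ident is a hypothetical helper name, and it assumes the assembly was written by a GCC-based compiler that emits .ident directives, as in the two outputs quoted above.

```shell
#!/bin/sh
# show_ident ASM_FILE
# Prints the compiler identification string(s) recorded in ASM_FILE,
# one per line, with the leading ".ident" directive stripped.
# (show_ident is a hypothetical helper name used for illustration.)
show_ident() {
    grep '\.ident' "$1" | sed 's/^[[:space:]]*\.ident[[:space:]]*//'
}

# Hypothetical usage, assuming InlineDemo.s was produced by gpc -S:
#   show_ident InlineDemo.s
```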
Still, I'm pretty sure this was a problem with 4.1 (64bit) too.
Regards, Peter
So somebody has compiled GPC based on GCC 4.3? This is good (latest I had previously seen was 4.1). Are patches available for any later versions of GCC?
--------------------------|
John L. Ries              |
Salford Systems           |
Phone: (619)543-8880 x107 |
or (435)867-8885          |
--------------------------|
On Tue, 10 Jan 2017, Peter wrote:
[...]
Gpc mailing list Gpc@gnu.de https://www.g-n-u.de/mailman/listinfo/gpc
On 10/01/17 18:11, John L. Ries wrote:
> So somebody has compiled GPC based on GCC 4.3? This is good (latest I had previously seen was 4.1). Are patches available for any later versions of GCC?
see https://github.com/hebisch/gpc/blob/master/README
I [Waldek] have tested build with gcc-3.4.6, gcc-4.1.2, gcc-4.2.4 and gcc-4.3.5. ...