Hi Folks,
I am optimizing some code for speed in gpc. Let me share some strange observations.
I have two vectors of size N and M. I need to perform a not-too-complex calculation between all pairs of elements (NxM) twice. I thought I would dump the results into an NxM array the first time and just read the array back the second time, which should save time at the expense of memory. In fact this is about 50% slower than generating the results dynamically twice!!! Is this normal in gpc? Why is the handling of large arrays so slow?
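To make the comparison concrete, here is a minimal sketch of the two strategies. Everything in it is an assumption for illustration: the sizes, the names, and especially the function Pair, which only stands in for the real calculation. It does no timing; it just shows the two loop structures side by side.

```pascal
program CacheVsRecompute;

const
  N = 4;  { hypothetical sizes, kept tiny for illustration }
  M = 3;

var
  a: array [1 .. N] of Real;
  b: array [1 .. M] of Real;
  table: array [1 .. N, 1 .. M] of Real;
  i, j: Integer;
  sumRecomputed, sumCached: Real;

{ Hypothetical stand-in for the "not-too-complex calculation" }
function Pair (x, y: Real): Real;
begin
  Pair := x * y + x - y
end;

begin
  for i := 1 to N do
    a[i] := i;
  for j := 1 to M do
    b[j] := 2 * j;

  { Strategy A: compute every pair twice and store nothing }
  sumRecomputed := 0;
  for i := 1 to N do
    for j := 1 to M do
      sumRecomputed := sumRecomputed + Pair (a[i], b[j]);
  for i := 1 to N do
    for j := 1 to M do
      sumRecomputed := sumRecomputed + Pair (a[i], b[j]);

  { Strategy B: compute once into an NxM table, then read it back }
  sumCached := 0;
  for i := 1 to N do
    for j := 1 to M do
      begin
        table[i, j] := Pair (a[i], b[j]);
        sumCached := sumCached + table[i, j]
      end;
  for i := 1 to N do
    for j := 1 to M do
      sumCached := sumCached + table[i, j];

  { Both strategies should print the same total }
  WriteLn (sumRecomputed : 10 : 2, ' ', sumCached : 10 : 2)
end.
```

The last index (j) runs in the inner loop so that the table is walked in the order it is laid out in memory. If the real table is much larger than the processor cache, reading it back may cost more than redoing a cheap calculation, which could be one explanation for the slowdown, though that is only a guess without measurements.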
The speed of the loops also differs by about 50% depending on which of the following alternative styles is used:
1)
for i:= 1 to 10 do whatever;
2)
const one=1; ten=10; ..
for i:= one to ten do whatever;
3)
var one, ten:integer; ..
one:=1; ten:=10; for i:= one to ten do whatever;
4)
var one, ten:integer; ..
one:=1; ten:=10; for i:= one to ten * one do whatever;
The real code uses much longer loops, nested of course, so the speed difference is significant. Are there other speed-related tricks?
I am using an outdated version of gpc (2.0) on IRIX. Are these things any better in a more recent release?
Cheers,
miklos
On 16 Jun 2001, at 15:15, Miklos Cserzo wrote:
> The speed of the loops also differs by about 50% depending on the following alternative styles:
> [...]
All four of your iteration forms generate identical code with GPC 20010604 (compiled for the x86 processor) when run at the "-O2" optimization level. Try "gpc -S -O2 tester.pas" with the attached source file and examine the resulting "tester.s" file.
My understanding is that GCC (and, by inference, GPC) is intended to be run with the optimizer active, and therefore its non-optimized output is expected to be quite poor. I have read that most other compilers, when run "without optimization," behave comparably to GCC at the "-O1" optimization level. Having looked at GCC's assembly output without any optimization, I can believe the assertion, as there are a remarkable number of redundant loads and stores.
So, if you are not already doing so, I would recommend compiling with at least "-O1" and trying your timing tests again.
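For anyone who wants to repeat the comparison, a test program along the following lines should do. This is only a sketch of what such a file might look like, not the attachment itself; the names cOne, cTen, vOne, vTen and total are made up for illustration, and the loop body is deliberately trivial.

```pascal
program Tester;

const
  cOne = 1;   { hypothetical names for the constant bounds }
  cTen = 10;

var
  i, total: Integer;
  vOne, vTen: Integer;

begin
  total := 0;

  { Form 1: literal bounds }
  for i := 1 to 10 do
    total := total + i;

  { Form 2: named constants as bounds }
  for i := cOne to cTen do
    total := total + i;

  { Form 3: variables as bounds }
  vOne := 1;
  vTen := 10;
  for i := vOne to vTen do
    total := total + i;

  { Form 4: an expression as the upper bound; Pascal evaluates
    the limit once, before the loop starts }
  for i := vOne to vTen * vOne do
    total := total + i;

  WriteLn (total)
end.
```

Compile it with "gpc -S -O2 tester.pas" and again without "-O2", then compare the four loops in the two "tester.s" files. With so simple a body the optimizer may fold the loops away entirely, so a less trivial body might be needed to see the loop code itself.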
-- Dave