Re: Range Checking and Speed

20 Oct 2003


      Kevan Hashemi wrote:
...
Dear Frank and Adriaan,
...
From Frank:
If your version doesn't recognize `{$R-}', it's probably older and
doesn't do range-checking at all, so you don't need to worry.
Thank you, and I have determined that the indexing performed for the dynamic
schemata is as fast as my own pointer arithmetic. With the schemata instead
of pointer arithmetic, my code is now half as long, and far more readable.
Converting my code to GPC has been a real pleasure, and I continue to
discover new and delightful features.
...
From Adriaan:
I can hardly believe it's the generated code, but nothing is impossible
... If you send me the source code, I will have a look at it (but I
have very limited time available this month).
Your offer is most generous. I have done some work to try and make your
investigation simpler.
I am comparing the CW Pascal compiler issued about five years ago with the
GPC compiler I installed two months ago. The CW code runs in MacOS 9. The
GPC code runs in MacOS X from the UNIX terminal.
Here is my GPC test program:
program test;
var
    a:real;
    m,n,p:integer;
begin
    a:=1234.567;
    writeln('starting loop...');
    for m:=0 to 1000 do
        for n:=0 to 1000 do begin
            for p:=1 to 100 do begin
                { loop statement here }
            end;
        end;
    writeln('done.');
end.
In CW the code looks the same, but the variables are longreals and longints
so that they match the GPC sizes. As you can see, the loop statement gets
executed about 100,000,000 times (1001 x 1001 x 100 = 100,200,100). I measure
the execution time by counting seconds
in my head between the prints to the console. I'm using an 800 MHz iBook.
Here are my results:
loop          CW time      GPC time
statement       (s)          (s)
none            1             3
a:=a*a/a        6             13
a:=p            3             6
a:=round(a)     8             40
a:=sin(p)       15            40
...
Correction:
loop          CW time      GPC time
statement       (s)          (s)
a:=sin(p)        15           20 (not 40)
Below are timing results from testing on a 500MHz G4 with the results
linearly scaled to 800 MHz for comparison purposes.  I used calls to the
Mac OS Microseconds system routine immediately before and after the
loops to gather the timing data.  For CW Pascal, level 4 global
optimization with peephole optimization and 7400 PPC instruction
scheduling was used (in other words, maximal optimizations).  For GPC, I
used a plain vanilla automake with no optimization argument for the
first column and a -O3 optimization argument for the second column.
For CW Pascal testing, I used the CW Pro 7 Pascal update compiler for
compiling and tested running cooperative multitasking Mac OS 9.1.  For
GPC testing, I used Adriaan's gpc-3.3d6 version (built using gpc version
20030507, based on gcc-3.3) and tested running preemptive multitasking
Mac OS X 10.2.6.
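For anyone who wants to repeat this kind of measurement without counting
seconds, here is a minimal sketch of the before/after timestamp approach.
It is written in C rather than Pascal and uses the standard clock() call
instead of the Mac OS Microseconds routine, so it reports CPU time rather
than wall-clock time. The function names time_mul_div_loop and
iteration_count are made up for illustration; they are not part of either
test setup described above.

```c
#include <time.h>

/* Exact number of loop-body executions for "m, n := 0 to outer" and
   "p := 1 to inner", as in the Pascal test program. */
long long iteration_count(int outer, int inner)
{
    return (long long)(outer + 1) * (outer + 1) * inner;
}

/* Runs the triple loop with the "a := a*a/a" body and returns the elapsed
   CPU time in seconds.  The final value of a is stored in *result. */
double time_mul_div_loop(int outer, int inner, double *result)
{
    /* volatile keeps an optimizing compiler from deleting the loop body */
    volatile double a = 1234.567;
    clock_t start = clock();            /* timestamp before the loops */

    for (int m = 0; m <= outer; m++)
        for (int n = 0; n <= outer; n++)
            for (int p = 1; p <= inner; p++)
                a = a * a / a;

    clock_t stop = clock();             /* timestamp after the loops */
    *result = a;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_mul_div_loop(1000, 100, &a) from a main function and printing
the return value gives a figure comparable to the a:=a*a/a row.
iteration_count(1000, 100) returns 100,200,100, the exact number of times
the loop statement runs.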
The "a:=round_fp(a)" line is a new test which doesn't require the floating
point to integer and integer to floating point conversions.  For this
test, I used the Mac OS system round function (from the fp unit) which
takes a double floating point argument and returns a rounded double
floating point result.  (Note for Mac OS users who may not be aware of
it: it is possible to use both the Pascal language integer returning
round and the Mac OS system double returning round in the same code.
Although the means to do so are different in CW Pascal and GPC, both
support language features which provide for this possibility.)
loop             CW time     GPC time    GPC -O3 time
statement       (scaled s)  (scaled s)    (scaled s)
none               1.12        2.12          0.82
a:=a*a/a           6.06        6.40          5.36
a:=p               3.18        3.67          2.49
a:=round(a)        8.54       26.54         21.83
a:=round_fp(a)     5.79        7.33          6.13
a:=sin(p)         13.85       16.45         14.87
Overall, at least with my testing configuration and with the exception
of a:=round(a), there isn't much of a performance difference between GPC
and CW Pascal.  Unoptimized GPC code is slightly slower than fully
optimized CW Pascal code and optimized GPC code is slightly faster than
CW Pascal code for non-library call related code.  (The library call
code, round_fp and sin, ends up using Apple supplied code for both
compilers so the differences are really due to differences in Apple's
library call mechanics and code implementation between Mac OS 9.1 and Mac
OS X 10.2.6.)  In light of the 100,000,000 iteration count and the
"artificial" pattern of the test code, I think for a more realistic code
pattern mixture the speed differences between optimized CW Pascal and
optimized GPC will be noise level for most code.
Now, getting to the relatively large GPC difference with a:=round(a). 
After looking at the differences in generated code between the compilers
(assembly code and algorithms implemented), I think the main difference
is that GPC supports rounding to 64 bit integers and CW Pascal only
supports rounding to 32 bit integers.  On a 32 bit PPC CPU converting
between 64 bit integer and floating point formats requires quite a few
more instructions than 32 bit integer format conversions.  Another
"penalty" with the PPC CPU is that the data has to go through memory, since
there is no direct connection between integer registers and floating
point registers.  With double the data, you're hitting, if not
exceeding, the limits of the load/store unit's capabilities.
As a general PPC performance rule of thumb, it is always a "win" to
minimize the number of floating point to/from integer format conversions
since the conversion is relatively expensive.  With a:=round(a), two
conversions are performed  (64 bit integers for GPC and 32 bit integers
for CW Pascal).   The round function involves a floating point to
integer conversion and the result is then reconverted back to floating
point for storing in variable 'a'.  (The back-to-back conversions can't
be optimized away due to the possibility of integer overflow in the
floating point to integer conversion.)
...
Most things take two or three times as long with GPC. I expect code running
on MacOS X (UNIX) to be slower than code on MacOS 9 because MacOS X is
re-entrant, is subject to pre-emptive multitasking, and provides protected
memory. I am more interested in the fact that the GPC round() function takes
four times as long as the GPC implementation of a:=a*a/a, while the CW
round() function takes about the same time as the CW implementation of
a:=a*a/a.
As you know, rounding a number with platform-independent mathematical
functions is slow. The CW round() probably uses the Power PC real number
format to abbreviate the rounding process. Perhaps GPC uses a platform-
independent implementation.
Actually, although the algorithms are different, both CW and GPC use
platform independent round() implementations.  As mentioned above, I
think the major factor in performance difference is that GPC's round
returns 64 bit integer results whereas CW Pascal only returns 32 bit
results.  (CW implements round as a library routine written in ISO C which,
by eyeball check, looks to be less efficient than GPC's algorithm.)
...
I have always used round() with sinusoidal look-up tables in my fourier
transforms. The above results suggest that I gained very little by doing so.
Nevertheless, I also use round() to obtain display coordinates from
real-valued graphs, and these routines are running five times slower than
before.
You might want to look into using one of the several floating point
rounding functions declared in Apple's Universal Interfaces fp.p(.pas)
unit.  Depending upon your needs, one of the routines may yield better
performance for both CW Pascal and GPC as can be seen in the a:=round(a)
versus a:=round_fp(a) line timing results above.  If platform
independence is a concern, I'll note that fp.p(.pas) is mostly just an
Apple repackaging of the latest ISO C standard required math.h, which can
be easily handled with GPC as long as the platform target has a
fairly up-to-date GPC/gcc.
Gale Paeper
gpaeper@empirenet.com
