Re: Range Checking and Speed

20 Oct 2003


Dear George, Frank, Adriaan, and Gale,
George wrote:
> ...
> Are you enabling any optimizations while compiling?
No, I was not. Thank you for your description of the optimization options.
Gale has filled out my table with the effect of -O3. I tried -O2 and -O3 on
my Fourier analysis routine and obtained the following results (ms is for
milliseconds).
none   60 ms  (measured with GetTimeStamp over 100 repetitions)
-O2    50 ms
-O3    50 ms
The GPC code calls the sin() function directly. My old CW code looks up
the sine in a table, using round() to compute the table index. For the
old code I get:
CW     30 ms  (timed with a second hand over 1000 repetitions)
I have a 30% increase in execution time, but I get platform-independent code
and exact results from the transform.
Frank wrote (referring to features of MacOS X):
> ...
> Provided the hardware supports these features (I suppose it does,
> though I don't have a Mac myself) and there's no heavy concurrent
> load (which would cause context switches etc.), this should hardly
> matter.
You're right. I ran my CW test code on MacOS 9 and then within OS X's MacOS
9 emulator. The 10^8 sin() loop takes the same amount of time on both
operating systems (about sixteen seconds).
Frank wrote:
> ...
> Maybe using `Trunc' (and adjusting the tables accordingly, or using
> `Trunc (x + 0.5)' if you know the argument will always be
> non-negative) is an option.
Using -O2 and the Unix "time" facility, I get 10^8 round() calls in 22.3
seconds and 10^8 trunc() calls in 21.1 seconds.
Adriaan wrote:
> ...
> For-loops themselves are suspect in GPC, see the thread "A little
> benchmark" in the GPC mailing list archives:
Looking at Gale's results, it appears that the -O3 optimization takes care
of the slow for-loop problem.
Adriaan wrote:
> ...
> By the way, you can also use 'TickCount' (which returns a count in 1/60
> seconds) or 'Microseconds'.
Yes, but in this case I liked the separation of the timer and the thing to
be timed. Also, the code is the same for both compilers (except for longint
and longreal), which I thought would save you some mental effort.
Adriaan wrote:
> ...
> If you look in the fp.pas unit provided with the ported GPCPInterfaces,
> you will find there a wealth of mathematical routines.
The fp.pas routines are tempting, but I am trying to write multi-platform
GPC code. My impression is that I will have to re-implement the fp.pas
routines when I port my code to Linux and UNIX.
Gale wrote:
> ...
> loop            CW time      GPC time     GPC -O3 time
> statement     (Scaled s)   (Scaled s)     (Scaled s)
> none               1.12         2.12          0.82
> a:=a*a/a           6.06         6.40          5.36
> a:=p               3.18         3.67          2.49
> a:=round(a)        8.54        26.54         21.83
> a:=round_fp(a)     5.79         7.33          6.13
> a:=sin(p)         13.85        16.45         14.87
Your CW times are the same as mine. I turned on -O3 and went through the
tests. I confirmed your times using the command-line "time" facility. When I
repeat the "time" measurements, they vary by about 5%, and are about 10%
faster than yours.
I was left wondering why my GPC no-optimization results of Saturday were
longer than yours, so I turned off -O3 and re-built. Today, 10^8 executions
of a:=a*a/a took 7 seconds, but on Saturday they took 13 seconds. I
unplugged my power adaptor and tried again: back to 13 seconds. While I was
running off my battery on Saturday, MacOS X was switching the CPU to
half-speed to save energy, but OS9 was not. Thus the CW code had an
advantage.
So, thank you for repeating my tests. I now agree with your results. By the
way, I turned on and turned off all the CW optimizations, as I have done
before, and noticed no significant difference in performance.
Gale wrote:
> ...
> You might want to look into using one of the several floating point
> rounding functions declared in Apple's Universal Interfaces fp.p(.pas)
> unit.
I'm going to call sin() instead of using a sinusoidal look-up table. I am
amazed at the efficiency of the sin() function. Consider the following GPC
results. Recall that p is an integer index, climbing from 1 to 100. I set
b:=12.345 before the loop, so it is constant throughout.
loop           GPC
statement      (s)    Optimization
none            0.64  -O3
a:=sin(p)      17     -O3
a:=sin(b)       0.92  -O3 (clever optimizer, knows b is constant)
a:=sin(b)       0.89  -O2 (clever again)
a:=sin(b)      16     none
a:=sin(a+b)    16     -O2
It takes only 150 ns to calculate the sine of a real number, which is 120
clock cycles and a little over twice the time it takes to calculate a*a/a.
The fastest round() function we have takes about 50 clock cycles, and we
need another ten or twenty to access a look-up table. Using sin() is only
30% slower.
My second use of round() is in drawing lines. But in this case, it is easy
to implement an incremental rounding function that adds or subtracts one
from an integer as its real-valued cousin rises or falls.
Gale wrote:
> ...
> Now, getting to the relatively large GPC difference with a:=round(a).
> After looking at the differences in generated code between the compilers
> (assembly code and algorithms implemented), I think the main difference
> is that GPC supports rounding to 64 bit integers and CW Pascal only
> supports rounding to 32 bit integers.
Thank you for the explanation.
In short: GPC with -O3 and fp.pas routines is about 10% faster than CW. If
you want to avoid using fp.pas, then you will still be faster than CW if you
keep the GPC round() and trunc() out of your most heavily-used loops.
Yours, Kevan Hashemi
