GPC speed

List overview All Threads
Download

newer

older

Re: GPC speed

Operator overloading troubles

Silvio a Beccara

25 Oct 2004 25 Oct '04

2:37 p.m.

Hi everybody,

I ran some tests that make extensive use of large (up to 10000 elements) arrays, up to three dimensions both in GPC and in Fortran (Intel). Even turning on optimisations, GPC is at least five times slower than Fortran in completing the tests. Anyone can tell me why, and possibly some way to make GPC recover some of this lag?

Thanks

Silvio a Beccara University of Trieste - Italy

Show replies by date

Waldek Hebisch

25 Oct 25 Oct

4:02 p.m.

Silvio a Beccara wrote:

...

I ran some tests that make extensive use of large (up to 10000 elements) arrays, up to three dimensions both in GPC and in Fortran (Intel). Even turning on optimisations, GPC is at least five times slower than Fortran in completing the tests. Anyone can tell me why, and possibly some way to make GPC recover some of this lag?

There are many reasons why GPC may be slower and many ways to speed up programs. Post the code (or a link if it is large) -- otherwise all what we can say would be pure speculation. Also, it may be usefull to compare GNU Fortran and Intel Fortran -- since GNU Pascal and GNU Fortran use the same backend, such comparison would separate effect of different language from differences in backends.

-- Waldek Hebisch hebisch@math.uni.wroc.pl

Silvio a Beccara

26 Oct 26 Oct

5:39 p.m.

Hi, the code is short, so I think I can attach it here. It uses both dynamic and static arrays, but to my surprise the difference in time is slight.

-------------------------------------------------------- program arspeed;

uses GPC;

type

PMatr2Dob = ^TMatr2Dob; TMatr2Dob (Size1, Size2: Integer) = array [ 1 .. Size1, 1..Size2 ] of double;

PVecDob = ^TVecDob; TVecDob (Size: Integer) = array [ 1 .. Size ] of double;

const mn = 5000;

var

matrdyn: PMatr2Dob; vecdyn: PVecDob;

matrstat: array [ 1..mn, 1..mn ] of double; vecstat: array [ 1..mn ] of double;

i, j, k, ntimes, itime: integer; r1, r2, deltat, deltaperc: double;

time1, time2 : longCard;

begin

new (matrdyn, mn, mn); new (vecdyn, mn); ntimes := 5;

writeln ('# filling up DYNAMIC array [', mn, ', ', mn, '] with random doubles ', ntimes, ' times.' );

time1 := GetMicroSecondTime;

for itime := 1 to ntimes do begin

for i := 1 to mn do begin for j := 1 to mn do begin r1 := random; matrdyn ^[ i, j ] := r1; end; end;

end; //end itime

time2 := GetMicroSecondTime; writeln('DYNAMIC array: total elapsed time = ', ( (time2-time1) * 0.000001):7:2, ' seconds.' );

for i := 1 to 3 do writeln;

writeln ('# filling up STATIC array [', mn, ', ', mn, '] with random doubles ', ntimes, ' times.' ); time1 := GetMicroSecondTime;

for itime := 1 to ntimes do begin

for i := 1 to mn do begin for j := 1 to mn do begin r1 := random; matrstat [ i, j ] := r1; end; end;

end; //end itime

time2 := GetMicroSecondTime; writeln('STATIC array: total elapsed time = ', ( (time2-time1) * 0.000001):7:2, ' seconds.' );

for i := 1 to 3 do writeln;

ntimes := 1000; writeln ('# filling up DYNAMIC array [', mn, '] with random doubles ', ntimes, ' times.' ); time1 := GetMicroSecondTime;

for itime := 1 to ntimes do begin

for i := 1 to mn do begin vecdyn ^[ i ] := random; end;

end; //end itime

time2 := GetMicroSecondTime; writeln('DYNAMIC array: total elapsed time = ', ( (time2-time1) * 0.000001):7:2, ' seconds.' );

for i := 1 to 3 do writeln;

writeln ('# filling up STATIC array [', mn, '] with random doubles ', ntimes, ' times.' ); time1 := GetMicroSecondTime;

for itime := 1 to ntimes do begin for i := 1 to mn do begin vecstat [ i ] := random; end; end; //end itime

time2 := GetMicroSecondTime; writeln('STATIC array: total elapsed time = ', ( (time2-time1) * 0.000001):7:2, ' seconds.' );

end.

--------------------------------------------------------

As for comparing with G77, I agree, this would be fairer, but I am trying to make my code as fast as possible, and I should choose whether to begin translating it into Fortran. While I like Pascal a lot better, unfortunately I cannot afford to lose all this amount of performance. Thanks for your help

Silvio

| There are many reasons why GPC may be slower and many ways to speed up | programs. Post the code (or a link if it is large) -- otherwise all | what we can say would be pure speculation. Also, it may be usefull to | compare GNU Fortran and Intel Fortran -- since GNU Pascal and GNU Fortran | use the same backend, such comparison would separate effect of | different language from differences in backends.

Waldek Hebisch

6:39 p.m.

Silvio a Beccara wrote:

...

Hi, the code is short, so I think I can attach it here. It uses both dynamic and static arrays, but to my surprise the difference in time is slight.

<snip>

...

for i := 1 to mn do begin for j := 1 to mn do begin r1 := random; matrdyn ^[ i, j ] := r1; end; end;

This code spends most of execution time in random number generation (instead of storing just add all the random numbers, and compare with time to store a constant). AFAIK GPC has high quality, but moderate speed random number generator. If you need a lot of random numbers you should investigate more carefully available options -- you can easily find fast low-quality generators, but low-quality random numbers my ruin your results :(

The second main ingredient is linear scan trough memory. Such scan is slow -- you spend most time waiting for data from memory (even if you write). The code GPC generates for that scan is slow, but that slowness is masked by memory access.

-- Waldek Hebisch hebisch@math.uni.wroc.pl

Frank D. Engel, Jr.

25 Oct 25 Oct

5:30 p.m.

Did you remember to swap the order of the array dimensions? FORTRAN stores arrays in column-major order, while Pascal uses row-major, correct? This will affect cache hits and misses...

On Oct 25, 2004, at 10:02 AM, Waldek Hebisch wrote:

...

Silvio a Beccara wrote:

...
I ran some tests that make extensive use of large (up to 10000 elements) arrays, up to three dimensions both in GPC and in Fortran (Intel). Even turning on optimisations, GPC is at least five times slower than Fortran in completing the tests. Anyone can tell me why, and possibly some way to make GPC recover some of this lag?

There are many reasons why GPC may be slower and many ways to speed up programs. Post the code (or a link if it is large) -- otherwise all what we can say would be pure speculation. Also, it may be usefull to compare GNU Fortran and Intel Fortran -- since GNU Pascal and GNU Fortran use the same backend, such comparison would separate effect of different language from differences in backends.

-- Waldek Hebisch hebisch@math.uni.wroc.pl

----------------------------------------------------------- Frank D. Engel, Jr. fde101@fjrhome.net

$ ln -s /usr/share/kjvbible /usr/manual $ true | cat /usr/manual | grep "John 3:16" John 3:16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. $

___________________________________________________________ $0 Web Hosting with up to 120MB web space, 1000 MB Transfer 10 Personalized POP and Web E-mail Accounts, and much more. Signup at www.doteasy.com

Maurice Lombardi

8:30 p.m.

Silvio a Beccara a écrit:

...

I ran some tests that make extensive use of large (up to 10000 elements) arrays, up to three dimensions both in GPC and in Fortran (Intel). Even turning on optimisations, GPC is at least five times slower than Fortran in completing the tests. Anyone can tell me why, and possibly some way to make GPC recover some of this lag?

There is one issue which is frequently overlooked when comparing matrix computations in Pascal or C versus Fortran. Fortran stores matrixes colum wise while the others store matrices row wise. Only a matter of taste ? not quite.

The problem is with page breaks inside the CPU (First and Second order internal caches), not the ones reported by linux which pertain to multitasking issues.

To diminish such page break one needs to access matrix elements as consecutively as possible. When algorithmes are taken from well optimized fortran, and translated into C or pascal line by line, like in the popular "Numerical Recipes" book, one would frequently need to reverse the order of nested loops, and even to rewrite somewhat the algorithmes in order to access the elements in right order.

Sometimes more subtle changes are very useful. If you look for example to algorithmes for diagonalizing symmetric matrices which occur in two steps, triadiagonalizion (Tred2), and solution of the tridiagonal form (Tqli), one "wise improvement" is to use the fact that only half the matrix is really needed, due to symmetry, to store in the first step results used in the second. But one side effect is that some matrix elements are then accessed in an (reversed) L fashion instead of pure comlumn (or row) order. It is then vastly better to discard the intermediate results (a loss of order N^2) and to recompute them in the tqli step (N^2 recomputations in an order N^3 algorithm for eigenvectors), to enable to access entirely in right order.

The improvement when computing eigenvectors is of a factor of two or three, with respect to original "Numerical Recipes" algorithmes, whether in C or in Pascal. Exact mileage depend on the version of the cpu (in the x86 series) on the system (djgpp, mingw, W98, XP..), and above all of course on the sizes of the caches and the matrices, but it remains always important.

To give an example. On my machine Pentium4, 256kb L2 cache, 1Gb central memory, running djgpp on a W98 dos Box, with gcc 3.2.3 compilers, and the last gpc (20041017) Diagonalisation of random 1000x1000 double precision matrices, computing eignvalues and eigenvectors always with options -march=pentium4 -O3 Using first "Numerical Recipes" routines, taken from fortran and translated (by them), as unmodified as possible, to C and Pascal: each number is an average of 4 runs to evaluate the effect of different randomization algorithmes: the dispersion of results is quite small

C: 143 seconds F77: 60 sec. Gpc: 120 sec. Gpc with the two modifications indicated (swap of loop orders and elimination of a "wise trick"): 41 sec. !!!

So making meaningfull speed comparisons is more tricky than you seem to suspect.

Maurice

-- Maurice Lombardi Laboratoire de Spectrometrie Physique, Universite Joseph Fourier de Grenoble, BP87 38402 Saint Martin d'Heres Cedex FRANCE Tel: 33 (0)4 76 51 47 51 Fax: 33 (0)4 76 63 54 95 mailto:Maurice.Lombardi@ujf-grenoble.fr

7565

Age (days ago)

7566

Last active (days ago)

gpc@gnu.de

5 comments

4 participants

tags (0)

participants (4)

Frank D. Engel, Jr.
Maurice Lombardi
Silvio a Beccara
Waldek Hebisch