Dear Waldek,
thanks very much for your suggestions. As for the optimisation switches, -march=athlon-xp doesn't seem to work on my system (Mandrake 9.1, gpc version 2.1 (20020510), based on 2.95.3 20010315 (release)). Do I have to compile or install GPC with particular options?
If I use doubles instead of integers, do you think the performance will benefit from these changes in the same way?
Thanks, best regards
Silvio a Beccara
| Silvio a Beccara wrote:
| > The random generator is in both cases the Mersenne Twister (routine
| > mt19937 from the authors' site). And I had forgotten the Fortran program.
| > Anyway you are right: the pseudorandom generator takes the most time in
| > the program. But here is another example, this time both in Pascal and
| > in Fortran:
|
| In the Pascal program you have:
| > type tMatrix = array[0..size, 0..size] of longint;
|
| In Fortran:
| > integer m1 ( 0:size, 0:size), m2 ( 0:size, 0:size),
|
| AFAIK Fortran integers on x86 are 32 bit, but GPC longint is 64 bit,
| and higher precision has its cost. Also, Fortran passes arrays by
| reference, but you passed the arguments to 'mmult' by value:
| > procedure mmult(rows, cols : integer; m1, m2 : tMatrix; var mm : tMatrix
| > );
|
| To pass by reference you can use the 'const' attribute:
| procedure mmult(rows, cols : integer; const m1, m2 : tMatrix; var mm :
| tMatrix );
|
| I have modified your program to use integer instead of longint and
| to use the 'const' attribute (as above). I also tried several versions
| of GPC with different options:
|
|                                                  original  modified
| gpc-20041017+gcc-3.3.5   -O2 -march=athlon-xp    0.035884  0.008142
| gpc-20041017+gcc-3.3.5   -O2 -march=i686         0.033786  0.008811
| gpc-20030830+gcc-3.3.2   -O2 -march=athlon-xp    0.037098  0.013156
| gpc-20020510+gcc-2.95.3  -O2 -march=i386         0.039530  0.025574
| gpc-20020510+gcc-2.95.3  -O2 -march=i686         0.037567  0.026358
|
| As you can see, with the new gpc the modified version is 4 times faster
| than the original. The main gain comes from reduced precision, but
| optimizing for the correct processor gives 10% and the 'const' attribute
| another 10%. The fastest version takes 6.28 clocks per inner
| loop iteration, which still looks too high to me. But it seems
| hard to get better speed without significantly changing the
| program.
|
| By the way, if what you want is matrix multiplication, then using the
| Atlas library may be a solution: Atlas is hand optimized and
| IMHO hard to beat. IIRC Atlas is floating point only, but
| converting integers to doubles, doing a floating point matrix
| multiply and converting back to integers is likely to be
| faster than a direct integer matrix multiply (integer and floating
| point arithmetic are of similar speed, but floating point
| registers are separate from integer registers, so a floating point
| program effectively has more registers to use).
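
For reference, the two changes suggested in the quoted message (integer instead of longint elements, and 'const' on the array parameters) can be sketched as below. This is only a minimal illustration, not the program that was actually timed: the body of 'mmult', the small value of 'size', the square-matrix assumption and the test data are all made up here, since the original loop is not shown in the thread.

```pascal
program MmultSketch;

const
  size = 2;  { hypothetical small size, chosen only for this illustration }

type
  { integer is 32 bits in GPC on x86, whereas longint is 64 bits there }
  tMatrix = array[0..size, 0..size] of integer;

{ 'const' lets the compiler pass m1 and m2 by reference instead of
  copying both arrays on every call. The loop body is an assumed plain
  triple loop for square matrices (rows = cols). }
procedure mmult(rows, cols : integer; const m1, m2 : tMatrix; var mm : tMatrix);
var
  i, j, k, sum : integer;
begin
  for i := 0 to rows do
    for j := 0 to cols do
    begin
      sum := 0;
      for k := 0 to cols do
        sum := sum + m1[i, k] * m2[k, j];
      mm[i, j] := sum
    end
end;

var
  a, b, c : tMatrix;
  i, j : integer;

begin
  { fill a with simple values and b with the identity matrix }
  for i := 0 to size do
    for j := 0 to size do
    begin
      a[i, j] := i + j;
      if i = j then
        b[i, j] := 1
      else
        b[i, j] := 0
    end;

  mmult(size, size, a, b, c);

  { a times the identity should reproduce a }
  for i := 0 to size do
  begin
    for j := 0 to size do
      write(c[i, j] : 4);
    writeln
  end
end.
```

With the value-parameter declaration, both input arrays would be copied on each call, which gets more expensive as 'size' grows. The element width is compiler specific, so the integer versus longint difference described above applies to GPC on x86 and may not hold for other Pascal compilers.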