Silvio a Beccara wrote:
The random generator is in both cases the Mersenne Twister (routine mt19937 from the authors' site). And I had forgotten the Fortran program. Anyway, you are right: the pseudorandom generator takes the most time in the program. But here is another example, this time in both Pascal and Fortran:
In the Pascal program you have:
type tMatrix = array[0..size, 0..size] of longint;
In Fortran:
integer m1 ( 0:size, 0:size), m2 ( 0:size, 0:size),
AFAIK Fortran integers on x86 are 32 bits, but GPC longint is 64 bits, and the higher precision has its cost. Also, Fortran passes arrays by reference, but you passed the arguments to 'mmult' by value:
procedure mmult(rows, cols : integer; m1, m2 : tMatrix; var mm : tMatrix );
To pass them by reference you can use the 'const' attribute:
procedure mmult(rows, cols : integer; const m1, m2 : tMatrix; var mm : tMatrix );
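To illustrate the two parameter modes, here is a minimal sketch. It is not taken from your program: the tiny 'size', the matrix contents and the function names TraceByValue and TraceByConst are made up for the example. A value parameter is normally copied on each call, while 'const' lets the compiler pass a reference because the body cannot modify the argument.

```pascal
program ConstParamDemo;

const
  size = 3;  { hypothetical small dimension, only for illustration }

type
  tMatrix = array[0..size, 0..size] of integer;

var
  a : tMatrix;
  i, j : integer;

{ Value parameter: the whole matrix is normally copied on every call. }
function TraceByValue(m : tMatrix) : integer;
var
  k, s : integer;
begin
  s := 0;
  for k := 0 to size do
    s := s + m[k, k];
  TraceByValue := s
end;

{ 'const' parameter: m is read-only inside the body, so the compiler
  may pass a reference instead of making a copy. }
function TraceByConst(const m : tMatrix) : integer;
var
  k, s : integer;
begin
  s := 0;
  for k := 0 to size do
    s := s + m[k, k];
  TraceByConst := s
end;

begin
  { Fill the matrix with arbitrary test values. }
  for i := 0 to size do
    for j := 0 to size do
      a[i, j] := i + j;

  { Both calls compute the same sum; only the calling cost differs. }
  writeln(TraceByValue(a));
  writeln(TraceByConst(a))
end.
```

With a matrix this small the difference will not be measurable; the copy only becomes noticeable for large arrays passed in a routine that is called often.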
I have modified your program to use integer instead of longint and to use the 'const' attribute (as above). I also tried a few versions of GPC with different options:

                                              original   modified
gpc-20041017+gcc-3.3.5 -O2 -march=athlon-xp   0.035884   0.008142
gpc-20041017+gcc-3.3.5 -O2 -march=i686        0.033786   0.008811
gpc-20030830+gcc-3.3.2 -O2 -march=athlon-xp   0.037098   0.013156
gpc-20020510+gcc-2.95.3 -O2 -march=i386       0.039530   0.025574
gpc-20020510+gcc-2.95.3 -O2 -march=i686       0.037567   0.026358
As you can see, with the newest gpc the modified version is about 4 times faster than the original. The main gain comes from the reduced precision, but optimizing for the correct processor gives 10% and the 'const' attribute another 10%. The fastest version takes 6.28 clocks per inner loop iteration, which still looks too high to me. But it seems hard to get better speed without significantly changing the program.
By the way, if what you want is matrix multiplication, then using the Atlas library may be a solution: Atlas is hand optimized and IMHO hard to beat. IIRC Atlas is floating point only, but converting the integers to doubles, doing a floating point matrix multiply and converting back to integers is likely to be faster than a direct integer matrix multiply (integer and floating point arithmetic are of similar speed, but the floating point registers are separate from the integer registers, so a floating point program effectively has more registers to use).
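The round trip through floating point could look roughly like the sketch below. This is only an outline under some assumptions: the procedure names (ToReal, RealMult, ToInt), the small 'size' and the test data are invented for the example, and RealMult is a plain triple loop standing in for the place where the optimized library routine would be called. It also relies on 'real' being double precision, as it is in GPC, so the result is exact only while every sum of products stays below 2^53.

```pascal
program IntViaReal;

const
  size = 2;  { hypothetical small dimension, only for illustration }

type
  tMatrix  = array[0..size, 0..size] of integer;
  tRMatrix = array[0..size, 0..size] of real;  { double precision in GPC }

var
  m1, m2, mm : tMatrix;
  r1, r2, rr : tRMatrix;
  i, j : integer;
  ok : boolean;

{ Convert an integer matrix to floating point. }
procedure ToReal(const a : tMatrix; var r : tRMatrix);
var
  p, q : integer;
begin
  for p := 0 to size do
    for q := 0 to size do
      r[p, q] := a[p, q]
end;

{ Stand-in for the library call: a naive floating point multiply. }
procedure RealMult(const a, b : tRMatrix; var c : tRMatrix);
var
  p, q, k : integer;
  s : real;
begin
  for p := 0 to size do
    for q := 0 to size do
    begin
      s := 0.0;
      for k := 0 to size do
        s := s + a[p, k] * b[k, q];
      c[p, q] := s
    end
end;

{ Convert the floating point result back to integers. }
procedure ToInt(const r : tRMatrix; var a : tMatrix);
var
  p, q : integer;
begin
  for p := 0 to size do
    for q := 0 to size do
      a[p, q] := round(r[p, q])
end;

begin
  { Test data: m1[i,j] = i+1 and m2[i,j] = j+1. }
  for i := 0 to size do
    for j := 0 to size do
    begin
      m1[i, j] := i + 1;
      m2[i, j] := j + 1
    end;

  ToReal(m1, r1);
  ToReal(m2, r2);
  RealMult(r1, r2, rr);
  ToInt(rr, mm);

  { For this data each entry must equal (size+1)*(i+1)*(j+1). }
  ok := true;
  for i := 0 to size do
    for j := 0 to size do
      if mm[i, j] <> (size + 1) * (i + 1) * (j + 1) then
        ok := false;

  if ok then
    writeln('match')
  else
    writeln('mismatch')
end.
```

Whether this actually beats the integer loop depends on the matrix size, since the two conversions cost O(n^2) while the multiply costs O(n^3); I have not measured it.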