Dear Jonas,
My guess is the difference may be due to moving from i386 (32-bit) to x86-64 (64-bit).
There is an increase in speed from moving from 32-bit to 64-bit alone, and I'm able to measure that on Linux, because we have both 32-bit and 64-bit GPC on Linux. Here are the times for a 100x100 matrix inversion on various platforms. I'm running Linux and Windows in VirtualBox on the same MacOS machine.
OS       32/64  GPC/FPC  t (ms)
MacOS    32     GPC      17
MacOS    64     FPC      3.1
Linux    32     GPC      16
Linux    64     GPC      11
Linux    64     FPC      3.2
Windows  32     GPC      16
Windows  64     FPC      3.0
So 16 ms drops to 11 ms moving from 32-bit to 64-bit on Linux for matrix inversion. In the extreme case of 100% 8-byte transfers, we'd get a 50% drop in execution time, so I was pleased to see a 30% drop. But you'll see the execution time drops to 3.2 ms when I compile with FPC.
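The drop figures for Linux can be checked with a few lines of arithmetic. This is only a minimal sketch, written in Python rather than the Pascal used for the measurements. The function name percent_drop is hypothetical, and the timings are copied from the Linux rows of the matrix inversion table.

```python
def percent_drop(before_ms, after_ms):
    """Return the reduction in execution time as a percentage of the original time."""
    return 100.0 * (before_ms - after_ms) / before_ms

# Timings in milliseconds, taken from the Linux rows of the table.
linux_gpc_32 = 16.0
linux_gpc_64 = 11.0
linux_fpc_64 = 3.2

# Drop from 32-bit GPC to 64-bit GPC, and from 32-bit GPC to 64-bit FPC.
gpc_drop = percent_drop(linux_gpc_32, linux_gpc_64)
fpc_drop = percent_drop(linux_gpc_32, linux_fpc_64)

print(f"GPC 32-bit to GPC 64-bit: {gpc_drop:.0f}% drop")
print(f"GPC 32-bit to FPC 64-bit: {fpc_drop:.0f}% drop")
```

The first figure comes out just above the roughly 30% quoted, and well short of the 50% best case for pure 8-byte transfers.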
While FPC's code generation is definitely not bad, it does lack many transformations that GCC (and LLVM) have.
In that case, perhaps my FPC matrix inverter is faster because of the changes I had to make in the dynamically-allocated matrix implementation. So I have been looking at the execution time for another type of analysis: differentiating gray-scale images and then identifying unusual squares in the pattern. Now I find that GPC is faster than FPC.
OS       32/64  GPC/FPC  t (ms)
MacOS    64     FPC      14
Linux    64     GPC      11
Linux    64     FPC      15
Windows  64     FPC      15
So that's 11 ms for GPC on Linux and 14 to 15 ms for FPC on Linux, Windows, and MacOS.
Best, Kevan