Dear Waldek,
My past experience was that I needed about a month of work to update to a new GCC version (say, 4.1 to 4.2).
I'm looking at this list of releases:
https://www.gnu.org/software/gcc/releases.html
Looks as if you would have to spend half your time working on this just to keep up.
Could you post your benchmark?
I'll prepare a stand-alone Pascal console program that inverts matrices populated with random numbers and can be compiled in FPC and GPC by means of a compiler directive. The code won't be exactly the same: in GPC I used a dynamic schema type for the matrices, and in FPC I use dynamic arrays. I will then send the code as an attachment to the list, with some benchmark measurements in the comments. This effort will have to wait a week or two, because I have fallen behind schedule with other work, and the Brandeis University campus is opening up at last. But I will do it.
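For concreteness, the directive-switched skeleton I have in mind looks roughly like this (a sketch only; the type and variable names here are invented, not the final code):

    program InvertBench;

    {$IFDEF FPC}
    type
      TMatrix = array of array of Double;  { FPC dynamic array, always 0-based }
    var
      a: TMatrix;
    {$ELSE}
    type
      TMatrix(n: Integer) = array [1..n, 1..n] of Double;  { GPC/EP schema }
    var
      a: TMatrix(100);
    {$ENDIF}

    begin
    {$IFDEF FPC}
      SetLength(a, 101, 101);  { indices 0..100; I use only 1..100 }
    {$ENDIF}
      { ... fill with random numbers, invert, time it ... }
    end.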
- tiny routines (especially tiny loops) have somewhat random speed:
adding a useless instruction can make a tiny loop go faster, and moving code to a different place in memory can change the runtime
Interesting. The matrix inverter is not a tiny loop. It is subtracting matrix rows and replacing matrix rows with new ones, so it's a lot of real-number arithmetic, conditional branches, and accesses to the local cache. Well, I assume it stays in the local cache: the matrix is 100 x 100 x 8-byte reals = 80 kBytes, and my understanding is that modern caches are measured in megabytes.
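In case the shape of the hot loop matters, the row operation is essentially the following (FPC dialect; a minimal sketch with pivoting and bookkeeping elided, plus a toy 2 x 2 check):

    program RowStep;
    type
      TMatrix = array of array of Double;

    { Subtract a multiple of pivot row k from row i so that a[i][k]
      becomes zero; this is the heart of the elimination. }
    procedure ReduceRow(var a: TMatrix; i, k, n: Integer);
    var
      j: Integer;
      factor: Double;
    begin
      factor := a[i][k] / a[k][k];
      for j := k to n do
        a[i][j] := a[i][j] - factor * a[k][j];
    end;

    var
      m: TMatrix;
    begin
      SetLength(m, 3, 3);
      m[1][1] := 2.0;  m[1][2] := 1.0;
      m[2][1] := 1.0;  m[2][2] := 3.0;
      ReduceRow(m, 2, 1, 2);   { row 2 := row 2 - (1/2) * row 1 }
      WriteLn(m[2][2]:0:4);    { prints 2.5000 }
    end.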
- Pascal semantics sometimes forces less efficient code.
I'll grant you that. I did wonder whether the GPC dynamic schema type that I was so fond of, with its freedom to assign any index range, like 1..10 instead of 0..9, could slow things down by forcing an addition before each memory access. If the compiler could not fold the offset into the address calculation, there would be a slowdown. I'm implementing dynamic arrays with range 1..n in FPC by creating an array 0..n and ignoring element 0.
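To be concrete about the workaround: in GPC the declaration was an Extended Pascal schema, something like type TVec(lo, hi: Integer) = array [lo..hi] of Double; where the bounds travel with the type. In FPC I do roughly this (illustrative names only):

    program OneBased;
    var
      v: array of Double;
      i, n: Integer;
    begin
      n := 100;
      SetLength(v, n + 1);  { allocates indices 0..n }
      for i := 1 to n do    { I only ever touch 1..n; v[0] is wasted }
        v[i] := i;
      WriteLn(v[1]:0:1, ' .. ', v[n]:0:1);
    end.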
In particular, Pascal requires "correct" handling of edge cases, and the code needed for this may increase the runtime
I had forgotten about that. I must look into this and make sure I have GPC and FPC on a level playing field when it comes to range checking at run time. It could be that such checks are turned off in FPC and turned on in GPC, and that this explains the entire difference in execution time.
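The one switch I am sure of is on the FPC side: the $R directive (or -Cr on the command line). I still have to dig the matching GPC option out of the manual, so the GPC branch below is just a placeholder:

    { At the top of the benchmark source, so both compilers agree: }
    {$IFDEF FPC}
      {$R-}  { FPC: range checking off for the timing runs; $R+ turns it on }
    {$ELSE}
      { GPC: equivalent option to be confirmed from the manual }
    {$ENDIF}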
- bit-packed arrays are particularly inefficient.
I'm not using bit-packed arrays. I have packed arrays of bytes for images, but other than that the compiler is, for the moment, free to choose the alignment of the fields. I will have to constrain the byte alignment later, when I build static libraries to link to C and Fortran, but for now it is unconstrained.
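To be explicit about what I do and don't have (an FPC-flavored sketch; names invented):

    program PackedDemo;
    type
      { A byte is already the smallest addressable unit, so 'packed'
        here costs nothing at access time; my image rows look like this. }
      TImageRow = packed array [0..639] of Byte;
      { This is the sort of array that really gets bit-packed and pays
        an extract/insert on every access; GPC bit-packs a packed array
        of Boolean, while FPC wants the 'bitpacked' keyword for it. }
      TBitRow = bitpacked array [0..639] of Boolean;
    var
      row: TImageRow;
    begin
      row[0] := 255;
      WriteLn(row[0]);
    end.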
Well, Apple decided to use LLVM mostly for licensing reasons
The devils.
Now LLVM has accumulated its own warts, and partly in response to LLVM the GCC folks have done large cleanups.
That's encouraging.
My student made a small compiler interfacing with LLVM (he did not want to deal with GCC). Within about two years his compiler was completely broken by changes to the LLVM interface...
Darn it!
Best, Kevan