Miklos Cserzo wrote:
Hi Folks,
I am optimizing a code for speed in gpc. Let me share some strange observations of mine.
I have two vectors of size N and M. I need to perform a not too complex calculation between all the elements (NxM) two times. I thought I dump the result into an array of NxM and second time just read the array back and this should save me time on the expense of memory. In fact this is about 50% slower than generating the result dynamically twice!!! Is it normal in gpc? Why the handling of large arrays are so slow?
I have only the following experience with large arrays on gpc/djgpp (DOS): There is a 50% - 100% variation in execution times due to memory cache considerations (this cache is presumably the "level 2" 256 or 512 kB cache of the Pentiums). Matrices are stored in pascal row-wise (same for C but the opposite is true for fortran). If you access elements row after row you minimize cache breaks. In a procedure to diagonalize symmetrical matrices, just swapping the indexes to access memory elements in right order gave that gain. The exact value of the gain depends of course on the size of the matrices (there is no gain for small matrices diagonalized many times), but also apparently on the version of the pentium. It depends also on the optimization switches of the compiler: there is no gain with BP which is not execution-optimized and thus very slow. Similarly in every textbook on these procedures is explained a "clever" trick: you can use only the lower (or higher) half of the matrix (due to symmetry) to store elements and use the remaining to store intermediate results needed later. But this also disrupts the row-wise use of elements. Reorganizing the same algorithm to use the full matrix in order to access as far as possible row-wise and recomputing twice the intermediate results needed gives a gain of the same order of magnitude. The loss in recomputing is only of order N^2, while the gain in better elements access is of order N^3 !
So how have you stored and accessed your NxM elements ?
Maurice
PS this is different of the "page breaks" you can monitor in linux: these page breaks correspond to breaks due to virtual memory use. Here it is a standalone programm all in real memory (bare DOS with CWSDPMI).