Boris Herman wrote:
Hi list,
I have compiled a single-threaded (and quite non-MP-aware) Pascal program that does a lot of numerical computations. One calculation takes about 5 days on my 1.25 GHz DDR G4 (single CPU) and I need to do about 200 of them. Well, when I run the program on my G4 it takes
My program doesn't really do anything much: it performs gazillions of computations on arrays that all together consume less than 20 MB of memory.
Am I doing anything wrong, or shouldn't I expect better performance on a dual-CPU unit?
If you really care about speed, you should verify where the bottleneck is. To compute, you first have to move data from memory to the processor, then compute, then write the result back. With small data (fitting in the processor cache) you stress the processor (still, fewer accesses to the cache make things faster). With larger data, memory (DRAM) speed matters. If you move slowly through your data, doing a lot of work on each item before going on to the next, then you can fully utilize the processor speed. However, if you make many "fast" passes over the data, memory bandwidth is the bottleneck. Your processor should deliver more than a gigaflop (2.5 GF???), but a double-precision dot product needs 8 bytes per flop (two 8-byte operands for each multiply-add pair), and I bet that your memory is unable to deliver 8 GB per second. Also, DRAM normally delivers data in blocks of 32-128 bytes (a cache line), so if you access scattered data you transfer much more than you need.
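The arithmetic behind those figures can be written out as a small sketch. It is in Python only for convenience, and the function names are made up for illustration. The flop rates are the rough guesses from the paragraph above, not measurements of any particular G4.

```python
# Back-of-envelope check: memory bandwidth a double-precision dot product
# would need to keep the processor busy. All figures are illustrative.

BYTES_PER_DOUBLE = 8


def dot_product_bytes_per_flop():
    # Each element pair costs one multiply and one add (2 flops)
    # and reads two doubles (16 bytes), which gives 8 bytes per flop.
    flops = 2
    bytes_read = 2 * BYTES_PER_DOUBLE
    return bytes_read / flops


def required_bandwidth_gb_s(gflops):
    # Bandwidth in GB/s needed to sustain `gflops` when every operand
    # has to come from DRAM rather than from cache.
    return gflops * dot_product_bytes_per_flop()


def scattered_traffic_factor(cache_line_bytes, bytes_used=BYTES_PER_DOUBLE):
    # If only `bytes_used` bytes of each fetched cache line are needed,
    # memory moves this many times more data than the program uses.
    return cache_line_bytes / bytes_used


if __name__ == "__main__":
    for gflops in (1.0, 2.5):
        print(gflops, "GFlop/s needs", required_bandwidth_gb_s(gflops), "GB/s")
    for line in (32, 128):
        print(line, "byte line, one double used:",
              scattered_traffic_factor(line), "times the traffic")
```

So even at 1 gigaflop a dot product streamed from DRAM wants 8 GB per second, and touching one double per cache line multiplies the traffic by 4 to 16 depending on the line size.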
If your bottleneck is memory bandwidth, then adding CPUs does not help: you still have the same memory. If your memory is fast enough, then a new process on the new CPU should run in parallel with the one on the old CPU, giving the speedup.
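A very simplified model of this, as a sketch only: assume the compute work divides evenly among the CPUs, all CPUs share one memory system, and computation overlaps perfectly with memory traffic. Real machines are messier, and the function names and the numbers in the example are hypothetical.

```python
# Toy model: the run takes as long as the slower of the two resources.
# compute_s: total time the arithmetic would take on one CPU
# memory_s:  total time the shared memory needs to deliver all the data


def run_time(compute_s, memory_s, n_cpus):
    # Compute work is split across CPUs; memory traffic is not,
    # because every CPU pulls data through the same memory.
    return max(compute_s / n_cpus, memory_s)


def speedup(compute_s, memory_s, n_cpus):
    # Ratio of the single-CPU run time to the run time on n_cpus.
    return run_time(compute_s, memory_s, 1) / run_time(compute_s, memory_s, n_cpus)


if __name__ == "__main__":
    print("compute-bound:", speedup(10.0, 2.0, 2))   # CPUs are the limit
    print("memory-bound: ", speedup(2.0, 10.0, 2))   # memory is the limit
    print("in between:   ", speedup(10.0, 8.0, 2))   # memory takes over
```

In this model the second CPU doubles throughput only while memory time stays below the per-CPU compute time. Once memory becomes the limit, extra CPUs just wait.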
Both memory access and actual computation count as "CPU time" in the OS. To find out which dominates, you can simply count the various operations your program is doing. Little experiments changing the size of your arrays may help too. There are also special tools for this; I know many for the PC, but there must be something for the Mac too.