I have added support to gp for compiling multiple files simultaneously (akin to make's -j4 flag). Since I have a dual-processor machine, this cut the total compile time for Interarchy from 8.4 minutes to 6 minutes, roughly 1.4 times faster.
I basically replaced ProcessDependencies and CompileIfNecessary and hardly touched the rest of gp, although the replaced sections are essentially complete rewrites. The code is not the most elegant: it is still full of debugging code, is not written in gp style, does not yet support a command-line switch, is minimally tested, and is perhaps not all that efficient. But if anyone is interested, I'm happy to send them the code.
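For anyone curious about the general idea, here is a rough sketch of compiling stale files with a bounded number of simultaneous jobs. It is not the gp code and does not reflect how gp is written; it is in Python purely for illustration. The names needs_compile, run_compiler and compile_all, and the cc command line, are all assumptions made up for this example.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def needs_compile(source, obj):
    """True if the object file is missing or older than its source."""
    return not os.path.exists(obj) or os.path.getmtime(obj) < os.path.getmtime(source)


def run_compiler(source, obj):
    """Hypothetical compile step; the command line is an assumption."""
    return subprocess.run(["cc", "-c", source, "-o", obj]).returncode


def compile_all(pairs, jobs=2, compile_one=run_compiler):
    """Compile stale (source, obj) pairs with at most `jobs` running at once.

    Returns the list of sources whose compile step reported failure.
    """
    stale = [(s, o) for s, o in pairs if needs_compile(s, o)]
    # Threads are enough here: each worker mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(lambda pair: compile_one(*pair), stale))
    return [s for (s, _), code in zip(stale, codes) if code != 0]
```

Calling compile_all(pairs, jobs=2) would suit a dual-processor machine. The sketch treats every file as independent; a real build also has to order compiles by their dependencies, which this example does not attempt.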
Enjoy, Peter.