Adriaan suggested I use --propagate-units to improve my compile speed.
I tried that, but I use --uses=GPCMacOSAll, where GPCMacOSAll is all the Mac OS X system interfaces in a single unit that compiles to a 27 MB gpi, so that would make each of my gpi's 27 MB (perhaps worse if it ends up being included multiple times). Regardless, it didn't work: a simple compile of a trivial unit ended up taking multiple minutes by itself, so something is not happy.
I'll have to see if I can add explicit system units to the uses clauses of my units so that I can use --propagate-units and see whether that improves things, but I haven't done that yet.
Then I tried turning off one of my processors, reverting back to the original gp so it compiles one unit at a time, and using the Mac OS X Shark tool to profile the system while compiling some units.
It's not clear from the results what percentage of the time is actually being spent inside gpc1, but the report on a gpc1 process is interesting (attached below). It shows 50+% in import_interface. This matches what Adriaan saw in the progress messages (a long time spent on the lines around the uses clause).
Enjoy, Peter.
# Time Profile of Everything SharkProfileViewer
# Generated from the visible portion of the outline view
+ 73.0% start (gpc1)
| + 73.0% _start (gpc1)
| | + 72.0% toplev_main (gpc1)
| | | + 70.2% main_yyparse (gpc1)
| | | : + 70.1% yyuserAction (gpc1)
| | | : | + 54.3% do_extra_import (gpc1)
| | | : | | + 54.3% import_interface (gpc1)
| | | : | | | + 41.7% load_gpi_file (gpc1)
| | | : | | | : + 27.4% load_node (gpc1)
| | | : | | | : | - 6.4% get_identifier (gpc1)
| | | : | | | : | - 4.7% load_string (gpc1)
| | | : | | | : | - 4.7% mread1 (gpc1)
| | | : | | | : | - 3.1% set_identifier_spelling (gpc1)
| | | : | | | : | - 1.8% make_node (gpc1)
| | | : | | | : | - 1.4% free (libSystem.B.dylib)
| | | : | | | : | 1.1% szone_free (libSystem.B.dylib)
| | | : | | | : | 0.3% mseek (gpc1)
| | | : | | | : | 0.2% ggc_alloc (gpc1)
| | | : | | | : | 0.1% itab_store_node (gpc1)
| | | : | | | : | - 0.1% ht_lookup (gpc1)
| | | : | | | : | - 0.1% build_decl (gpc1)
| | | : | | | : | 0.1% sort_fields (gpc1)
| | | : | | | : | 0.1% dyld_stub_free (gpc1)
| | | : | | | : | 0.1% allocate_decl_lang_specific (gpc1)
| | | : | | | : 11.8% compute_checksum (gpc1)
| | | : | | | : - 1.5% gpi_open (gpc1)
| | | : | | | : - 0.1% mread1 (gpc1)
| | | : | | | - 12.4% import_node (gpc1)
| | | : | - 13.2% finish_routine (gpc1)
| | | : | - 2.0% finalize_module (gpc1)
| | | : | - 0.3% import_interface (gpc1)
| | | : | - 0.1% build_predef_call (gpc1)
| | | : | - 0.1% start_unit_implementation (gpc1)
| | | : - 0.2% yylex (gpc1)
| | | - 1.5% write_global_declarations (gpc1)
| | | - 0.1% init_regs (gpc1)
| | | - 0.1% yyparse (gpc1)
| | | - 0.1% lang_init_3_4 (gpc1)
| | | - 0.1% init_emit_once (gpc1)
| | 0.8% write_global_declarations (gpc1)
| | 0.1% recog_12 (gpc1)
| | 0.1% init_regs (gpc1)
| | - 0.1% _call_mod_init_funcs (gpc1)
- 15.9% thandler (mach_kernel)
- 10.7% shandler (mach_kernel)
- 0.3% unix_syscall (mach_kernel)
- 0.1% thread_continue (mach_kernel)
- 0.1% _dyld_start (dyld)
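
Since the profile is of everything on the system, the percentages above appear to be shares of all samples, not of gpc1 alone. As a rough sketch (not part of Shark or GPC), the short Python script below rescales a few of the entries to a share of gpc1's own samples. It assumes that the 73.0% reported for start (gpc1) covers all of gpc1's user-level samples, and it ignores the mach_kernel handler time, some of which may well be on gpc1's behalf. The names parse_outline, first_percent and share_of are just illustrative helpers.

```python
import re

# A few lines copied from the Shark outline above (tree prefixes kept as-is).
SAMPLE = """\
+ 73.0% start (gpc1)
| | | : | | + 54.3% import_interface (gpc1)
| | | : | | | + 41.7% load_gpi_file (gpc1)
| | | : | | | : + 27.4% load_node (gpc1)
| | | : | | | : 11.8% compute_checksum (gpc1)
| | | : | | | - 12.4% import_node (gpc1)
- 15.9% thandler (mach_kernel)
- 10.7% shandler (mach_kernel)
"""

# Matches "<percent>% <symbol> (<module>)" anywhere in an outline line.
ENTRY = re.compile(r"(\d+(?:\.\d+)?)%\s+(\S+)\s+\(([^)]+)\)")


def parse_outline(text):
    """Return a list of (percent, symbol, module) tuples, one per outline line."""
    rows = []
    for line in text.splitlines():
        m = ENTRY.search(line)
        if m:
            rows.append((float(m.group(1)), m.group(2), m.group(3)))
    return rows


def first_percent(rows, symbol):
    """Percent of all samples for the first entry named `symbol`."""
    for pct, name, _module in rows:
        if name == symbol:
            return pct
    raise KeyError(symbol)


def share_of(rows, symbol, parent="start"):
    """Rescale `symbol` from a share of all samples to a share of `parent`."""
    return 100.0 * first_percent(rows, symbol) / first_percent(rows, parent)


if __name__ == "__main__":
    rows = parse_outline(SAMPLE)
    for name in ("import_interface", "load_gpi_file", "load_node", "compute_checksum"):
        print(f"{name:20s} {share_of(rows, name):5.1f}% of gpc1 samples")
```

Under those assumptions, import_interface comes out at roughly three quarters of the samples taken inside gpc1, which is a good deal more than the 54.3% of the whole-system profile suggests at first glance.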