I can't tick off every item in your checklist, and maybe there's a way around it, but the GIL does seem to get in the way of taking our work on SCons to the next level of optimization.
We have a pretty good thread-pool architecture for controlling builds, courtesy of fine work by Anthony Roach and J.T. Conklin, and it works great for the actual build portion: we start a pool with N threads, and each thread requests work from the central dependency tree as needed and controls what it kicks off.
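To make the model concrete, here's a stripped-down sketch of that kind of pool. This is not the actual SCons code: the name run_pool and the plain queue standing in for the dependency tree are made up for illustration.

```python
import queue
import threading

def run_pool(tasks, num_threads):
    """Run each callable in `tasks` on a pool of worker threads.

    Hypothetical stand-in for the real scheduler: a simple queue
    plays the role of the central dependency tree.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each thread keeps asking for work until none is left.
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return
            result = task()
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In a real build each task would mostly be launching an external command and waiting for it, and as far as I understand a thread that is blocked waiting like that isn't holding the GIL, which would explain why this model scales fine there.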
We'd like to use the same model for the dependency analysis that creates the tree, which is largely regular expression searches on the contents of source code files. But because that analysis is done in Python code, and only one thread at a time can hold the GIL and run Python code, the model that serves us so well for controlling the build portion doesn't buy us anything there.
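For a sense of what that analysis looks like, here's a rough sketch of a scanner for C-style #include lines. The regular expression and the name scan_includes are illustrative only, not lifted from SCons.

```python
import re

# Illustrative pattern: matches `#include <x>` or `#include "x"` at the
# start of a line, allowing spaces or tabs around the `#`.
INCLUDE_RE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]',
    re.MULTILINE,
)

def scan_includes(contents):
    """Return the file names included by the given source text, in order."""
    return INCLUDE_RE.findall(contents)

if __name__ == "__main__":
    sample = '#include <stdio.h>\n  # include "foo.h"\nint main(void) { return 0; }\n'
    print(scan_includes(sample))
```

As I understand it, the matching in Python's re module runs with the GIL held, so handing calls like this to N threads still gets them executed one at a time.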
On a big multi-processor dedicated build server, where we could really use all the horsepower, you can watch all but one of the threads and processors go silent during dependency analysis: whichever thread gets the GIL calculates dependencies while the rest wait their turn. I'm told that the GIL is the (or a) big stumbling block here (I'm not the threading guru on our project), but if it's really not a big deal and there's some other way we could structure things to make use of the other processors, I'd love to hear about it...