The GIL needs to stay as-is if Python is going to remain reasonably programmable through the C API. Removing it would also make single-threaded Python a lot slower (every reference count update would need its own locking), and it would break just about every extension... and Python needs neither of those. A GIL-free Python is going to have to wait until something smarter comes along, like PyPy, which could in principle parallelize your code automatically where it makes sense to do so.
If you have 16 processors, use 16 processes. You'll do fine. With shared memory on a decent kernel, the only real latency you'll see is going to be your fault ;)
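For what it's worth, here is a rough sketch of the "one process per processor, shared memory" approach using the standard multiprocessing module. The names parallel_sum, _init_worker and _sum_slice are made up for illustration, and it assumes a POSIX system where the "fork" start method is available, so the workers inherit the shared buffer instead of copying it. Treat it as one way to do it, not the way.

```python
import ctypes
import multiprocessing as mp
import os

_shared = None  # per-worker reference to the shared array, set by _init_worker


def _init_worker(shared_array):
    # Runs once in each worker process: remember the shared buffer.
    global _shared
    _shared = shared_array


def _sum_slice(bounds):
    # Each worker reads its own slice straight out of shared memory.
    start, stop = bounds
    return sum(_shared[start:stop])


def parallel_sum(values, workers=None):
    # Hypothetical helper: sum a list of ints with one process per CPU.
    workers = workers or os.cpu_count() or 1
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    # Lock-free shared array of 64-bit ints; workers only read from it.
    shared = ctx.RawArray(ctypes.c_int64, values)
    n = len(shared)
    chunk = max(1, -(-n // workers))  # ceiling division
    bounds = [(i, min(i + chunk, n)) for i in range(0, n, chunk)]
    with ctx.Pool(workers, initializer=_init_worker, initargs=(shared,)) as pool:
        return sum(pool.map(_sum_slice, bounds))


if __name__ == "__main__":
    print(parallel_sum(list(range(1_000_000))))
```

Only the (start, stop) pairs and the partial sums cross process boundaries; the data itself stays in one shared buffer, which is the whole point. If the workers need to write to the array as well, you'd want ctx.Array (which carries a lock) or your own synchronization.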