Hacker News

What do you mean by "not as badly as the fully-serialized scenario"? I thought that Python threads are fully serialized, apart from extension code, which can spawn its own threads and release the GIL during operations that don't touch Python memory (mainly I/O). The interpreter still switches Python threads using the GIL, but Python code itself never runs in parallel.

Are we talking about the same thing, or is there some other non-serialized scenario?




I'm pretty sure that there's a good chunk of stuff you can do in Python without acquiring the GIL -- the problem is that in practice you end up doing a lot of I/O and stuff that requires at least momentarily acquiring the GIL, leading to contention. So if you stuck to the operations that didn't require locking any GIL-protected data, you could run at full throughput. It's at least not the case that the GIL is held all the time while running a Python thread -- the problem is instead that your threads end up having to acquire it often.


Actually, the GIL is needed to execute Python code (more precisely, to access Python objects). It is released by I/O- or computation-heavy C code, so e.g. SciPy or reading files allows some level of parallelism, but pure-Python code will run serially.


Pure Python code can still get a lot done concurrently. Large parts of the stdlib are extension modules and can release the GIL (e.g. one of the tests on the bug tracker used time.sleep()). In practice, if your code is I/O-bound, it's already doing its work in an extension module, and if it's CPU-bound, it should be. So there's actually not a huge problem.
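The time.sleep() case is easy to demonstrate from pure Python: sleep is implemented in C and releases the GIL, so several waits overlap even though only one thread runs Python bytecode at a time. A minimal sketch (the 0.2 s duration and thread count are arbitrary choices for illustration):

```python
import threading
import time

def io_task():
    # time.sleep() releases the GIL while waiting, so other
    # threads can run (or sleep) concurrently with this one.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2 s sleeps overlap instead of serializing to 0.8 s.
print(f"elapsed: {elapsed:.2f}s")
```

A CPU-bound pure-Python loop in those same threads would show no such overlap, since each thread must hold the GIL to execute bytecode.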


I stand corrected. And frightened. fork(), here I come!


fork() isn't that great for a lot of situations. If you are thinking of taking advantage of your operating system's copy-on-write paging by loading a large chunk of data to be used read-only, forking a bunch of processes, having each process work on its share of the data, and finally 'reducing' the results of all of the forks into some sort of output, don't bother.

What happens is that when you read an object in one process, Python increments its reference count, thus touching the memory page, thus copying it, thus screwing you.

(however, compacting garbage collection turns out to have more or less the same problem)



