Hi! I'm the author of this article. Thanks for posting it.
The GIL is an old topic, but I was surprised to learn recently that it's much more nuanced than I thought. So, here are the results of my little research.
This article is part of my series that dives deep into various aspects of the Python language and the CPython interpreter. The topics include: the VM; the compiler; the implementation of built-in types; the import system; async/await. Check out the series if you liked this post: https://tenthousandmeters.com/tag/python-behind-the-scenes
I welcome your feedback and questions, thanks!
In the opening paragraph you state that the GIL prevents speeding up CPU-intensive code by distributing the work among multiple threads.
My understanding is that distributing work across multiple threads would not speed up CPU-intensive code anyway. In fact, it would add overhead due to threading.
There are two things you should consider here: wall-clock time and CPU time. Running code on multiple threads will increase total CPU time by some amount (the coordination overhead), but because the work is now distributed across several cores, it should actually reduce wall-clock time.
There are many CPU-bound tasks that can be made multithreaded and faster, but it does depend on the task and on how much extra coordination you're adding to make it multithreaded.
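To make the wall-clock vs. CPU-time distinction concrete, here's a minimal sketch (Unix-only, since os.times() only reports child CPU time there; count() is a made-up stand-in for any CPU-intensive function). It uses processes rather than threads, since CPython threads can't execute bytecode in parallel anyway:

    import os
    import time
    from concurrent.futures import ProcessPoolExecutor

    def count(n):
        # Made-up busy loop standing in for real CPU-intensive work.
        while n > 0:
            n -= 1

    if __name__ == "__main__":
        wall_start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=4) as pool:
            list(pool.map(count, [25_000_000] * 4))
        wall = time.perf_counter() - wall_start

        # children_user/children_system cover the worker processes,
        # which have been joined by the time the with-block exits.
        t = os.times()
        print(f"wall-clock time: {wall:.2f}s")
        print(f"total CPU time:  {t.children_user + t.children_system:.2f}s")

Run it with max_workers=1 and then max_workers=4: the CPU-time line stays roughly constant (or grows slightly), while the wall-clock line drops.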
The author is speaking about the general concept of threading, outside of Python (without using C extensions to help out, as discussed in the article). In general, if you don't have a GIL and you have 2 or more cores, then running additional threads will give you a speedup for CPU-intensive code. The actual speedup will vary. A sibling comment mentions embarrassingly parallel problems: those are things like ray tracing, where each computation is independent of all the others. In those cases, you get near-linear speedup with each additional core and thread. If there is more coordination between the threads (mutexes and semaphores, for instance, controlling access to a shared datum or resource), then you get a less-than-linear speedup. And if there is too much contention for a common resource, you won't get any speedup at all and may even see a slowdown due to the overhead the threads introduce.
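You can see the coordination cost directly with a rough sketch like this (all names are made up): the same total number of increments, done either on private counters (embarrassingly parallel) or through one shared, lock-protected counter:

    import time
    from multiprocessing import Process, Value

    N = 200_000
    WORKERS = 4

    def private_counter(n):
        local = 0
        for _ in range(n):
            local += 1            # no coordination between workers

    def shared_counter(n, total):
        for _ in range(n):
            with total.get_lock():  # every iteration contends for one lock
                total.value += 1

    def run(target, args):
        procs = [Process(target=target, args=args) for _ in range(WORKERS)]
        start = time.perf_counter()
        for p in procs: p.start()
        for p in procs: p.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        print(f"independent: {run(private_counter, (N,)):.2f}s")
        total = Value("i", 0)
        print(f"contended:   {run(shared_counter, (N, total)):.2f}s")

The contended version does the same amount of arithmetic but spends most of its time fighting over the lock, which is exactly the less-than-linear (or negative) scaling described above.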
If the work were distributed among Python threads (which, because of the GIL, effectively take turns on one hardware thread), CPU performance can't improve, since the threads are just taking turns using the CPU. If it runs on multiple hardware threads (which is what I assume the author meant), that can yield better performance, since the work is distributed across real threads that run in parallel.
The GIL is what restricts the use of multiple hardware threads here: only the thread holding it can execute Python bytecode at any given moment.
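For anyone who hasn't seen it in action, this is easy to observe. The classic countdown experiment (popularized by David Beazley's GIL talks) takes about as long on two threads as it does sequentially:

    import time
    from threading import Thread

    def count(n):
        while n > 0:
            n -= 1

    N = 50_000_000

    start = time.perf_counter()
    count(N); count(N)
    print(f"sequential:  {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    threads = [Thread(target=count, args=(N,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(f"two threads: {time.perf_counter() - start:.2f}s")  # about the same, sometimes worse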
You can very well parallelise CPU-intensive problems. Look at, e.g., "Embarrassingly parallel" on Wikipedia.
Intuitively, if you can divide the work into chunks that are large enough, the scheduling overhead becomes negligible.
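In Python terms, that's what the chunksize argument to multiprocessing.Pool.map is for. A small sketch (square() is just a toy payload):

    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        items = range(1_000_000)
        with Pool() as pool:
            # chunksize=10_000 ships items to workers in big batches, so the
            # per-task scheduling/IPC overhead is paid ~100 times rather
            # than 1,000,000 times.
            results = pool.map(square, items, chunksize=10_000)
        print(len(results))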
What are you talking about? Worker thread pools are the most common way to take advantage of multiple cores. Typically you can see speedups (for highly parallel code) of nX for n cores.
Sure, but I think if you're discussing this sort of thing in the context of Python, you have to use the threads/processes terminology to avoid confusion.
The reason this is true in Python is the GIL. In languages without a GIL, multiple threads do run on multiple cores and can speed up CPU-bound code.
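Right, and in CPython the standard way to get that multi-core speedup is to use processes, each with its own interpreter and its own GIL. A minimal sketch mirroring the countdown experiment above:

    import time
    from multiprocessing import Process

    def count(n):
        while n > 0:
            n -= 1

    if __name__ == "__main__":
        start = time.perf_counter()
        procs = [Process(target=count, args=(50_000_000,)) for _ in range(2)]
        for p in procs: p.start()
        for p in procs: p.join()
        print(f"two processes: {time.perf_counter() - start:.2f}s")

On two free cores, the wall time here is roughly half that of the sequential run, which is the speedup the thread version can't deliver under the GIL.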