Understanding the Python GIL (Pycon 2010 Slides) (dabeaz.com)
34 points by r11t on Feb 20, 2010 | 12 comments



Isn't this just a good argument for event-based programming à la Twisted or Node.js, avoiding system-thread overhead altogether?

It seems to me that programming with event-based frameworks like Node.js is much less fraught with peril, confusion, and error.

(Having written a couple of small but complete (bare iron) real-time, thread/process-based operating systems (and applications for them) for workstation-class CPUs back in the 80's, I'm highly aware of the peril and confusion possible. ;-)
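
For what it's worth, here is a minimal sketch of that event-based style using Twisted's reactor (callLater and run are real Twisted APIs; the on_timer callback is made up for illustration):

    from twisted.internet import reactor

    def on_timer():
        # All callbacks run on the reactor's single thread, so no
        # locks are needed around shared state.
        print("tick")
        reactor.callLater(1.0, on_timer)  # re-arm the timer

    reactor.callLater(1.0, on_timer)
    reactor.run()  # single-threaded event loop; runs until reactor.stop()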


That only works for one thread. Many (most?) use cases for multithreading involve CPU-bound work, which event-based mechanisms don't help with.


In that case you can use webworker-like computational threads, avoiding global data locking, etc.
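
In CPython, that web-worker style maps naturally onto multiprocessing: each worker is a separate process with its own GIL and no shared data to lock. A minimal sketch (the square function is a hypothetical stand-in for real work):

    from multiprocessing import Pool

    def square(n):
        # CPU-bound work runs in a separate process, so it is not
        # serialized by the parent interpreter's GIL.
        return n * n

    if __name__ == '__main__':
        pool = Pool(processes=4)             # one worker per core
        print(pool.map(square, range(10)))   # results return via message passing
        pool.close()
        pool.join()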


Attempting the slide 3 example on a dual-core Core 2 Duo CPU running 32-bit Linux with Python 2.6, I get:

  * single-threaded: 8.9s
  * two threads: 10.4s
Overhead is ~17% (compared to the 2X slowdown reported on quad-core Mac OS X). Interesting.


For a 4-core CPU running 64-bit Linux with Python 2.6:

  * single-threaded:                   7.6s
  * two threads:                       8.8s
  * 4 threads:                         9.14s
  * 4 processes (via multiprocessing): 2.0s
Overhead is ~16-20% (5% for multiprocessing).

    #!/usr/bin/env python

    from threading import Thread
    from multiprocessing import Process

    def countdown(n):
        # Pure CPU-bound loop: it only releases the GIL at the
        # interpreter's periodic checks.
        while n > 0:
            n -= 1

    COUNT = 100000000

    def run_once():
        countdown(COUNT)

    def run_with_threads(nthreads, make_thread=Thread):
        # make_thread may be threading.Thread or multiprocessing.Process;
        # the same total work is split evenly across the workers.
        threads = [make_thread(target=countdown, args=(COUNT // nthreads,))
                   for _ in range(nthreads)]

        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == '__main__':  # guard required for multiprocessing on some platforms
        run_with_threads(4)                       # threads: serialized by the GIL
        run_with_threads(4, make_thread=Process)  # processes: run in parallel


He did mention in the talk that Linux doesn't behave as strangely, due to scheduler differences.


I thought he said the reason Linux performance was better was that the locks were better?


The talk was FANTASTIC. Hopefully the video will be up soon.


I agree, the best talk of the conference thus far.


I was fortunate enough to attend this talk. It was quite an eye-opener as to how wonky thread performance can be. The overall equation seems to be something like:

   # of threads * # of cores == context switch storm density.
David was clear that his presentation was not meant to discourage people from using threads.
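
If you want to observe the storm yourself on Linux, one rough approach (my sketch, not from the talk) is to compare the process's context-switch counters before and after a threaded run:

    import resource
    import threading

    def countdown(n):
        while n > 0:
            n -= 1

    before = resource.getrusage(resource.RUSAGE_SELF)
    threads = [threading.Thread(target=countdown, args=(10000000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    after = resource.getrusage(resource.RUSAGE_SELF)

    # Voluntary and involuntary context switches (meaningful on Linux);
    # GIL thrashing shows up as a large jump across the threaded run.
    print(after.ru_nvcsw - before.ru_nvcsw,
          after.ru_nivcsw - before.ru_nivcsw)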

The take-aways for me are to pay close attention to how I implement my threading and to see whether Linux processor affinity would help in a situation where you control all the threads in your app.

The upcoming GIL changes for 3.2 have a dramatic positive impact on the stability of threading behavior, but a negative impact on IO threads, which they are planning to address.
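
For context (my gloss, not the parent's): the old GIL checks for a thread switch every 100 bytecode instructions, while the new 3.2 GIL hands off on a time interval (5 ms by default); both knobs live in sys:

    import sys

    # Python 2.x / old GIL: check for a possible thread switch every
    # N bytecode instructions (100 by default).
    if hasattr(sys, 'setcheckinterval'):
        sys.setcheckinterval(100)

    # Python 3.2+ / new GIL: a waiting thread requests the GIL after a
    # timeout instead.
    if hasattr(sys, 'setswitchinterval'):
        print(sys.getswitchinterval())  # 0.005 seconds by default
        sys.setswitchinterval(0.005)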


While David had dismissed CPU affinity because other threads in the stack might not benefit from additional cores, I wanted to test what impact it would have. Granted, these are single-run numbers on Linux, but it's interesting to see that even a single thread adds minor overhead.

Code:

    http://gmr.privatepaste.com/7ee226e690
Single call of the function:

    gmr@binti ~ $ ./test.sh 
    real	0m10.840s
    user	0m10.824s
    sys 	0m0.008s
With Processor Affinity on Single CPU:

    gmr@binti ~ $ taskset 01 ./test.sh 
    real	0m11.210s
    user	0m11.207s
    sys 	0m0.004s
Without Processor Affinity:

    gmr@binti ~ $ ./test.sh 
    real	0m14.389s
    user	0m12.488s
    sys 	0m3.789s
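
For reference, "taskset 01" pins the process to CPU 0. Python 3.3+ on Linux exposes the same affinity call directly; a quick sketch:

    import os

    # Pin this process (and all its threads) to CPU 0 -- the
    # equivalent of taskset 01. Linux-only, Python 3.3+.
    os.sched_setaffinity(0, {0})
    print(os.sched_getaffinity(0))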


And it's typeset in Gill!

Did anyone who went to his Open Space session have a report?



