CPython internals: A ten-hour codewalk through the Python interpreter (2015) (youtube.com)
338 points by melqdusy on March 9, 2017 | 51 comments



I put together an ebook on the internals of the Python interpreter. Get it for free at https://leanpub.com/insidethepythonvirtualmachine


This is awesome - I wish every large software project had something like this as a prep course for starting to contribute meaningfully!


Not a prep course, but Redis, for example, has a very good source code overview here: https://github.com/antirez/redis#redis-internals

More remarkable is the fact that antirez updated the documentation in response to a post on Reddit. https://www.reddit.com/r/redis/comments/3re0aw/any_pointers_... Thank you antirez! :-)


Also check out libpng's source.

https://github.com/glennrp/libpng


Oh wow. That's beautifully done. Simple comments that explain clearly what the code is doing, and clear variable names, so there's little head-scratching.


The documentation of Redis is really good for a large open source project. I am not a contributor, but I still read the source code from time to time. Full credit to antirez for taking the time to make it easy to contribute to Redis!


Shameless plug here, but I have put together a free ebook detailing the inner workings of the Python virtual machine at https://leanpub.com/insidethepythonvirtualmachine


I agree - this basically gives you enough information to bootstrap your own learning about the CPython internals. I feel like all companies/projects should have a similar intro which gives new-joiners enough information that they can figure out most things themselves without too much pain, and without spoon-feeding too much.


I feel like I should understand this but I don't: what gets looked up by name vs. by number in CPython?

That is, I think local variables and constants are looked up by a small integer index that the CPython compiler assigns during its scope analysis.

But any globals must be looked up by name: functions, classes, modules, global variables. The same goes for methods and attributes on classes.

I'd be interested to get clarity on that, and any pointers to relevant code/docs. Is this addressed in the videos? I have looked through the CPython source a lot, and even patched it, but the lookups are a little hard to follow. I've played with the "dis" module and code objects.

EDIT: Answering my own question, it seems like I was confused about the index into co_names, which is a small integer into a list of strings, and then the lookup of that string. So it's a 2-step process?


You can find out by using dis and checking which instructions are LOAD_FAST, LOAD_CONST, or LOAD_GLOBAL. Attribute lookups are always by name as far as I know. The dot operator has a bunch of paths.
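
For example (minimal sketch; the exact bytecode differs between Python versions):

    import dis

    def f(x):
        y = x + 1            # x, y are locals: LOAD_FAST/STORE_FAST by slot index
        return len(str(y))   # len, str resolve at run time: LOAD_GLOBAL by name

    dis.dis(f)

The oparg on each LOAD_FAST is the slot number itself, while the oparg on LOAD_GLOBAL is an index into co_names, where the actual string for the dict lookup lives - that's the 2-step process the GP noticed.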


Yes thanks, I think that is right... LOAD_FAST and LOAD_CONST are by number, and used for local variables and constants.

LOAD_NAME, LOAD_ATTR, and LOAD_GLOBAL are all lookups by name, and are used for everything else: globals, object attributes and methods, modules, etc.

It seems that if Python had a static module system, all the lookups by name could be compiled down into lookups by number.

https://docs.python.org/3/library/dis.html#python-bytecode-i...


I'm not clear on the status of PEP 509, but it could/should let LOAD_NAME and LOAD_GLOBAL approach the speed of LOAD_FAST. It adds a version tag to every dict that gets bumped whenever the dict is mutated, so repeated lookups in an unchanged globals dict can be cached.

https://www.python.org/dev/peps/pep-0509/

Dictionaries got some sweet upgrades for v3.6.
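
The idea, roughly, in pure-Python pseudocode (the real version tag is a C-level field, ma_version_tag, not visible from Python):

    class VersionedDict(dict):
        # Toy model of PEP 509: bump a version on every mutation.
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.version = 0
        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            self.version += 1
        def __delitem__(self, key):
            super().__delitem__(key)
            self.version += 1

    _cache = {}  # name -> (version, value)

    def cached_global_lookup(globals_dict, name):
        entry = _cache.get(name)
        if entry is not None and entry[0] == globals_dict.version:
            return entry[1]                    # fast path: dict unchanged since last time
        value = globals_dict[name]             # slow path: real hash lookup
        _cache[name] = (globals_dict.version, value)
        return value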



Do you need an understanding of compilers to go through this? What are the prerequisites?


No. They skip the Python-to-bytecode compiler and go straight to the interpreter and runtime, more or less. You should know C.


Curious: At any point, is it explained why the Global Interpreter Lock is necessary? If so, I'll spend the time to watch.


It's not necessary; it was a design choice that made sense back in the '90s. In a multithreaded environment you can lock at a fine-grained level or at a coarse-grained level - or you can crash, but let's ignore that as an option. Python chose coarse-grained, giving up parallel interpreter computation but avoiding a lot of thread-sync overhead. All attempts so far to remove the GIL have resulted in a (usually much) slower interpreter, but the latest attempt shows some promise, and it's conceivable (but not guaranteed) that in a few years there will be an official GIL-less CPython.


Removing Python's GIL will never make much sense - not today and not in the future. If you need CPU-fast code and would bother to multi-thread, it's much more worth it to write the code in Cython.

If your code is CPU bound and you're using native Python, you're going to be making a tonne of heap allocations and pointer dereferences. This will be very slow.

If you implement the relevant stuff in Cython, even without using multi-threading you'll likely see 10x performance improvement, and can often see up to 100x.

Removing the GIL makes Python worse at the stuff it's good at, for questionable improvements in the areas Python is really terrible. This is not a good trade.


What if you were trying to thread a non-processor-bound task?


Do you mean waiting for I/O? It's already possible to do I/O asynchronously or with separate locks. The interpreter lock only applies to the interpreter.
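
For instance, threads blocked on sockets release the GIL, so plain threading already overlaps I/O (minimal sketch; the URLs are hypothetical):

    import threading
    import urllib.request

    urls = ["https://example.com/a", "https://example.com/b"]  # hypothetical

    def fetch(url):
        # The GIL is released while the thread blocks on network I/O,
        # so these downloads run concurrently.
        with urllib.request.urlopen(url) as resp:
            resp.read()

    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()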


Do you happen to have any papers about the current efforts to remove the GIL?

I love Python, and use it a lot for ETL type work, but if threading worked well, I could/would possibly use it for far more purposes.


Larry Hastings - Removing Python's GIL: The Gilectomy - PyCon 2016 https://youtu.be/P3AyI_u66Bw

(I'm pretty sure this is the video I'm thinking of) It's 30m, but worth it if you're interested. Not sure what progress has been made since then.


http://pyparallel.org is one of the more interesting experiments currently going on in the GIL area. They're basically working on removing all the practical limitations of the GIL without actually removing the entire GIL.


PyParallel v1 was a nice checkpoint. I'm working on the next incarnation of it now.


Looking forward to it. PyParallel is one of the more exciting Python implementations out there.


Please remember that threads are not the only way. If you can break your function/routine into smaller independent pieces, you can easily get by with a fork. (Well, unless you are on Windows...)
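
A minimal POSIX-only sketch (work_chunk is a hypothetical stand-in for your independent piece):

    import os

    def work_chunk():
        # stand-in for the independent piece of work
        sum(i * i for i in range(10**6))

    pid = os.fork()
    if pid == 0:
        work_chunk()        # child has its own interpreter state, so no GIL contention
        os._exit(0)         # exit the child without running the parent's cleanup
    else:
        os.waitpid(pid, 0)  # parent waits for the child to finish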


Been a while since I tried to use the multiprocessing module. But the last time I did, I ran into issues with it interacting poorly with pyodbc. It's been years, so I don't recall what the problem was, but I spent a few days trying to resolve or work around the issue with no satisfaction.

Also, most of my Python scripts run on both Linux and Windows, so I have that restriction, as well.


Can you give an example where the GIL is really holding you back?

Because with multiprocessing and greenlets, 99.99% of concurrency problems are trivially solved by current CPython.


Actually GP here, but it has held me back in the past.

I'm writing a transpiler that uses global information from codebases, and so it transpiles potentially hundreds of files at once and creates rather complex data structures. Compute bound for quite a while, so I tried speeding it up with multiprocessing (since multithreading would be useless). But with multiprocessing it took longer to serialize/deserialize the complex datastructures for each process, so I had to give up. Next time I have time for this I'd probably try to use Jython as a drop-in replacement and see whether I can get it to run with GIL-less multithreading.


It sounds like you have a couple of hot paths and are not optimizing them. I can't tell for sure without seeing any code, but nothing in your post screams out "this will be slow" or "I need parallelism/concurrency". Perhaps it's the data structures you are using?


I already did extensive profiling and performance improvements, at this point I'm quite sure that if I could do multithreading on my lab's 24 core Xeon Haswell machines I'd be getting a nice speedup.


Sounds like you might be iterating dictionaries. That's much faster in Python 3.6 due to the compaction of dict storage.


2 TB hash join


Isn't that up to the RDBMS, whether that's multithreaded or not? Unless the RDBMS is implemented in Python, CPython doesn't force extension code to be single-threaded - just Python bytecode.


That's pretty much the point, isn't it? If I need true multithreading, then I am forced to write an extension in C or offload the multithreading work to another process (such as RDBMS in your case). It would have been nice if true multithreading was possible in Python itself. It would immediately make Python more useful in a variety of scenarios where splitting the work into multiple processes is not optimal or more convoluted.


Sure, I guess I'm just used to writing my performance bottlenecks in a lower level language already, so I'm used to the GIL not actually being held most of the time in any intensive computation.

So if I want to call two Fourier transform functions at the same time in Python I can, because neither of them is implemented in Python and so they don't hold the GIL.
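
Sketch of what I mean (assumes the FFT implementation releases the GIL around its compiled core, as scipy.fft documents):

    import threading
    import numpy as np
    from scipy import fft   # scipy.fft releases the GIL in its compiled core

    a = np.random.rand(2**20)
    b = np.random.rand(2**20)
    results = [None, None]

    def transform(i, x):
        results[i] = fft.fft(x)   # GIL is dropped for the duration of the transform

    ts = [threading.Thread(target=transform, args=(0, a)),
          threading.Thread(target=transform, args=(1, b))]
    for t in ts:
        t.start()
    for t in ts:
        t.join()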

That's the kind of parallelism use case I most often see come up, so although the GIL dismayed me early on I've come to see it as pretty irrelevant.

But maybe it makes more sense for other applications, where the performance-critical parts of the code are actual pure Python. I do mostly numerical simulations, so pure Python is usually a non-starter; you fix that long before you think about parallelism.


If your answer to any GIL problem is to write in a different language, then I guess you don't have a problem.


It's not that I write the whole program in another language, it's that I either write the bottleneck in another language (usually Cython), or it turns out that the Python package I'm calling already has its bottlenecks written in another language, whether I wrote it or not.

Day to day, I'm writing Python code which is actually parallel, because a large fraction of the run time is dominated by things that aren't pure Python. I suspect this is true even for people who are not going out of their way to make it true. It's simply the case that most RDBMS systems, Fourier transforms, etc. with Python bindings are not written in Python.

The GIL sounds scary, but I think people overestimate the fraction of time it is actually held in their code.


Exactly, see my own post a few branches up.


It still makes sense today, for the trade-off you described. Also because getting around the GIL is easy.


The first 5 minutes of this covers that in general: https://www.youtube.com/watch?v=P3AyI_u66Bw


In the first video he states that every language has a compiler.

An interpreted language does not need to be compiled into bytecode. Some languages are compiled to bytecode; some are interpreted as-is.


It seems to be about Python 2. Too bad it's not about 3.


Don't worry about Python 2.x vs 3.x here. Under the hood there's not a great deal of difference in the areas this course covers. The "dis" module is as useful as ever, all Python objects are still PyObject * at the C level, the main execution loop is still there, the concept of frames is still relevant, etc.
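
For instance, frames can be poked at the same way in both 2.x and 3.x (minimal sketch):

    import sys

    def inner():
        frame = sys._getframe()             # current frame object
        print(frame.f_code.co_name)         # 'inner'
        print(frame.f_back.f_code.co_name)  # 'outer', the caller
        print(frame.f_code.co_names)        # names this code looks up by string

    def outer():
        inner()

    outer()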

The lectures are very interesting and if you have a spare evening it's possible to just blast through the first 3 or 4 without sweating too much.


I watched this series more than once; it has so much detail. I believe Python 3 is not a complete rewrite of Python 2, so there must be a lot of common code between them, and the series is useful even though it covers Python 2.


> I believe Python 3 is not a complete rewrite of Python 2.

Python 3 is not even remotely close to a Python 2 rewrite. Much changed UI-wise, but the core is very similar if not identical.


This series of blog posts is also great:

https://tech.blog.aknin.name/tag/internals/

I spent a lot of time reading it while working on the mixed-mode Python debugger for Visual Studio (which, coincidentally, supports both Python 2 and Python 3 - they really do share a lot of things). Much of that work involved parsing and writing internal Python data structures directly, since the Python interpreter may be unusable when the current instruction pointer is inside native code (GIL not held, various system locks held, etc.).


Between Python 2 and Python 3, what are the differences in CPython?


That's a great question! I never did a diff of the source, but a good place to start is to diff ceval.c, which contains the main interpreter loop.


Dr. PG has a YouTube channel? I never knew.

This looks awesome!


Kinda painful to watch... thank god for the 1.5x playback speed.



