Oh wow. That's beautifully done. Simple comments that explain clearly what the code is doing, and a pretty clear choice of variable names, so there's little head-scratching going on.
The documentation of redis is really good for a large open source project. I am not a contributor, but still read the source code from time to time. Full credit to antirez for taking the time to make it easy to contribute to redis!
I agree - this basically gives you enough information to bootstrap your own learning about the CPython internals. I feel like all companies/projects should have a similar intro which gives new-joiners enough information that they can figure out most things themselves without too much pain, and without spoon-feeding too much.
I feel like I should understand this but I don't: What names are looked up by name vs. by number in CPython?
That is, I think local variables and constants are looked up by a small integer index that the CPython compiler assigns at compile time.
But any globals must be looked up by name: functions, classes, modules, global variables. And methods on classes, attributes on classes.
I'd be interested to get clarity on that, and any pointers to relevant code/docs. Is this addressed in the videos? I have looked through the CPython source a lot, and even patched it, but the lookups are a little hard to follow. I've played with the "dis" module and code objects.
EDIT: Answering my own question, it seems like I was confused about the index into co_names, which is a small integer index into a tuple of strings, and then the lookup of that string. So it's a 2-step process?
You can find out by using dis and checking whether a name is accessed with LOAD_FAST, LOAD_CONST, or LOAD_GLOBAL. Attribute lookups are always just that as far as I know. The dot operator has a bunch of paths.
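For example (the function and the GLOBAL_SCALE name below are just made up for illustration), disassemble something and look at which co_* tuple each opcode indexes into:

    import dis

    GLOBAL_SCALE = 10

    def f(x):
        y = x + 1                # local: LOAD_FAST/STORE_FAST, index into co_varnames
        return GLOBAL_SCALE * y  # global: LOAD_GLOBAL, index into co_names, then a name lookup

    dis.dis(f)
    print(f.__code__.co_varnames)  # names of locals, indexed by LOAD_FAST
    print(f.__code__.co_names)     # names used by LOAD_GLOBAL / LOAD_ATTR
    print(f.__code__.co_consts)    # constants, indexed by LOAD_CONST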
I'm not clear on the status of PEP 509, but it could/should make LOAD_NAME and LOAD_GLOBAL approach the speed of LOAD_FAST. It adds a version tag to dicts that changes whenever the dict is mutated, so repeated lookups into a dict that hasn't changed can be cached instead of redone.
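Roughly, the idea is something like this sketch (the real version tag lives in the C-level dict struct and isn't exposed to Python like this; this only illustrates the caching pattern):

    class VersionedDict(dict):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.version = 0

        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            self.version += 1          # any mutation invalidates cached lookups

        def __delitem__(self, key):
            super().__delitem__(key)
            self.version += 1

    class CachedGlobalLookup:
        """Cache a lookup and reuse it while the dict's version is unchanged."""
        def __init__(self, d, name):
            self.d, self.name = d, name
            self.cached_version = -1
            self.cached_value = None

        def __call__(self):
            if self.d.version != self.cached_version:  # guard: has the dict changed?
                self.cached_value = self.d[self.name]  # slow path: real hash lookup
                self.cached_version = self.d.version
            return self.cached_value                   # fast path: no hashing at all

    g = VersionedDict(x=42)
    lookup_x = CachedGlobalLookup(g, "x")
    print(lookup_x())   # 42, does the real dict lookup
    print(lookup_x())   # 42, served from the cache
    g["x"] = 99
    print(lookup_x())   # 99, version changed so it re-reads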
It's not necessary; it was a design choice that made sense back in the '90s. In a multithreaded environment you can lock at a fine-grained level or at a coarse-grained level - or you can crash, but let's ignore that as an option. Python chose coarse-grained locking, giving up parallel interpreter computation but avoiding a lot of thread-synchronization overhead. All attempts so far to remove the GIL have resulted in a (usually much) slower interpreter, but the latest attempt shows some promise, and it's conceivable (though not guaranteed) that in a few years there will be an official GIL-less CPython.
Removing Python's GIL will never make much sense, not today and not in the future. If you need CPU-fast code and would bother to multi-thread, it's much more worthwhile to write that code in Cython.
If your code is CPU bound and you're using pure Python, you're going to be making a tonne of heap allocations and pointer dereferences. This will be very slow.
If you implement the relevant stuff in Cython, even without using multi-threading you'll likely see 10x performance improvement, and can often see up to 100x.
Removing the GIL makes Python worse at the stuff it's good at, for questionable improvements in the areas Python is really terrible. This is not a good trade.
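You can see this interpreter overhead with nothing but the stdlib - the same computation gets much faster the moment the loop runs in C instead of bytecode (Cython's typed loops push further in the same direction):

    import timeit

    # Same computation; the second version pushes the loop out of the interpreter.
    py_loop = """
    total = 0
    for i in range(1_000_000):
        total += i
    """

    builtin = "total = sum(range(1_000_000))"

    print("pure-Python loop:", timeit.timeit(py_loop, number=10))
    print("builtin sum():   ", timeit.timeit(builtin, number=10))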
Do you mean waiting for I/O? It's already possible to do I/O asynchronously or with separate locks. The interpreter lock only applies to the interpreter.
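For example (stdlib only), blocking calls like time.sleep release the GIL while they wait, so plain threads overlap just fine for I/O-ish work:

    import threading, time

    def blocking_io(n):
        # time.sleep (like most blocking I/O calls) releases the GIL while waiting,
        # so these threads overlap even though only one runs bytecode at a time.
        time.sleep(1)
        print(f"worker {n} done")

    start = time.perf_counter()
    threads = [threading.Thread(target=blocking_io, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1s total, not ~4s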
http://pyparallel.org is one of the more interesting experiments currently going on in the GIL area. They're basically working on removing all the practical limitations of the GIL without actually removing the entire GIL.
Please remember that threads are not the only way. If you can break your function/routine into smaller, independent pieces, you can easily get by with a fork. (Well, unless you are on Windows...)
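Something along these lines (the crunch function and the chunking are just placeholders); the __main__ guard is what keeps it working on Windows and macOS too, where processes are spawned rather than forked:

    from multiprocessing import Pool

    def crunch(chunk):
        # CPU-bound work on an independent piece of the input
        return sum(i * i for i in chunk)

    if __name__ == "__main__":  # required with the spawn start method (Windows/macOS)
        chunks = [range(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
        with Pool(processes=4) as pool:
            results = pool.map(crunch, chunks)
        print(sum(results))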
It's been a while since I tried to use the multiprocessing module, but last time I did, I ran into issues with it interacting poorly with pyodbc. It's been years, so I don't recall what the problem was, but I spent a few days trying to resolve or work around it with no satisfaction.
Also, most of my Python scripts run on both Linux and Windows, so I have that restriction, as well.
I'm writing a transpiler that uses global information from codebases, and so it transpiles potentially hundreds of files at once and creates rather complex data structures. Compute bound for quite a while, so I tried speeding it up with multiprocessing (since multithreading would be useless). But with multiprocessing it took longer to serialize/deserialize the complex datastructures for each process, so I had to give up. Next time I have time for this I'd probably try to use Jython as a drop-in replacement and see whether I can get it to run with GIL-less multithreading.
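To give a feel for where the time goes: even a modest, made-up stand-in for one of those data structures takes noticeable time just to round-trip through pickle, and multiprocessing pays that cost for every task it ships to a worker:

    import pickle, time

    # A stand-in for a "rather complex data structure": a nested dict/list blob.
    blob = {f"file_{i}": {"ast": [[j, str(j)] for j in range(200)],
                          "symbols": {str(j): j for j in range(200)}}
            for i in range(500)}

    start = time.perf_counter()
    data = pickle.dumps(blob)
    pickle.loads(data)
    print(f"pickle round-trip: {time.perf_counter() - start:.3f}s, {len(data)/1e6:.1f} MB")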
It sounds like you have a couple of hot paths and are not optimizing them. I can't tell for sure without seeing any code, but nothing in your post screams out "this will be slow" or "I need parallelism/concurrency". Perhaps it's the data structures you are using?
I already did extensive profiling and performance improvements, at this point I'm quite sure that if I could do multithreading on my lab's 24 core Xeon Haswell machines I'd be getting a nice speedup.
Isn't it up to the RDBMS whether that's multithreaded or not? Unless the RDBMS is implemented in Python, CPython doesn't force extension code to be single-threaded. Just Python bytecode.
That's pretty much the point, isn't it? If I need true multithreading, then I am forced to write an extension in C or offload the multithreading work to another process (such as RDBMS in your case). It would have been nice if true multithreading was possible in Python itself. It would immediately make Python more useful in a variety of scenarios where splitting the work into multiple processes is not optimal or more convoluted.
Sure, I guess I'm just used to writing my performance bottlenecks in a lower level language already, so I'm used to the GIL not actually being held most of the time in any intensive computation.
So if I want to call two Fourier transform functions at the same time in Python I can, because neither of them is implemented in Python and so they don't hold the GIL.
That's the kind of parallelism use case I most often see come up, so although the GIL dismayed me early on I've come to see it as pretty irrelevant.
But maybe for other applications it makes more sense for the performance-critical parts of the code to be actual pure Python. I do mostly numerical simulations, so pure Python is usually a non-starter; you fix that long before you think about parallelism.
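A stdlib-only version of that experiment, in case anyone wants to try it: hashlib documents that it releases the GIL while hashing large buffers, so the threaded version really does keep several cores busy - the same pattern applies to FFTs or any other extension code that drops the GIL around its C loop:

    import hashlib, os, time
    from concurrent.futures import ThreadPoolExecutor

    # hashlib releases the GIL while hashing large buffers, so these threads
    # genuinely run in parallel even though this is "just" Python threading.
    buffers = [os.urandom(30_000_000) for _ in range(4)]

    def digest(buf):
        return hashlib.sha256(buf).hexdigest()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as ex:
        list(ex.map(digest, buffers))
    print(f"4 threads: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    for buf in buffers:
        digest(buf)
    print(f"serial:    {time.perf_counter() - start:.2f}s")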
It's not that I write the whole program in another language, it's that I either write the bottleneck in another language (usually Cython), or it turns out that the Python package I'm calling already has its bottlenecks written in another language, whether I wrote it or not.
Day to day, I'm writing Python code which is actually parallel, because a large fraction of the run time is dominated by things that aren't pure Python. I suspect this is true even for people who are not going out of their way to make it true. It's simply the case that most RDBMSes, Fourier transforms, etc. with Python bindings are not written in Python.
The GIL sounds scary, but I think people overestimate the fraction of time it is actually held in their code.
Don't worry about Python 2.x vs 3.x here. Under the hood there's not a great deal of difference in the areas that this course covers. The "dis" module is as useful as ever, all python objects will still be of type (PyObject *), the main execution loop is still there, the concept of frames is relevant still, etc.
The lectures are very interesting and if you have a spare evening it's possible to just blast through the first 3 or 4 without sweating too much.
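If you want to convince yourself that frames are still ordinary objects you can poke at, a few lines are enough (Python 3 syntax here, but sys._getframe itself predates Python 3):

    import sys

    def inner():
        frame = sys._getframe()              # the frame currently being executed
        while frame is not None:
            print(frame.f_code.co_name, sorted(frame.f_locals))
            frame = frame.f_back             # walk up the call stack, frame by frame

    def outer():
        x = 1
        inner()

    outer()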
I watched this series more than once; it has so many details. I believe Python 3 is not a complete rewrite of Python 2, so there must be a lot of common code between them. So it's useful regardless of the fact that it's a Python 2 series.
I spent a lot of time reading it while working on the mixed-mode Python debugger for Visual Studio (which, coincidentally, supports both Python 2 and Python 3 - they really do share a lot of things). Much of that work involved parsing and writing internal Python data structures directly, since the Python interpreter may be unusable when the current instruction pointer is inside native code (GIL not held, various system locks held, etc.).