He's missing Cython, which is another good option when you're looking for speed.
My personal favourite optimisation, from needing to shave a few milliseconds off our API response times, was discovering that *args and **kwargs are measurably slower than explicitly declared arguments, and switching to explicit argument declaration and passing in the relevant parts of the code.
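A quick micro-benchmark sketch of the effect (absolute numbers will vary by interpreter and version; the function names here are just illustrative):

```python
import timeit

def explicit(a, b, c):          # arguments bound directly to locals
    return a + b + c

def starred(*args, **kwargs):   # CPython must pack a fresh tuple (and dict) per call
    return args[0] + args[1] + args[2]

print(timeit.timeit("explicit(1, 2, 3)", globals=globals()))
print(timeit.timeit("starred(1, 2, 3)", globals=globals()))
```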
We also did a few other neat things:
- Rolled our own UUID-like generator in pure Python (I was surprised this helped, but the profiler doesn't lie)
- Switched to working directly with WebOb Request and Response objects rather than using a framework
- Used a background thread with a single-slot queue so the response was returned to the user before we emitted the event log message, while still guaranteeing the message was emitted before moving on to the next request (see the sketch after this list)
- Heavy optimisation of memcache / redis reads and writes
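A minimal sketch of that single-slot-queue pattern (the names and the print stand-in are mine, not from the original system):

```python
import queue
import threading

# The queue holds at most one pending message: put() blocks while the
# previous message is still waiting, so a request can return its response
# immediately, but the next request cannot start until the prior event
# log message has been handed off.
log_queue = queue.Queue(maxsize=1)

def log_worker():
    while True:
        message = log_queue.get()
        print("event:", message)  # stand-in for the real log emitter
        log_queue.task_done()

threading.Thread(target=log_worker, daemon=True).start()

def handle_request(n):
    response = "response %d" % n
    # ... response is written back to the client here ...
    log_queue.put("served request %d" % n)  # blocks if the slot is full
    return response

for i in range(3):
    handle_request(i)
log_queue.join()  # flush the last message before exiting
```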
The crosstown_traffic API in hendrix does exactly this.
The order of tactics is wrong. In terms of energy expended, one should try PyPy first! It is amazingly compatible with CPython and can now even be embedded directly in CPython programs via jitpy, https://github.com/fijal/jitpy (which supports numpy arrays).
Dump your virtualenv, create a new one with pypy, reinstall libraries and test your app. Takes less than 20 minutes, even for complex applications.
I would say it is a close second. If I have a slow system, I will move to a faster runtime before modifying any code. Going from CPython to PyPy, where possible, will almost always gain you enough of a performance increase while you refactor the slow parts.
It depends on what you're doing. I think your strategy can often be high risk, since you are making a change that affects every part of your program at once - changing the runtime on a mature Python project is not a trivial choice! Changing the relevant code to tidy up data structures, or moving a few function calls to Cython or C, will only touch the obvious places, so it has far less impact on the program.
Also, profiling PyPy is less straightforward than profiling CPython code, since the JIT's hot-spot compilation changes the runtime characteristics of the program as it warms up. This means you need to run tests many times over to make sure the code has warmed up, which makes further optimization slightly more difficult. It's not a problem for people with experience optimizing Python code, but for people who actually hope to learn something from OP's blog post, it might be a sticking point.
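A crude, illustrative way to see the warm-up effect: time the same hot function over repeated runs. Under PyPy the per-run times typically drop as the JIT traces the loop; under CPython they stay roughly flat.

```python
import time

def hot_loop():
    total = 0
    for i in range(1000000):
        total += i * i
    return total

for run in range(5):
    start = time.perf_counter()
    hot_loop()
    print("run %d: %.4fs" % (run, time.perf_counter() - start))
```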
In my experience using Python for stats and script-type work (as opposed to writing servers and daemons), PyPy just isn't that useful. All the Python code is doing is gluing numpy and Cython code together, and PyPy isn't likely to be able to warm up in time to beat it - and it won't beat it anyway, since the program is spending most of its time in C.
Obviously, if pypy is an ideal choice for you, use it. But I don't think your experience should really be put forward as a general approach.
I have found PyPy wins once wall time is over about 1.5 seconds, making it my go-to Python environment for 1GB+ ETLs.
Everything you say is true, but with the systems I use/write I am able to test for correctness pretty quickly. The nice thing about switching from CPython to PyPy is that everything gets faster. I have also found that using PyPy has removed a lot of the cases where I would want to drop down to native code.
Changing platforms can make one's designs simpler and more robust. When it comes to structured storage, I'll start with sqlite, then when it starts to get slow I'll switch to PostgreSQL. It takes almost no work to port from one to the other.
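A minimal sketch of that kind of swap-friendly storage layer (the function names and the psycopg2 connection string are my assumptions, not the commenter's code):

```python
import sqlite3

# Confine connection setup to one function, so porting from SQLite to
# PostgreSQL (here assumed via psycopg2, which speaks the same DB-API 2.0)
# only touches this one spot.
def get_connection(use_postgres=False):
    if use_postgres:
        import psycopg2  # assumed swap-in; same DB-API 2.0 interface
        return psycopg2.connect("dbname=app user=app")
    return sqlite3.connect("app.db")

conn = get_connection()
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()
```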
You really should give PyPy another shot. It supports more of numpy every day and the startup time is excellent. Maybe give jitpy a try if you are not likely to move off of CPython.
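From memory of the jitpy README, usage looks roughly like the sketch below; treat the exact import paths and decorator signature as assumptions and check the repo before relying on them.

```python
# Roughly the usage shown in the jitpy README (from memory; the setup path
# and jittify signature are assumptions - verify against
# https://github.com/fijal/jitpy before use).
from jitpy import setup
setup('/path/to/pypy-home')  # point at an installed PyPy
from jitpy.wrapper import jittify

@jittify([int, float], float)  # argument types, return type
def sum_repeated(count, value):
    s = 0.0
    for _ in range(count):
        s += value
    return s

print(sum_repeated(1000000, 0.5))  # runs inside the embedded PyPy
```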
Serious question: if you have some code that really has to be fast, is it viable to keep it in Python, or should you ultimately end up rewriting it in a compiled language?
For example, I am writing code that implements networks that evolve over time for AI research. Prototyping it in Python makes it easy to test things out, but I expect that I will have to rewrite it in C++ or maybe something more fun, like Haskell[1].
1. Mostly for the sheer joy of trolling my colleagues with a learning agent monad.
In my experience, the reasons you wrote something in Python in the first place (implementing features faster) remain valid later on when you want to add functionality.
By using PyPy or Cython, rewriting small parts in C/C++, and/or using libraries that are written in C (such as numpy), you can normally make the hotspots in your code fast enough while keeping the advantages of Python.
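For example, a hotspot written as a pure-Python loop versus the same computation pushed into numpy's C internals (an illustrative sketch, not the parent's code):

```python
import numpy as np

def py_sum_squares(xs):
    # pure-Python loop: every iteration goes through the interpreter
    total = 0.0
    for x in xs:
        total += x * x
    return total

arr = np.arange(1000000, dtype=np.float64)
print(py_sum_squares(arr))      # slow: a million interpreted iterations
print(float(np.dot(arr, arr)))  # fast: same sum of squares inside numpy's C code
```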
In my experience it is worthwhile to first try and improve the performance in Python. It's easier to play around with different ideas in Python, and things like numpy quite often enable you to get "fast enough". Only if that is not enough should you consider writing a C extension.
Python makes it easy to do lots of things -- including very inefficient ones. Profiling your existing stuff and trying to optimize in pure Python often gets you pretty far.
I've never had to do this, but the common advice I hear is that the right algorithm in a high-level language can be faster than the wrong algorithm in a low-level language. So even if you do end up going the route of writing it in something lower level, it is likely worth optimizing your code at the higher level first, where it is less expensive to try different things.
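A toy illustration of that point (my example): choosing the right data structure in Python beats a brute-force scan by a factor that no amount of porting to C would recover.

```python
import timeit

data = list(range(50000))
as_set = set(data)

# O(n) linear scan vs O(1) hash lookup for the same membership test
print(timeit.timeit("49999 in data", globals=globals(), number=100))
print(timeit.timeit("49999 in as_set", globals=globals(), number=100))
```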
Same question to you: if you have some code that really has to be fast, is it viable to keep it in C++, or should you ultimately end up rewriting it for GPU?
Is this really a suitable counter-question? GPUs lend themselves to specific kinds of programming problems, and there's the added latency of GPU communication on top. By contrast, the cost of switching between on-CPU languages is programmer time, and it may result in significant runtime advantages.
There are also disadvantages: increased programmer time for implementing new features. And risks: you might optimize the wrong part (i.e. you invest in a rewrite, but it does not solve the performance problem).
That's why you must quantify the advantages and disadvantages, including risk mitigation, and only then will you see whether a given course of action is viable.
You should probably first consider whether there are critical bits of code that can be rewritten to use MMX/SSE instructions, since your data is already in the right cache, without needing to move it anywhere else.
True, but in this particular case, the cost to time going (temporarily) backwards would only be connecting to a potentially-suboptimal disque node, and that would be remedied after the subjective-machine-time caught up with the previous subjective time.
Nope and nope. Jython is made to run on HotSpot, which is a JIT compiler, and Jython should be comparable in speed to CPython and faster in some cases (it used to be slower, but that was 3-4 years ago; they optimized it a lot, and the stuff added in Java 7/8 helped too).
There is a huge difference between an interpreter with a JIT compiler and an interpreter running on another interpreter that has one. These are not equivalent at all.
In the JRuby case, JRuby compiles Ruby to JVM bytecode.
Jython may do the same: rather than creating CPython bytecode, it may produce JVM bytecode, which can then be optimized by the JVM. However, I don't know anything about Jython's performance, and it may not be employing the same tactics as JRuby.
With modern Java features like invokedynamic in Java 7 and lambdas in Java 8, and with other interesting languages like Scala, Groovy, etc. being written for the JVM, I'm sure things have come a long way since the time Jython 2.4 was being developed on Java 5/6, and I'm sure the JVM has many more optimizations that dynamic languages can benefit from.
> other interesting languages like Scala, Groovy, etc being written for the JVM
I think Clojure is one of the most interesting, so I'm keen that you include it in your minimal list of examples. I don't think Clojure actually uses the Java 7 invokedynamic bytecode, though.
Mostly orthogonal. You can have an interpreter without a virtual machine (Basic and Python don't have one, just a runtime), and similarly a virtual machine without an interpreter.
An interpreter executes scripting instructions directly.
A virtual machine implements a faux (virtualized) CPU, with its own instruction set etc., and executes its "assembly code" (with or without JITing).
(Things get complicated in that you can also have combinations of those concepts).
Python in CPython is executed after being compiled to the lower-level Python bytecode. Is this not sufficient to consider CPython to involve a virtual machine?
Almost all interpreters compile an AST down to an "assembly code" that then gets executed. CPython can even execute that bytecode (from .pyc files) without seeing the source at all, just like Java.
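You can inspect that bytecode directly:

```python
import dis

def add(a, b):
    return a + b

# dis shows the instructions CPython's stack-based virtual machine
# actually executes for this function
dis.dis(add)
```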