Using the Cython Compiler to write fast Python code

jgalvez · on Oct 29, 2010

Is it just me, or does this sound like an extremely elegant stack in the making? No VMs, just plain native Unix-speaking software.

Prediction: lots of emerging startups who are basing their code on Python today are going to resort to this (or PyPy, but that's unrelated) when the scaling pain begins, or simply to attempt making things run a bit faster. I think this is great. Most people I know avoid C completely because of the hidden pitfalls every novice has to go through, but maybe Cython will slowly change that. It just needs a bit more maturity and some endorsements to pop up around here... which shouldn't take too long.

rdtsc · on Oct 29, 2010

We have already done this but wrote plain C extensions. Actually, we wrote C libraries with a clean interface, then wrapped them in Python extensions as well.

With C and Python you get to play on both ends of the spectrum -- concise, clear code with Python, and high-performance code with C.

mace · on Oct 29, 2010

Pure C extensions will give you the best performance. If you write really modular C code, interop between it and Python is really clean.

Cython is interesting, but as cited, there are also some limitations and caveats. See http://docs.cython.org/src/userguide/limitations.html and http://docs.cython.org/src/tutorial/caveats.html

swolchok · on Oct 29, 2010

Cython is not a bad story for C interop, though. See e.g. pyevent (http://code.google.com/p/pyevent/), which provides Python bindings to libevent. Be warned: until there's a new release of pyevent (0.4), you should, in my opinion, use the SVN version.

mturmon · on Oct 29, 2010

I very much agree with this. Especially if you use similar C library interfacing styles, you can reuse or abstract out the glue parts easily.

About playing on both sides of the efficiency/expressiveness spectrum, the important thing is to be sure to do real benchmarks so you only drop down into C when it pays off.

Sometimes, the reasons are not just efficiency -- you may already have C code that does the right thing.
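
One way to act on the "do real benchmarks" advice is to profile before porting anything. The following is a minimal sketch using the standard library's cProfile; the function names and the workload are hypothetical stand-ins, not code from anyone in this thread.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot loop (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Cheap step that would not be worth porting to C."""
    return list(range(10))


def main():
    cheap_setup()
    return slow_sum_of_squares(200_000)


def profile_report(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_report(main)
print(report)
```

Only the functions that dominate the report are candidates for C or Cython; the rest can stay in Python.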

stavros · on Oct 29, 2010

The only problem is that, for the vast majority of companies, CPU time isn't the major bottleneck. I don't think there are many companies doing algorithmic work so advanced that it takes hours to complete, unless you're doing some ML (or other) processing in the background.

In that case, however, you should very definitely (as in, I'm not even considering an alternative) go with Python/Cython, as you will develop many times faster and your program will have near-C speed.

My current favorite is ShedSkin, which compiles your (unmodified) program written in a subset of Python (not a particularly limiting subset, mind you) into C++ which you can then compile (and link as a module, if you like).

My experiences with both:

http://www.korokithakis.net/node/117

http://www.korokithakis.net/node/109

Erwin · on Oct 29, 2010

I've written 2-3 kLOC of both boost::python and Pyrex (what Cython is based on) over many years to greatly speed up critical parts of my code, and I'm sticking with the Cython way from now on. Boost::Python uses complex template meta programming to allow you access to Python objects in C++; you're mostly writing C++. I like how I can see what I'm getting with Cython as opposed to trying to fathom the template metamagic.

Back when I used Pyrex my main problem was accidentally using some variable or function that was not C declared and Pyrex silently generating code for that that called into Python, requiring a review of the generated C code -- I think Cython has fixed that so you can see when it turns something like foo(x) into Py_GetTheGlobalVar("foo") etc.

Anyway, that's a view from a production environment which has used Pyrex for 3-4 years with great success.

ashika · on Oct 29, 2010

I recently refactored some underperforming production code using Cython. After spending ~6 hours rewriting 2 modules in Cython's "python superset" syntax, I'm left with a 33% performance boost, and virtually no additional project complexity.

I've also been replacing some rather excessive struct.unpack usage in my code with Cython's C struct pointer casting syntax, and uncovering _massive_ performance gains. 45 seconds of parsing now takes 3 seconds.

I'm pretty much convinced there's no reason to learn CPython's C API, given Cython's maturity and PyPy's improbable, scintillating ascendancy. viz. RPython may be Python's performance future, but Cython is ready now.
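
For readers who have not seen the struct-overlay idea, here is a rough pure-Python analogy, not Cython's actual pointer-cast syntax. It parses the same bytes once with struct and once by overlaying a ctypes structure array on the buffer. The record layout (two uint16 fields and a uint32) is an assumption made up for illustration.

```python
import ctypes
import struct

# Hypothetical record layout: uint16 id, uint16 flags, uint32 timestamp,
# little-endian, 8 bytes per record with no padding.
RECORD = struct.Struct("<HHI")


class Record(ctypes.LittleEndianStructure):
    """ctypes mirror of the same layout (naturally aligned, so 8 bytes)."""
    _fields_ = [
        ("id", ctypes.c_uint16),
        ("flags", ctypes.c_uint16),
        ("timestamp", ctypes.c_uint32),
    ]


def parse_with_struct(data):
    """Unpack one record at a time with a precompiled format."""
    return [
        RECORD.unpack_from(data, offset)
        for offset in range(0, len(data), RECORD.size)
    ]


def parse_with_overlay(data):
    """Interpret the whole buffer as an array of Record structures."""
    count = len(data) // ctypes.sizeof(Record)
    records = (Record * count).from_buffer_copy(data)
    return [(r.id, r.flags, r.timestamp) for r in records]


data = b"".join(RECORD.pack(i, i % 4, 1_000_000 + i) for i in range(3))
print(parse_with_struct(data))
print(parse_with_overlay(data))
```

Both functions return the same tuples. The overlay still pays Python-level attribute access per field, so it only illustrates the idea; the large gains described above come from doing the cast in compiled code.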

angusgr · on Oct 29, 2010

> virtually no additional project complexity

Was there an increased complexity within the two modules you rewrote?

Do you have an estimate on how long it would have taken you to reimplement the two modules (or their critical components) completely in C/C++?

(Qs intended out of curiosity, to help quantify the benefits)

ashika · on Oct 29, 2010

No additional complexity was introduced into those two modules, aside from being a different language that's easily grasped by anyone who knows Python and C. LOC-wise, the modules were about the same as the Python versions.

I've done some integration of C code using ctypes, which works quite well, and offers the obvious speed boost, but feels less coherent and ultimately less maintainable, project-wise, than a well-coded Cython module. Writing a full-on CPython module from scratch would probably offer better performance than Cython if you know the quirks and are disciplined. But to someone who doesn't already drip CPython C modules, Cython is a godsend.

Ultimately, there's 5 commonly used ways (CPython, Boost::Python, SWIG, Cython, ctypes) to integrate C into Python, and right now you'd be crazy not to give Cython a shot, if that's your need. It's very easy to learn for anyone familiar with both C and Python.
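
As a point of comparison for the ctypes route mentioned above, this is a minimal sketch of calling an existing C function without writing an extension module. It assumes a POSIX-like system where the C library can be loaded this way; the wrapper name c_strlen is hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like system. If find_library returns None, CDLL(None)
# exposes the symbols already loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature explicitly: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the byte length of text as computed by the C library."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("cython"))
```

The signature declarations live in Python and nothing checks them against the C header, which is part of why this approach can feel less maintainable as a project grows.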

wkornewald · on Oct 29, 2010

Then you're lucky. When I optimized an algorithm for my diploma thesis, the code grew significantly because my Python functions suddenly had to deal with different type combinations. Also, as part of the optimization, I had to switch from my short numpy-based code to manual for-loops which combined several operations that had been simple (but less efficient) numpy arithmetic before. I think it highly depends on what you want to do, whether you can still work with Python objects, and how much static typing gets in the way when trying to solve your particular problem.

ashika · on Oct 29, 2010

Was the performance gain worth it in the end? In my experience numpy is pretty tight on its own, but I've seen some excellent speed gains from using Cython + numpy.

wkornewald · on Oct 29, 2010

Yeah, I got the runtime down from 40min to 10min. Then I implemented a Python->CUDA compiler and got it down to 30sec. :)

angusgr · on Oct 30, 2010

> I implemented a Python->CUDA compiler

Wow, interesting. Is it released somewhere? Googling found me the Copperhead project[1], was that what you used?

I'm not sure if "implemented" means you implemented your code for it, or you implemented the entire compiler. :)

[1] http://code.google.com/p/copperhead/

wkornewald · on Oct 30, 2010

No, it's a custom implementation of a simple compiler. It's nothing complicated. It converts Python to C++ and compiles that with nvcc. It also supports numpy arrays. It doesn't do any complex optimization steps like a full compiler. It's more like Cython, actually (with type annotations via an @gpu decorator). This allowed me to take my Python image processing code almost literally and annotate it with @gpu. The code isn't released, yet.

I originally wanted to use Copperhead and got in contact with the developer a year ago, but it was too early even for "private" beta testing, so I never got access to their code. Also, my compiler is specialized for image processing, so Copperhead probably wouldn't have worked anyway. I'm only jealous of Copperhead's type inferencer. :) But then again, I have to finish my thesis, and a type inferencer wouldn't help with that goal. ;)

angusgr · on Oct 30, 2010

Interesting, thanks for explaining. :)

fgimenez · on Oct 29, 2010

Here's a relevant article that convinced me how awesome Cython is:

http://www.perrygeo.net/wordpress/?p=116

Basically, if you are doing any work that requires heavy numerical processing, Cython is the way to go. On the other hand, I was playing with it to do some basic text processing and the improvements were negligible.

dagw · on Oct 29, 2010

Don't forget numpy. I managed to get his code from 2 seconds to 0.2 seconds by a simple 60 second rewrite using numpy and arrays.

xtacy · on Oct 29, 2010

Even for numerical processing, it depends on where/how you write your loops. numpy just offers bindings to optimised C libraries and loops using map(..,..) should be much faster than using for.

As far as text processing is concerned, it seems like the python code is just a nice interface to the underlying compiled library and hence there isn't much difference.
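
Whether map really beats an explicit for loop is easy to measure rather than assume. The sketch below uses timeit from the standard library; the workload (abs over a list of integers) and the helper names are arbitrary choices for illustration, and the timings will differ between machines and Python versions.

```python
import timeit

values = list(range(-5_000, 5_000))


def with_for_loop(data):
    """Explicit loop with list.append."""
    out = []
    for x in data:
        out.append(abs(x))
    return out


def with_map(data):
    """map() over a built-in function, so the loop runs in C."""
    return list(map(abs, data))


def with_comprehension(data):
    """List comprehension, for comparison."""
    return [abs(x) for x in data]


def best_time(func, repeat=3, number=20):
    """Best-of-N wall time in seconds for calling func(values) `number` times."""
    return min(timeit.repeat(lambda: func(values), repeat=repeat, number=number))


for func in (with_for_loop, with_map, with_comprehension):
    print(f"{func.__name__}: {best_time(func):.4f} s")
```

Taking the minimum over several repeats reduces noise from other processes. The gap usually shrinks once the mapped function is a Python lambda instead of a built-in.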

ramidarigaz · on Oct 29, 2010

This is the kind of stuff that makes me run around my living room with excitement. I can most definitely make use of this.

ntoshev · on Oct 29, 2010

No generator support is a big limitation to me: my CPU intensive code uses generators. An alternative to watch is Shed Skin:

http://code.google.com/p/shedskin/

sqrt17 · on Oct 29, 2010

ShedSkin is great if you don't want Python interoperability. Remember the excitement when they announced that Jython now runs Django? Cython allows you to run Django plus some optimized code, whereas there's currently no convenient way to make ShedSkin and CPython talk to each other. (And it's also the reason why hopes on PyPy are so high: A specializing JIT would allow you to combine all the flexibility and monkeypatching from CPython with the speed you normally only get from a compiler).

stavros · on Oct 29, 2010

As far as I know, ShedSkin executables can be compiled as modules and imported in CPython painlessly, so the above is not accurate.

ntoshev · on Oct 29, 2010

Yup, from the ShedSkin tutorial (http://shedskin.googlecode.com/files/shedskin-tutorial-0.5.h...):

"Shed Skin is an experimental Python-to-C++ compiler designed to speed up the execution of computation-intensive Python programs. It converts programs written in a static subset of Python to C++. The C++ code can be compiled to executable code, which can be run either as a standalone program or as an extension module easily imported and used in a regular Python program."

stavros · on Oct 29, 2010

Plus the maintainer is awesome: I was having suboptimal performance, and several times he made adjustments within a few hours, making ShedSkin (and my program) much faster each time.

levigross · on Oct 29, 2010

There is a fear that many startups have in using Python: that when they do grow, they will have to move large parts of their code into C due to the speed limitations of the CPython interpreter.

I feel that PyPy and Cython alleviate that fear.

hartror · on Oct 29, 2010

Fear?

The ability to transition parts of one's code base to C/C++/ASM through CPython's excellent library makes using Python from day one such an easy decision. There is also a library ecosystem that makes the transition even easier.

stavros · on Oct 29, 2010

He probably made a slip because of the "fear" down the line. I think he meant to type "hope".

bryanh · on Oct 29, 2010

In the same vein as Cython if you just wanna speed up your Python code, take a look at Psyco as well. It's a "just-in-time (JIT) compiler" which means I have no clue how it works, but it does.

jbarham · on Oct 29, 2010

Unfortunately Psyco is 32-bit only and not being actively developed (http://psyco.sourceforge.net/psycoguide/req.html). On the plus side, the author is now focusing on PyPy.

angusgr · on Oct 29, 2010

I was interested to find some discussion of how the performance characteristics compare to PyPy/Unladen Swallow/RPython.

Quick googling only found one comparison: http://jaredforsyth.com/blog/2010/jul/21/cpython-vs-pypy-vs-...

The routines he benchmarked are simple string tokenizing, though, so I'm not really surprised the translated-to-C version came in so much faster.

hurfadurf · on Oct 29, 2010

Cython is also an amazing way to write Python bindings. CPython is too heavy and complex, ctypes is too annoying and verbose to write, and Cython is a staggeringly perfect DSL for walking the line between C and Python.

jbarham · on Oct 29, 2010

Somewhat OT, but as a longtime Python programmer I've recently been enjoying playing w/ Go (http://golang.org/) which combines the lightweight syntax of dynamic languages like Python w/ static type safety and efficiency in a single language.

And it's easy to wrap C libraries in Go. See http://code.google.com/p/gosqlite/source/browse/sqlite/sqlit... for an example of a Go Sqlite binding in < 300 lines.

gsivil · on Oct 29, 2010

The way that you display the webpage (like a power point presentation) looks impressive and clear. How is this done?

cabalamat · on Oct 29, 2010

I disagree. I dislike the way the web page is displayed, because it breaks important conventions on how web pages should work. Instead, there should have been forward and back buttons to navigate between the individual slides.

bmelton · on Oct 29, 2010

I liked it generally, until I wanted to go 'Back'. Almost every slide where they displayed a code comparison on separate slides meant that I had no idea what was done differently.

Implement some navigation, and it's perfect.

drv · on Oct 29, 2010

There is a navbar that appears when hovering near the bottom of the page (or just use the arrow keys).

carpdiem · on Oct 29, 2010

I spent some time building numerical simulations as the theorist on an R&D team in Los Angeles, and Cython was a lifesaver. Iirc, I got ~150x performance increase (note, 150x, not 150%) out of it. That gave me all the great high-level bits of python (and ease of writing code!) with the speed of C.

FraaJad · on Oct 29, 2010

Are there any widely used Python packages that could use the performance boost offered by Cython?

I'm unable to come up with a project of my own which can use Cython, but I'm curious enough to try it on an open source project if it benefits developers.

baltcode · on Oct 29, 2010

I used this a long time ago, and from my observations, the biggest gains occurred in loops. (They are the only time Python is really slow anyway.) However, the last time I checked, interfacing with numpy and scipy was not supported.

cdavid · on Oct 29, 2010

Then you checked a very long time ago :) It is quite easy to pass numpy arrays to cython functions. I would say that except in some very special cases, writing raw Python C interface is useless (of course, writing numerical code which is independent of the C API is still very useful). Cython also gives you the benefit of py3x compatibility (cython generates wrappers which can be compiled for both python 2 and python 3).

rdtsc · on Oct 29, 2010

> I would say that except in some very special cases, writing raw Python C interface is useless ...

I would disagree. Cython is less known and has a learning curve. CPython extension interface is well established and more common. That is not an insignificant advantage.

Do you stop for a couple of weeks to learn Cython, and do you have complete confidence in its generated code, or do you just start using something you know that is tried and true? It depends. We chose C Python extensions or just writing hotspots in C in a separate process.

cdavid · on Oct 29, 2010

The C API also has a learning curve, and is certainly harder than cython. You don't need a couple of weeks to learn cython - I rewrote one small package from ctypes to cython in ~ 1 day, without previous experience in cython.

The problem with C Python extensions is that they are so hard to write correctly because of reference counting.
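
To make the reference-counting point concrete, this small sketch shows the counts that a hand-written C extension has to keep balanced itself. It is CPython-specific, and the variable names are arbitrary; only the differences between readings are meaningful, because sys.getrefcount includes temporary references.

```python
import sys

# CPython-specific: absolute values vary between versions, so compare deltas.
payload = object()

baseline = sys.getrefcount(payload)

holder = [payload]                      # the list now owns one more reference
after_store = sys.getrefcount(payload)

del holder                              # dropping the list releases it again
after_release = sys.getrefcount(payload)

print(after_store - baseline, after_release - baseline)
```

In Python the interpreter does this bookkeeping automatically. In a raw C extension every such increment and decrement is written by hand, and a missed one means a leak or a crash.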

another · on Oct 29, 2010

I'm not sure of your needs, but there's now reasonable support for efficient interaction with numpy arrays; see, eg

http://docs.cython.org/src/tutorial/numpy.html

and/or

http://conference.scipy.org/proceedings/SciPy2009/paper_2/fu...

Some types of array addressing will take the slower---but still correct---invocation path through Python, but the common form is compiled to direct (albeit strided) array access.
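
"Strided" access here means an element's position is computed from per-dimension byte steps. The following standard-library sketch shows that arithmetic on a small 2-D view; it is an illustration of the addressing scheme only, not of the code Cython generates, and the helper strided_get is hypothetical.

```python
from array import array

rows, cols = 3, 4
flat = array("d", [float(i) for i in range(rows * cols)])

# memoryview.cast needs a byte view as an intermediate step before reshaping.
grid = memoryview(flat).cast("B").cast("d", shape=[rows, cols])


def strided_get(buffer, strides, itemsize, i, j):
    """Locate element (i, j) by hand from the byte strides."""
    byte_offset = i * strides[0] + j * strides[1]
    return buffer[byte_offset // itemsize]


i, j = 2, 1
print(grid.strides)
print(grid[i, j], strided_get(flat, grid.strides, grid.itemsize, i, j))
```

Compiled code can do this multiply-and-add directly on a raw pointer, which is why the common indexing form avoids a round trip through Python objects.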

baltcode · on Oct 29, 2010

Thanks, that looks great! I used to do stuff like that with weave. It would be interesting to see the speed comparison for weave and cython.

stavros · on Oct 29, 2010

I look forward to Cython supporting annotations in 3, making python effectively optionally statically typed (three adverbs in a row, top that)!

wushupork · on Oct 29, 2010

totally off topic but I read this as Cylon compiler