How Numba and Cython speed up Python code (rushter.com)
112 points by BerislavLopac on Aug 3, 2018 | 45 comments



I like Cython, but I'm always surprised at how willing programmers seem to be to use the Cython syntax. In so many cases I look at the optimized program and think to myself, "I would understand that better if it was just written in C."


I agree, my vastly preferred option now is Python code and C code with cffi to glue them together. I find that much preferable to Cython which is basically a third language altogether (and IMO not an especially nice one, although that is a bit of a value judgement).


The huge advantage with cython compared to C is that it is trivial to call python libraries and get the same behaviour as you would in python.


Cython caters to Python programmers who don't know C well enough to be comfortable writing C (like me).


Even if you write the Cython naively (PyObjs everywhere), you get a modest speed boost simply by cutting out the Python bytecode interpreter. This is a very low friction process as opposed to carving out chunks of code that can be isolated from the Python runtime.


Agreed. I tried Cython but decided against using it. I now proof test in Python and rewrite in C.


This is probably because you don't use much of the Python ecosystem in your C module. If you're trying to speed up something that both calls and is called from Python then Cython saves a lot of verbiage and some fiddly bits.


I couldn’t disagree more. I find Cython syntax allows me to express the thing I want at the C-level with significantly less work than the equivalent syntax directly in C.

For example consider using cdef to define a simple class (which compiles directly to a struct + helper functions). Organizing it with class-like syntax is so much easier and better mapped to the concept model, and (as it should be) the compiler worries about how to map that concept down to a thin struct with functions (something the programmer should not have to consciously think about but should still benefit from).

You can go really far with this with Cython. For example this is a side project to write a “typeclass”-like pattern for polymorphism in Cython.

https://github.com/spearsem/buffersort

The function definitions hardly look different from plain Python, yet you’re getting auto-generated specialized functions for each resolvable child type of the type class (called “Ord” here), plus a dispatcher to invoke them from Python.

So you sort of get just as much type dynamism as plain Python, except each typed instance is much more performant with no PyObject or PyFunction overhead in the call sequence.


Nim is a language that mimics Python syntax but compiles down to C. Really cool and easily gives you 'fast code'. If you are into this kind of stuff I would recommend you take a look.

https://nim-lang.org/


How's the standard library in Nim? What's the 3rd party package ecosystem like? If I have questions will I be able to find a good answer on Google? How about jobs?

There's a handful of languages out there you could consider "faster python". I just use cython though.


Python is awesome, don't get me wrong.

I have limited knowledge about it, but mainly it isn't version 1.0 yet, so the std lib still changes and has some quirks, missing pieces, or pieces that are not quite aligned yet.

The package system is Nimble. It seems easy enough and fetches packages from GitHub given a version, though I'm not sure how secure that is.

Not sure about Google as it's relatively new, but in general the Nim forum is pretty active. I have asked around there a few times and got answers even from the language creators themselves.


Just use julia instead!


Julia lacks ecosystem outside of academic stuff.


It's quite complete for anything numerical/machine learning/statistical at this point, with almost 2000 packages. This is in addition to the very comprehensive standard library, which includes everything you get in Numpy/Scipy. Of course, outside of numerics, you're pretty much out of luck, which is not the case for Python.

It also has multiple dispatch like Common Lisp, as well as gradual typing. All functions are multimethods, actually. This is an enormous abstraction advantage over Python. It's much easier to do complicated things without pulling your hair out in comparison to Python.
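To make the multiple dispatch point concrete for Python readers, here is a rough sketch of the idea in plain Python. The `multimethod` decorator and `_registry` dict are hypothetical helpers written for this illustration only; they match on exact argument types and are nowhere near what Julia does natively.

```python
# Hypothetical, minimal multimethod helper (illustration only).
# Implementations are stored under (function name, argument types).
_registry = {}


def multimethod(*types):
    def register(func):
        _registry[(func.__name__, types)] = func

        def dispatcher(*args):
            # Pick the implementation from the runtime types of ALL arguments,
            # not just the first one (exact type match, no inheritance).
            key = (func.__name__, tuple(type(arg) for arg in args))
            return _registry[key](*args)

        return dispatcher

    return register


@multimethod(int, int)
def combine(a, b):
    return a + b


@multimethod(str, int)
def combine(a, b):
    return a * b


int_result = combine(2, 3)
str_result = combine("ab", 2)
```

The standard library's `functools.singledispatch` covers the single-argument case; dispatching on every argument is what needs extra machinery like the above in Python.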

I use it a fair amount in my job as a data scientist. It's also what I reach for if I need to write some custom algorithm myself that needs to be high performance, rather than doing it in C.


Does it have tooling for deployment? Is there a good way to deploy a machine learning model in production?


I don't find Julia to be nearly as programmer-friendly as Python, with more syntax and more cognitive overhead. This is on an admittedly small experience base.


This. If you need to drop out of regular Python for performance reasons, then Julia offers the same high level flexibility, but with types and performance.


Cython basically lets you sprinkle types and performance into your Python code, as needed. It's not as drastic a departure as switching languages.


There's a reason most high-performance Python libraries are not written that way, and core routines are just written in C instead. Proof of the pudding is in the eating!

See this talk by Armin Ronacher (creator of Flask) on why the design of python makes it fundamentally unfriendly to performance optimizations: https://youtu.be/qCGofLIzX6g?t=171

Julia has been designed from the ground up to avoid several such problems. See this discussion: https://discourse.julialang.org/t/julia-motivation-why-weren...

If your domain falls under the umbrella of numerical and scientific computing, writing Julia is as painless as writing python, with code that automatically runs roughly as fast as C. If you're used to writing numpy, you can hit the ground running in Julia, with maybe a few hours to become comfortable with the slightly different syntax and the names of useful libraries.


> There's a reason most high-performance Python libraries are not written [with Cython], and core routines are just written in C instead.

Pandas, Scipy and lxml are large, very popular Python libraries that use Cython. The article even mentions them at the end.


The point is that Cython provides a nice intermediate stage between C and CPython. Most optimizations need the first factor of 100, not the last factor of 2. You can usually achieve that in Cython with an effort measured in characters changed rather than lines of code changed.

I've played with Julia. It's nice enough, but it doesn't offer me anything I don't already get through the C/Cython/CPython hierarchy.


It offers a ton that you don't get from Cython: http://www.stochasticlifestyle.com/why-numba-and-cython-are-...


I haven't used it, but nuitka (http://nuitka.net/) translates python (2.6, 2.7, 3.3 to 3.7) to C and compiles that. It claims to be highly compliant and performant without any extra pragmas.

How does it compare to the two technologies in the article?


They usually cannot improve carefully optimized numpy operations, nor GPU operations. The use cases are thus limited.


IIRC, this is false regarding NumPy. The fundamental problem NumPy has is that it only vectorizes 1 operation for each pass through the data (or occasionally 2-3, like dot product). It can't do arbitrarily hybrid operations (like idk, maybe a[i] * b[i] / (|a[i]| + |b[i]|)) in one pass through your data. This is an inherent limitation in NumPy, and the lack of it is inherent in Numba. So you very much can expect speedups in some cases -- it just depends on what operations you are performing and how big your data is.
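A minimal sketch of the single-pass point, assuming NumPy is installed and treating Numba as optional. The function names and the `njit` fallback are made up for illustration, and whether the loop version is actually faster depends on array size and hardware.

```python
import numpy as np

try:
    from numba import njit  # optional; assumed installed if you want the speedup
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs (slowly) without Numba.
        return func


def ratio_numpy(a, b):
    # Each operator is its own pass over the data and typically
    # produces an intermediate array.
    return a * b / (np.abs(a) + np.abs(b))


@njit
def ratio_fused(a, b):
    # One explicit loop: every element is read once and the whole
    # expression is evaluated before moving on.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] * b[i] / (abs(a[i]) + abs(b[i]))
    return out


a = np.array([1.0, -2.0, 3.0, -4.0])
b = np.array([2.0, 0.5, -1.0, 4.0])
```

Both functions should return the same values; the difference is only in how many times the data is traversed and how many temporaries are created.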


You're right, apart from np.einsum() and numexpr which is a separate package (albeit less drastic to use than Numba, because with numexpr you don't write your own loops).


I meant "occasionally 2-3, like dot product" to include stuff like einsum. FWIW I found einsum was actually slower than tensordot last time I tried, so you may want to use that instead if this is still the case.
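For anyone comparing the two, a small sketch (assuming NumPy; the arrays are arbitrary) showing that `np.einsum` and `np.tensordot` can express the same contraction, so swapping one for the other is a one-line change. Which is faster is something to measure on your own shapes.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
b = np.arange(20.0).reshape(4, 5)

# Contract the last axis of `a` with the first axis of `b`.
via_einsum = np.einsum("ij,jk->ik", a, b)
via_tensordot = np.tensordot(a, b, axes=1)
```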


Yes, but it can be difficult to figure out the best way to make said carefully optimized numpy operations for anything moderately complicated. It can also be difficult or impossible to avoid extra memory allocations in numpy. Sometimes it's just quicker to bang out the obvious loop-based implementation in numba or cython.

Also, numba does in fact support targeting the GPU. (Edit: I originally wrote that I thought it required a license for numbapro, i.e. not free, though last I used it they had free licenses for students. It's free now, see below.)


> Also, numba does in fact support targeting the GPU

Numba CUDA is free with Anaconda


Thanks, I remember it used to cost money but I couldn't remember if it still did.


I guess NVidia figures they can make more selling cards than charging for the library!


Numba is not made by NVidia. It is [1] made by Anaconda (formerly Continuum Analytics), which was co-founded by Travis Oliphant, the primary author of NumPy.

[1] (primarily - it's open source)


Right, but the NVidia drivers are free for it now, whereas they didn’t use to be according to the OP.


They aren't drivers, it's just the ability to generate CUDA kernels in Numba. It has nothing to do with Nvidia supporting it, they were not involved AFAIK.


Interesting, thanks :-) either way, we’re all set to accelerate Python code on the GPU! Personally I intend to focus my efforts here rather than learning Julia.


Yeah, but sometimes it's pretty hard to avoid simple loops in numpy code. This is where these tools can drastically speed up your code.

In order to achieve good performance, all numpy code should be vectorized.


The main use I've found for numba (I'm a theoretical physics/maths undergrad), is avoiding memory allocations in tight loops. There are some cases where I've found it hard to wrangle and arrange my numpy/scipy code such that all the vectorized operations happen in-place without extraneous allocations (in my case the difficulties have been with scipy's sparse arrays, although I can't remember the exact problems).
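For dense arrays, the in-place style being described looks roughly like this. It is a sketch assuming NumPy, with made-up variable names, and it does not cover the scipy sparse case mentioned above.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = np.full(5, 2.0)

# Straightforward form: may allocate an intermediate for x * y
# as well as the final result.
z_alloc = x * y + 1.0

# In-place form: one buffer allocated up front and reused via `out=`.
buf = np.empty_like(x)
np.multiply(x, y, out=buf)
np.add(buf, 1.0, out=buf)
```

Inside a tight loop, `buf` would be allocated once outside the loop and reused on every iteration.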


You can use Numpy in Cython.


Second this. Cython's memoryview support for numpy arrays is great.


In particular, if you find you cannot use vectorized functions in numpy or scipy and absolutely MUST index, then typing the array in Cython is a life saver. Indexed operations on numpy arrays without Cython are very slow. (e.g. https://stackoverflow.com/q/22239199/300539)


Agree. It's a bit surprising to see that numpy indexed operations are even slower than the built-in list in this example. It seems the idiomatic numpy way to perform the iterations is through vectorization, but that often leads to code that is not straightforward to reason about. For this example, I'd prefer the simplicity of the Cython for-loop when it comes to optimization.


That's because with generic python code the values in a numpy array need to be boxed into python objects leading to extra memory allocations (whereas they would already be boxed in the built-in list case).
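You can see the boxing from plain Python. A small sketch, assuming NumPy, with illustrative variable names:

```python
import numpy as np

values = [1.5, 2.5, 3.5]
as_list = list(values)
as_array = np.array(values)

# A list stores references to existing Python float objects,
# so indexing twice hands back the same object.
list_item_is_same_object = as_list[0] is as_list[0]

# A NumPy array stores raw doubles; each index operation has to
# build a fresh scalar object around the value.
array_item_is_same_object = as_array[0] is as_array[0]
scalar_type = type(as_array[0]).__name__
```

That per-element object creation is the overhead a typed Cython or Numba loop avoids.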


Is Cython compatible with all Python code, or is it only a subset of libraries that you can use?


It depends. This is the only deviation I've ever found: https://github.com/cython/cython/issues/1936 ; that is, Cython will execute __prepare__ in Python 2; but __prepare__ doesn't exist in Python 2, so the normal interpreter (CPython) won't execute it. This can lead to deviations; in my case, the code crashes if run under Cython, and executes fine under CPython.

The Cython maintainers disagree with me that this is a bug, so, if you're under Python 2, I would say it is "very nearly" compatible. If you're in a recent version 3, AFAICT, it just makes Python code faster.
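For readers who haven't met `__prepare__`: it is a Python 3 metaclass hook that supplies the namespace object used while a class body executes. A minimal sketch follows; the class names are invented for illustration and this is not the code from the linked issue.

```python
class RecordingNamespace(dict):
    """Dict that remembers the order in which names were assigned."""

    def __init__(self):
        super().__init__()
        self.assigned = []

    def __setitem__(self, key, value):
        self.assigned.append(key)
        super().__setitem__(key, value)


class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Python 3 calls this before running the class body.
        return RecordingNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep only user-defined names; the interpreter adds dunder entries.
        cls.assigned_names = [
            key for key in namespace.assigned if not key.startswith("__")
        ]
        return cls


class Example(metaclass=Meta):
    x = 1

    def method(self):
        return self.x
```

Python 2 never calls this hook, which is why code relying on its absence can behave differently once something else does call it.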


You can import whatever you want, but it will work in "Python" mode.



