Faster Python calls in Cython 0.21

raymondh · on Aug 16, 2014

Nice work Stefan :-)

For people who care about performance, Cython, Numba, and PyPy are a vital part of the Python ecosystem. They let you create highly performant code while retaining Python's clarity, learnability, and rapid development capabilities.

These gains don't come easily; instead, they are the result of years of thinking carefully about what the machine actually does internally and coming up with more direct paths to accomplish the same goal.

Thank you for your work.

rch · on Aug 16, 2014

Seconded. I was really surprised to see that his proposed Cython talk didn't make it into PyCon. Maybe SciPy would be more receptive?

raymondh · on Aug 17, 2014

This has been an issue in Python talk selection for several years. One year the PyPy folks didn't have a single accepted talk despite having heroic accomplishments that will greatly affect Python's future. And last year, scientific and numeric talks were almost non-existent despite the booming growth of the PyData community and the wide adoption of Pandas.

See https://twitter.com/dabeaz/status/413287588426809344

Part of the reason is that there has been an effort to get new people on stage regardless of experience level and to get substantially more women on-stage as well.

See https://us.pycon.org/2015/speaking/cfp/ and https://twitter.com/tarek_ziade/status/455409302912921600

The overall net effect has been positive, making the community more inclusive and giving more stage time to new, fresh talent. The downside is that there is less room for other players (for example, none of the proposed talks from Continuum were accepted).

ris · on Aug 16, 2014

I was actually considering the other day that I was surprised Cython (and numba?) don't do something where they use a copy of the libpython source to allow them to inline calls back into python-land. Yes, fraught with packaging/distribution difficulties, but possibly worth it for situations where the speedup is needed.

stuaxo · on Aug 16, 2014

This is cool. How hard would it be for the CPython core to implement similar optimisations?

ris · on Aug 16, 2014

Er. Essentially impossible, because the inlining only works in situations where you know a whole bunch of things at compile-time. CPython knows essentially nothing at compile-time.
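To make the "knows essentially nothing at compile-time" point concrete, here is a minimal sketch in plain Python. The names `helper`, `replacement`, and `caller` are made up for illustration. It only shows that the target of a call is looked up by name each time the call runs, so a bytecode compiler cannot safely assume which function it will be.

```python
def helper(x):
    return x + 1


def caller(x):
    # "helper" is resolved by name every time this line runs,
    # not fixed when caller() is compiled to bytecode.
    return helper(x)


first = caller(1)  # uses the original helper


def replacement(x):
    return x * 100


# Rebinding the name changes what caller() does, without recompiling it.
helper = replacement
second = caller(1)

print(first, second)
```

Any inlining of `helper` into `caller` would have to be guarded or undone when the name is rebound like this.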

fijal · on Aug 17, 2014

It's not impossible. It's "easily" done if you have a JIT, but even without a JIT, you can inline the call on the bytecode level using similar techniques - you have to be able to rebuild the chain if someone asks for it. One can (easily) argue that the complexity is unnecessary and the speedups are unclear. It really depends how hard you try :-)
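As a rough illustration of why the chain has to be rebuildable: Python code can walk its own call stack at any time, so an inlined callee still has to be able to produce a frame on request. The sketch below uses the standard `inspect` module; `callee` and `caller` are hypothetical names chosen for the example.

```python
import inspect


def callee():
    # Walk the frame chain from the current frame outward
    # and record the name of the code object in each frame.
    names = []
    frame = inspect.currentframe()
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def caller():
    return callee()


chain = caller()
print(chain[:2])
```

If `callee` were silently inlined into `caller` with no way to reconstruct its frame, the first entry of this chain would be missing, and code such as debuggers and traceback formatting would see a different stack.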

PyPy is generally achieving speedups mostly by:

* avoiding allocations by escape analysis

* avoiding escape through frames by removing frames

* avoiding another level of allocations by inlining calls (and avoiding frames)
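The three points above can be pictured with a hand-written before/after pair. This is only an illustration of the effect, assuming a made-up `add_points` helper. It is not what PyPy literally emits, and in CPython the second version is simply a manual rewrite.

```python
def add_points(a, b):
    # Allocates a fresh tuple on every call.
    return (a[0] + b[0], a[1] + b[1])


def total_with_calls(points):
    acc = (0, 0)
    for p in points:
        # One call frame and one temporary tuple per iteration.
        acc = add_points(acc, p)
    return acc


def total_inlined(points):
    # Hand-inlined equivalent: the call is gone, and the intermediate
    # tuple is replaced by two local variables, since it never escapes
    # the loop. Only the final result is allocated.
    x = y = 0
    for px, py in points:
        x += px
        y += py
    return (x, y)


points = [(1, 2), (3, 4), (5, 6)]
print(total_with_calls(points), total_inlined(points))
```

Both functions compute the same result. The difference is how many frames and short-lived objects are created along the way, which is the cost that escape analysis and inlining aim to remove.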