Pyston v2.2: faster and open source (pyston.org)
272 points by chenzhekl on May 6, 2021 | hide | past | favorite | 100 comments



https://github.com/pyston/pyston

> Pyston is a fork of CPython with additional optimizations for performance. It is targeted at large real-world applications such as web serving, delivering up to a 30% speedup with no development work required.

I did not really find any information on what type of additional optimizations are done. Is there some documentation that goes into that?


Pyston originally started as an internal project at Dropbox, and they had a couple of super interesting blog posts where they detailed their approach and findings[0]. After a year or two, the project got axed, but the devs kept working on it and then released it on their own (I don't remember the details, but evidently the chosen license for the project allowed that).

Last year (I think it was last year, but I can't tell if that was 2020 or 2019) they released it on their own, thinking of a commercial model. Now they have switched to OSS+services, which is interesting because it'll open Pyston up to more people.

I think this is interesting in its own right, but yesterday we got Facebook announcing Cinder[1] -- which is unsupported for outsiders, but significantly faster than Pyston. According to Facebook, they intend for parts of Cinder to be upstreamed (and have already upstreamed a few, apparently) to CPython. So again, and sorry for using such a vague word, it'll be interesting to see how all these things play out in time.

[0] https://blog.pyston.org/2014/09/

[1] https://github.com/facebookincubator/cinder


As someone with a fairly minimal understanding of the different open source licenses, I was a little confused on how you could fork Python to a commercial project, but it turns out it's permitted under the license [0]:

> Note GPL-compatible doesn’t mean that we’re distributing Python under the GPL. All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source. The GPL-compatible licenses make it possible to combine Python with other software that is released under the GPL; the others don’t.

[0] https://docs.python.org/3/license.html


This is the difference between "strong" copyleft licenses like the GPL and permissive licenses like MIT, Python's, or BSD.


Good question, we talked about it a tiny bit in our previous blog post and are planning on doing more in-depth blog posts in the future. I added a very brief summary to the github readme:

- A very-low-overhead JIT using DynASM

- Quickening

- Aggressive attribute caching

- General CPython optimizations

- Build process improvements
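To make the list a bit more concrete, here's a sketch (my own illustration, not from the Pyston docs) of the kind of attribute-heavy code that techniques like aggressive attribute caching target -- the repeated `p.x` / `p.y` loads in the hot loop are dictionary lookups that a cache can short-circuit:

```python
import time

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def total(points):
    # each iteration performs two attribute loads (p.x, p.y); an attribute
    # cache lets the runtime skip the per-instance dict lookup on the hot path
    s = 0
    for p in points:
        s += p.x + p.y
    return s

pts = [Point(i, i) for i in range(100_000)]
t0 = time.perf_counter()
print(total(pts))
print(f"{time.perf_counter() - t0:.4f}s")
```

Running the same loop under CPython and Pyston is a quick way to see whether your workload is in the sweet spot.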


I heard about Pyston on the Python podcast:

https://www.pythonpodcast.com/pyston-fast-python-interpreter...



I'm still learning about Cinder, but my initial impression is that there are many similar techniques used, and maybe one interesting difference is that with Pyston we've focused more on out-of-the-box performance while Cinder has had an opportunity to co-optimize the runtime with a particular codebase and presumably gets better results if you're able to do that.


I don't understand, can't those changes be merged upstream?

Edit: Also yesterday's HN discussion on Cinder https://news.ycombinator.com/item?id=27043217


Or better yet, why can't the two projects be combined into one?

Wouldn't it be better to have two companies working on it instead of just one? Or are the implementation details too far apart?


Wondering the same, as Pyston claims full compatibility with CPython, unlike Facebook's implementation.


My understanding is that "can" is only half the question. Yes, the changes CAN be upstreamed, but my understanding (through others, so take it with a grain of salt) is that the CPython team doesn't like to take optimization changes that obscure the CPython code base. As a result, they often deny merging optimizations in favor of readability in the C codebase. Again, just my understanding and it may not be 100% accurate.


Cinder is fully compatible with CPython's C-API.


They are alternative solutions to the same problem.


Are they not compatible solutions? I understand that if development is very fast you want to have focus, but the goal here is performance; it seems worth at least coordinating. I hope it happens!!


There are many ways to get to the same place. It's worth a shot to experiment with different approaches, they also don't have the same pros and cons.

I'm sure at this level of tech they are well aware of the prior art.


How fast is that thing at HTTP serving? For example, Python v3.x ThreadedHTTPServer can do 1,000 requests per second for hello world, so 30% faster isn't much faster. On the other hand, redbean can do 1,000,000 requests per second on the same machine using a dynamic Lua handler. https://justine.lol/redbean/index.html redbean was also the top #3 Show HN thread of all time. https://news.ycombinator.com/item?id=26271117

Lua isn't that much different from old Python. I'm also considering integrating JS and Scheme into redbean too, so people can have options in terms of personal aesthetics and programming taste when they achieve that million QPS. For example, Fabrice Bellard's JS engine has already been checked in to Cosmopolitan, and it actually benchmarks faster at ECMAScript test262 than even v8, if I remember correctly.
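For reference, the stdlib server being benchmarked is presumably something like the following hello-world (the class is actually spelled `ThreadingHTTPServer` in `http.server`, Python 3.7+):

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging so it doesn't distort the benchmark

def serve(port=8000):
    # serve_forever() blocks; start it from your load-testing driver
    ThreadingHTTPServer(("127.0.0.1", port), HelloHandler).serve_forever()
```

It spawns a thread per connection, so most of the per-request cost is interpreter and threading overhead rather than the handler itself.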


Context matters. If you have an existing Python app which is CPU bound, 30% faster means roughly a third of your web infra cost can go away. That may be very significant.

Also unless you have a very trivial app / use only your own code, Lua web environment isn't really comparable to what exists in the python ecosystem. Not yet anyway.


We're talking about Python 2.2 here which is literally twenty years old. You can't leverage what exists in the Python ecosystem using a version that old. Many libraries don't even permit using 2.7 anymore. I remember Python 2.2. It had so many things missing compared to what we grew accustomed to with Python 2.7's long lifetime. Going that far back with Python is a hair's width away from starting from first principles.


It's Pyston 2.2, not Python 2.2. This Pyston version is based on Python 3.8.8. See https://github.com/pyston/pyston/blob/f71f76ff3874970d71ed4c...


Oh wow, I stand corrected. I saw Py..on 2.2 and in my head, for some reason, I thought it was a mad-science experiment to gain a performance edge by resurrecting an old version without the bloat, because Python 2.2 is something I remember very well. If Pyston is really fully committed to going their own way with a fork, I'd recommend they just go for the gold and bump the version up to 4.x, since that sends a pretty clear message to the Python project about how valuable they feel their enhancements are, while avoiding confusion. Python performance is truly something we all want to see improve, so maybe things will change.


It's Pyston 2.2, not Python 2.2


As a huge Python fan, I think now is the time for the leadership to start thinking about Python 4 (especially since the 2->3 migration headache seems to have mostly gone away).

I say this because v4 would be a good time to look at putting some of these speed improvements into the mainline code - something that would be largely backward compatible, but make those breaking changes that give big speedups (such as putting limits on the C interfaces and other big building blocks).

My gut (and really, it's just a half-assed guess) tells me that Python could easily see a 2x speedup in exchange for some of those incompatibilities, and an even bigger gain depending on how deep the changes went. [0]

If that's anywhere near correct, then doing a v3/v4 in parallel (like v2/v3 was) would probably be the only way to go, so that people would know there is a migration path. [1] [2]

[0] - There are really only two ways to do this - keep breaking things and introducing incompatibilities along the way, or make a bigger break for the sake of performance. Of course, doing nothing is also an answer, but probably one that we wouldn't really be happy with. There is also a variant of versioning where we do 4/5/6/... in relatively rapid succession, such as 2-3 year intervals, and make a fewer big breaks for each release, and that might even work better.

[1] - I suffered through the v2/v3 issues also, (as an end user more than a dev though), and it wasn't fun. But I'd do it all again if it meant a big jump in speed.

[2] - I know the perl team mucked this up (hindsight being 20/20), so having a working V4 on first release (not counting alphas & betas) would be key. Scoping that out well beforehand would probably be needed, but I think the team is broad and deep enough to do that.


Given how troublesome the 2->3 migration has been for large companies I wonder if a breaking 3->4 wouldn’t kill the language. I know of more than one big shop that still has a lot of Python 2 in house but has disallowed the creation of new Python code bases in favor of other languages.


I've thought this over, and my conclusion seems to be that people will take a reasonable amount of stress if they get a big performance improvement.

I absolutely agree that for some people, it will be too much, and frankly, I don't think that problem is solvable.


Well, it's solvable in the sense that breaking language compatibility is a choice that doesn't have to be made, I guess. The 2->3 transition was managed as effectively as such a big thing could be, but it's left the industry littered with a bunch of weird self-maintained Python 2 forks and compatibility layers for the C side of Python. A 2x theoretical performance improvement doesn't pay down the engineering costs of making the switch, even if you recoup it in lower hardware costs later.


> that still has a lot of Python 2 in house but has disallowed the creation of new Python code bases in favor of other languages.

That seems silly. It's not like other languages don't have their own major version migration pains.


What is the definition of a "big shop" here?


50k person years of work represented in the code bases and up, as an approximation. Big enough that to do the work you'll have to coordinate across multiple departments with different incentives for multiple years to get the transition made.


I'd say 2x probably isn't worth causing major ecosystem pain. Maybe if it opened up potential for 10-100x speedup for some fairly common workloads or improved developer velocity.


I agree. Servers are cheap compared to people.

If we had to endure another large python migration, we would just migrate to a different language with better perf.


I wonder whether it would be better to just go with the nuclear option and kill the C extension model as it currently stands.

It would take probably decades for libraries to catch up, and maybe kill the language, but if it survived, Python would be so much more attractive for many kinds of projects.

Conversely, Python has a lot of niches, and works well for them. Incremental improvements are probably fine, even with the dreaded GIL.


What would be gained by killing the C extension model? Making it easier for PyPy/GraalPython or the more recent Pyston and Cinder to outperform CPython?

HPy is in the process of creating a new C extension API that doesn't expose CPython internals and is easier for alternative Python implementations to support with high performance.


Kudos for open-sourcing it. Here’s to hoping some of the changes make their way into upstream CPython.

What’s the compatibility story like? Is there a list somewhere of unsupported features?


Full compatibility, but the C extensions must be rebuilt for some libraries like PyTorch.



From 6 months back (v2.0): https://news.ycombinator.com/item?id=24921790 (262 comments)


Looks like the complete list is (well, a more complete list anyway):

Pyston 2.1 Is Blowing Past Python 3.8/3.9 Performance - https://news.ycombinator.com/item?id=25895346 - Jan 2021 (12 comments)

Pyston v2: Faster Python - https://news.ycombinator.com/item?id=24921790 - Oct 2020 (206 comments)

Personal thoughts about Pyston's outcome - https://news.ycombinator.com/item?id=13680580 - Feb 2017 (67 comments)

Pyston 0.6.1 released, and future plans - https://news.ycombinator.com/item?id=13534992 - Jan 2017 (23 comments)

Baseline JIT and inline caches - https://news.ycombinator.com/item?id=12010244 - June 2016 (12 comments)

Pyston Python JIT Talk - https://news.ycombinator.com/item?id=11528159 - April 2016 (11 comments)

Caching object code - https://news.ycombinator.com/item?id=9887756 - July 2015 (5 comments)

Pyston 0.3: Self-hosting Sufficiency - https://news.ycombinator.com/item?id=9103596 - Feb 2015 (54 comments)

An open-source Python implementation using JIT techniques - https://news.ycombinator.com/item?id=7529862 - April 2014 (27 comments)

Announcing Pyston: an upcoming, JIT-based Python implementation - https://news.ycombinator.com/item?id=7524712 - April 2014 (41 comments)


For people wondering if the version number means something about compatibility with Python 3, it does not: they claim compatibility with Python 3.8.8


Thanks! I was looking for this on their site. It should be on their homepage and FAQ!!! (Without py3.7+ I wouldn't even consider trying.)


There was a similar post a few days ago from Facebook, and this one stems from Dropbox, and both are said to make Python code run faster.

I think the obsession with monocultures is unhealthy.

Python is a great language. It is reasonably easy to learn, and somewhat readable even if you don't know it yet. It is one of the most popular programming languages on the planet. But it was not created to produce the fastest code possible.

There are other programming languages that are focused on speed.

Perhaps for the most critical parts, use one of them. Python can integrate pretty well with C with some magic, I hear (never tried it myself).
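The "magic" is mostly `ctypes` (or `cffi`, or hand-written extension modules). A minimal stdlib-only sketch of calling into C from Python:

```python
import ctypes
import ctypes.util

# load the C standard library; path resolution is platform-specific,
# so this sketch assumes a typical Linux/macOS setup
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# declare the C signature so ctypes converts arguments correctly
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello, C"))  # calls the real C strlen
```

The same mechanism is how the heavy numeric libraries hand their hot loops to compiled code.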

One programming language will never cover all use cases and they should not have to.

Not every programming language is made to be "functional", but they can be tortured into it.

Not every programming language is object oriented, but they can be tortured into it.

Not every programming language is focused on being the fastest in execution, but they can be tortured into it.

There is a reason carpenters have more than one tool to build something.

For the longest time I programmed in C, with some C++ mixed into it. It is a great language. There are many problem domains where I would not recommend using C.

I think it was delightful to pick up Python many years ago.

It is great. I can be very productive in Python in contexts where Python is great.

It is not C, and it should not want to be.

Now I am picking up Elixir, and I am learning a lot. It is neither Python nor C, and I am happy with that.

Use the proper tool for the job at hand.



Suppose you are starting a startup. You know and like Python, so you build your MVP in Python. 1k to 10k lines later, you hire a few extra engineers to help turn your MVP into a real product. 100k to 1 million lines later, you become very, very successful and you now have 10s to 100s of engineers coding every day. They're all producing the same number of LOC you were. Maybe more now. Then you take a look at your bills and find that what used to run on a single box in your basement is now costing > 100k/month in compute.

So what do you do? Rewrite everything in C++? Well you can start but you have so much software it's hard to do. Maybe you can save a few percent of compute cycles across all of your code? Reduce the 100k/month to 80k/month.

Obviously this number is far smaller than it would be in reality.


You do what you generally do with performance challenges: you profile and improve the hot-spots. Rewrite those parts if necessary.

Rewriting everything for performance reasons is silly. If you decide to rewrite everything, it would be for other reasons (e.g. maintainability, long-term viability, etc.)
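That workflow is cheap to start with the stdlib profiler; e.g. (the `hot()` function here is a hypothetical stand-in for your real workload):

```python
import cProfile
import io
import pstats

def hot():
    # stand-in for a CPU-bound hot spot you'd find in a real codebase
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
hot()
profiler.disable()

# print the top 5 functions by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Once the profile shows a genuine hot spot, that's the one function worth rewriting in C/Cython, not the whole codebase.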


What is the advantage over Nuitka, another Python re-implementation?

https://nuitka.net/


Nuitka is a static compiler, transpiling Python code to C and binding it to CPython. Pyston is an interpreter, but with a JIT.


How is that an advantage? A C compiler is needed in either case, either for using together with Nuitka or for compiling Pyston in the first place, right?


How does this play with Cython?

I'd assume most performance-sensitive projects will have compiled their hotspot modules with Cython. So how does Pyston work alongside Cython?


With Cython you can get C level performance. The cost is that your Python code has to be written just as you would write C.

See e.g. https://github.com/thomasahle/fast_pq/blob/main/_fast_pq.pyx... for an example of how much fast Cython code ends up looking like C.


It should in theory. Cython generates modules using cpython API. As long as Pyston is compatible with existing C extensions (and it should be), there's not much difference between them and cythonized py source.


To give comparative numbers: in my experience, Cython yields a 900% speedup.


I've found this, that says that python is slow because it doesn't have JIT:

https://stackoverflow.com/questions/3033329/why-are-python-p...

I'm curious to understand, why is it so hard to write a JIT runtime for python?


Because much of the power of Python comes from its libraries, many of which depend on C(++) extensions tied to details of the CPython interpreter. It's very hard to write a JIT without breaking this.

For example, all of the numerical/scientific stack (numpy etc.) is extremely fast by being written in C, Fortran, etc. with a Python interface. If you break this in a JIT, then the plain Python might be faster but actual useful code will be much slower.

Additionally, one of the reasons C extensions are so fast is that the CPython implementation is very clean, which has made it easier to write stable and performant extensions.


> I'm curious to understand, why is it so hard to write a JIT runtime for python?

First, the CPython reference implementation is supposed to be kept simple. Second, it exposes interpreter implementation details to extension writers, meaning alternative implementations with a radically different architecture might be unable to use some parts of the ecosystem or have to implement costly workarounds (see eg the PyPy FAQ [1]).

[1] https://doc.pypy.org/en/latest/faq.html#do-c-extension-modul...
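A tiny illustration of one such exposed internal: exact reference counts, which C extensions can observe via the C API and which an implementation with a tracing GC or a JIT may not reproduce:

```python
import sys

x = object()

# CPython's C API exposes exact reference counting (Py_INCREF/Py_DECREF);
# sys.getrefcount surfaces the same counter from Python. The reported value
# includes the temporary reference created by passing x into the call.
print(sys.getrefcount(x))
```

On PyPy, `sys.getrefcount` exists but returns a value that doesn't reflect real refcounts, which is exactly the kind of semantic gap extension authors trip over.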


https://github.com/microsoft/Pyjion was a really nice attempt to marry CPython with JIT in a rather generic way, but it was abandoned quickly.


Development of Pyjion continues here:

https://github.com/tonybaloney/Pyjion


Interestingly, Ruby decided to write a JIT despite exposing about as much of the interpreter to extension writers. So far their JIT is not very useful for most real-life workloads, but I think that is mostly due to lack of development resources rather than any actual limitations.


AFAIK there is PyPy, which is much faster. Also, there is Numba. But the problem with PyPy is (was?) interoperability with existing C modules.


A comment on another HN thread mentioned that the reason for CPython still being non-JIT was the developers wanted the interpreter codebase to remain simple and easy to understand.

This sounds like the only reasonable explanation to me. JIT with C interoperability is hard, but it's not that hard.


It's certainly hard to write a JIT for Python, but Cinder, Pyston and Pypy have done it.

You'd need to ask the CPython project (the main Python project) why they don't want to include anything like that. I hope there will be steps taken.


There are several.

The most well known is PyPy.

Pyston appears to be another, with a different approach.


It's confusing as to which implementation of Python is appropriate for a given use case, or whether they are all just competing with each other.

Someone from the Python community should write a blog post about the features, merits, and limitations of CPython, PyPy, Cinder, GraalVM, and Pyston.


CPython and PyPy are pretty mature and battle-tested. You can find a lot of information on them.

Cinder and Pyston are pretty new; it would be premature to say too much, as we don't have enough background information about them.

GraalVM is between the two: it's been there for some time, but it's still not well known. I don't know anyone using it in prod. I would be curious about a write-up on this one.


I'd say GraalPython is even less mature than Pyston.

> At this point, the Python runtime is made available for experimentation and curious end-users. (https://github.com/oracle/graalpython/tree/master/docs/user)

> Does module/package XYZ work on GraalVM's Python runtime? - It depends, but is currently unlikely. (https://github.com/oracle/graalpython/blob/master/docs/user/...)


To a first approximation, everybody uses CPython, the reference interpreter. Of the other implementations, only PyPy is really production-ready.


Given it was created by/for Instagram, I'd say it's likely that Cinder is (machine-count wise) seeing far more production use than even PyPy, but perhaps you mean production-ready for users other than its creator?

(E.g. hhvm at Facebook vs everyone else who tried to use it after it was first publicly released)


I think GraalVM is production ready too, but the JVM is not very popular with the python crowd and vice versa.


GraalPython describes itself as “an early-stage experimental implementation of Python”.

https://github.com/oracle/graalpython


I stand corrected. GraalVM seems mature, but apparently the Python support is not.


What happened to Jython?


The changeover to Python 3 killed it AFAIK. Same for IronPython.


Last commit on IronPython 3: 44 minutes ago (https://github.com/IronLanguages/ironpython3)

Jython 3 has a roadmap with 3.8 as a target: https://www.jython.org/jython-3-mvp.html

Not ready, but not cancelled.


Hard to argue this wasn't precipitated by Instagram's Cinder - nice to see it smoking others out already.


Pyston has been around for way longer, and even its post-Dropbox relaunch (v2) predates Cinder’s open sourcing by a lot, and has seen steady development. The only thing you can argue is they cut the v2.2 release in response to Cinder, which doesn’t really mean much.


yeah that's what I mean - that Cinder precipitated the open sourcing of Pyston.


See the response from the Cinder team on the thread: https://news.ycombinator.com/item?id=27047097


Does that target Python v2 or v3?

Edit: It's Python v3.8.8, thanks to https://news.ycombinator.com/item?id=27059804


Python v2 has been dead for over a year. Most packages have dropped support for it, so I wouldn't expect anything new released for v2.


I just looked at the diffs; the amount of change seems relatively small when looking at commits 024d8058..HEAD.

They should totally upstream this back to python/cpython.


What's the difference between this and pypy?


Pyston is a fork of CPython and aims to maintain drop-in compatibility. PyPy is a completely separate implementation which is not fully compatible with CPython.
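One practical consequence: code that sniffs the interpreter should behave differently on the two. Since Pyston is a fork, I'd presume `platform.python_implementation()` still reports "CPython" there (I haven't verified this on Pyston), whereas PyPy reports "PyPy":

```python
import platform
import sys

# identify which Python implementation is running this code
print(platform.python_implementation())  # e.g. "CPython" or "PyPy"
print(sys.version)
```

Libraries sometimes branch on this string to pick refcounting-dependent fast paths, which is part of why "drop-in fork" vs. "separate implementation" matters.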


> not fully compatible

It's really quite close, and https://github.com/hpyproject/hpy is intended to make them even closer.


It's super promising and actually looks pretty ergonomic in comparison, but it'll take rewrites of the existing packages that leverage the C API.

There's not really a way around that, since today they interact with refcounting directly, vs. a more abstract mechanism (e.g. HPy's handles) that shields them from the underlying GC mechanism.

My fear here is that this will be a cool project that a few projects really looking for cross-interpreter Python support will leverage, but that it will otherwise be a non-issue.

It doesn't need to show a performance improvement on PyPy or Pyston so much as <=0 performance degradation on CPython, which will stay everyone's primary target.

I guess once it proves to be mature, it'll just be a question for each library of whether supporting alternative interpreters is worth the work for their user base, personal projects, or job.


Pyston claims a 30% performance improvement over CPython while maintaining compatibility; PyPy claims it's 4.2x faster but has some compatibility issues, IIRC.


PyPy's claim comes from a set of common Python benchmarks. Pyston's claim comes from the company's webserver benchmarks. These may be comparable, but may also be apples-to-oranges.
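If you care about your own workload, the cheapest sanity check is to run the same micro-benchmark under each interpreter yourself (toy statement below, my own example, not either project's benchmark suite):

```python
import timeit

# run this same script under CPython, Pyston, and PyPy and compare the numbers;
# min() of several repeats is the conventional way to reduce timing noise
stmt = "sum(i % 3 for i in range(10_000))"
best = min(timeit.repeat(stmt, number=200, repeat=3))
print(f"best of 3: {best:.4f}s for 200 runs")
```

Beware that tiny loops like this flatter JITs; a slice of your real request-handling code is a far better benchmark.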


It's pretty good in terms of compatibility these days. Conda-forge has builds of most packages available for both CPython and PyPy on macOS/Linux/aarch64/ppc64le: https://conda-forge.org/status/#pypy


We use completely separate benchmarks, where our Pyston benchmarks are quite a bit "harder" and PyPy is only 12% faster on them while using 4x the memory.


AFAIK it doesn't play well with popular (read: omnipresent) libraries like Numpy?


I think this is incorrect. There is even an older blog post (2016) where Pyston explicitly benchmarks Numpy: https://blog.pyston.org/2016/11/11/pyston-0-6-released/


We run the numpy testsuite as part of our testsuite so to our knowledge it works well.


The slowest part of python is imports. Did they fix that? :)


Imports are faster! Maybe not as much as you're hoping but we did optimize them a bit.
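You can measure this yourself with `python -X importtime`, or crudely with a timer (numbers vary wildly by module, interpreter, and disk cache):

```python
import sys
import time

t0 = time.perf_counter()
import json  # first import: find, read, compile, and execute the module
             # (already cached if something imported json earlier)
t1 = time.perf_counter()
import json  # repeat import: just a sys.modules dict lookup
t2 = time.perf_counter()

print(f"first:  {(t1 - t0) * 1e6:.0f} us")
print(f"second: {(t2 - t1) * 1e6:.0f} us")
```

The `-X importtime` flag gives a per-module breakdown, which is more useful for finding the slow imports in a big application.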


Nice! The alternative I'm considering is mypyc; is it faster than that?


Is it GIL-free? Web servers are multi-core. Any fork of Python should serve requests on all cores, like golang and its scheduler do using goroutines.


You can easily do that by simply running N instances and distributing requests among them. No need for all the complication that would come with removing the GIL.
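A sketch of that pattern with the stdlib (process-per-core for CPU-bound work; in practice you'd put a load balancer in front of N gunicorn/uwsgi workers, and `handle` here is a hypothetical stand-in for a request handler):

```python
import os
from concurrent.futures import ProcessPoolExecutor

def handle(request_id):
    # stand-in for a CPU-bound request handler; each worker process has its
    # own interpreter and its own GIL, so all cores run in parallel
    return request_id, sum(i * i for i in range(50_000))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        for rid, result in pool.map(handle, range(8)):
            print(rid, result)
```

The cost relative to a GIL-free runtime is per-process memory and the serialization of arguments/results across process boundaries.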


And pay in memory for that luxury: when using lots of complex libraries (like in ML) the Python process can eat quite a lot of RAM.


Since gc.freeze() got implemented (https://docs.python.org/3/library/gc.html#gc.freeze) having forked workers has much lower cost these days. As long as you can load your data pre-fork that is.
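The pattern, roughly (`load_app()` is a hypothetical stand-in for pre-fork initialization):

```python
import gc

def load_app():
    # hypothetical: import your framework, load config/models, warm caches
    return {i: str(i) for i in range(100_000)}

app_state = load_app()

gc.collect()  # clear out garbage produced during startup first
gc.freeze()   # move all surviving objects into a permanent generation: the
              # cyclic GC stops scanning them, so collections in the workers
              # no longer dirty those pages and fork()'d memory stays
              # shared copy-on-write
print(gc.get_freeze_count())  # number of objects now exempt from collection

# ... os.fork() / spawn worker processes here ...
```

`gc.unfreeze()` reverses it, which matters if the parent later needs those objects collected.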


Wow, I missed its introduction. Nice. On a cursory glance Celery doesn't make use of it yet, unfortunately.


Yes. Python is not without tradeoffs.


Is this a drop-in replacement for CPython?


> Working Pyston into your projects should be as easy as replacing “python” with “pyston.” If that’s not the case, we’d love to hear about it on our GitHub issues

So yes, it should be.



