Cinder: Instagram's performance-oriented fork of CPython (github.com/facebookincubator)
645 points by yagizdegirmenci on May 4, 2021 | 261 comments



Yes, yes, yes! This is what I've been waiting for for so long.

Python has emphasized readability for the reference implementation vs. the practical benefits of better performance for everyone, and the community is really hurting for it.

Please CPython, upstream this or something like it. There comes a point where this whole zeal about readability becomes idealistic and hurts real-world pragmatic goals. It's hurting the language, and we can do better.

Python can be ahead in the performance race. We just need to get real.


> Python can be ahead in the performance race. We just need to get real.

I don't know if this is the case. Python's entire value proposition is that it is "executable pseudocode" with an extremely low barrier to entry - so somebody like a scientist or business analyst can solve problems without a deep understanding of computer science. Those goals are always going to run counter to performance, and python would have to re-invent itself at a foundational level to actually compete on performance with languages which were optimized for it from the ground up. Python's already winning in a lot of super relevant ways from playing to its strengths, and I don't think it makes sense to compromise those strengths to reach middle-of-the-pack performance.

If you ask me, if the python community really wanted to advance their interests, the thing to focus on would be dependency management and project encapsulation. If I could have a ux like npm or cargo for python, I would surely be tempted to use it more outside of jupyter. But it would not be because of performance ;)


You and the parent are mistaken about the cause of Python’s performance problems. Python isn’t slow because it’s readable—it could be quite a lot faster without compromising readability. Python is slow because it exposes the entire CPython interpreter as the C-extension API, which means they can’t change much about the interpreter without breaking compatibility with some extensions (and they are unwilling to do so). Since performance is so bad, the community leans hard on C-extensions, which worsens the problem.

I suspect one of the reasons packaging is so bad is that for each node in the dependency tree, you need to download and execute setup.py in order to discover the node's direct dependencies, and to get the transitive dependencies you must download and execute the direct dependencies' setup.py files. Since these scripts aren't deterministic, the results can't be reliably cached (they could return different sets of dependencies based on the environment, the current time, or anything else). This at least seems to be the reason all reproducible package managers are slow (by which I mean 30+ minutes to resolve dependency versions for a small non-toy project, even when the resolved dependencies at their proper versions are already cached to disk).
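To illustrate why caching fails, here's a hypothetical setup.py-style dependency computation (all package names are made up for the example):

```python
import os
import sys

def compute_install_requires():
    """What a hypothetical setup.py might compute at execution time."""
    deps = ["requests"]
    # Dependencies conditional on the interpreter running the script...
    if sys.version_info < (3, 8):
        deps.append("importlib-metadata")
    # ...or on arbitrary environment state.
    if os.environ.get("USE_FAST_JSON"):
        deps.append("orjson")
    return deps

# Two runs of the "same" setup.py can disagree:
os.environ.pop("USE_FAST_JSON", None)
first = compute_install_requires()
os.environ["USE_FAST_JSON"] = "1"
second = compute_install_requires()
# first != second, so a resolver cannot safely cache the result
```

Since the dependency list only exists as the output of running code, a resolver has to re-execute it per environment rather than look it up once.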


To clear some confusion: both of you misunderstood my point.

I don't think Python's language readability is the problem.

The problem is CPython's core team shooting down any performance improvement for the sake of keeping the compiler source code readable.



Mine is the top comment on that first link with 34 upvotes. :)


I did indeed misunderstand. Thanks for the clarification! :)


Wheel (.whl) archives ship extracted, cacheable metadata rather than an executable setup script.
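To make that concrete, here's a sketch that builds a tiny in-memory wheel and reads its dependencies from the static METADATA file, with no code execution (the package name and its requirements are invented):

```python
import io
import zipfile
from email.parser import Parser

# Build a minimal wheel in memory (real ones come from PyPI); the point is
# that dependencies live in a static METADATA file inside the zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Requires-Dist: requests (>=2.0)\n"
        "Requires-Dist: click\n",
    )

# A resolver can list Requires-Dist entries without running any code:
with zipfile.ZipFile(buf) as zf:
    meta = Parser().parsestr(zf.read("demo-1.0.dist-info/METADATA").decode())
deps = meta.get_all("Requires-Dist")
# deps == ['requests (>=2.0)', 'click']
```

Because the metadata is declarative, the results are the same on every machine and can be cached.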


Not everything is a wheel, though.


I think you and the first poster are talking about different things. They are talking about readability of the CPython compiler and you are talking about readability of the codebase.

Python has historically refused to implement many optimizations because they reduce compiler readability.


> Python's entire value proposition is that it is "executable pseudocode" (...) so somebody like a scientist (...)

LOL'd hard at that.

What kind of scientist would write "import numpy as np" on their pseudocode? Or multiply matrices with the "@" operator? From the point of view of a scientist, Matlab/Octave, or even Fortran, is executable pseudocode. Numerical stuff in Python seems like an ugly kludge.


As a scientist who had to use MATLAB up until about 2013 because that's what everyone else used, it was such a relief to move to Python. It's true that you can implement a linear algebra routine in a couple fewer characters in MATLAB, but unless that's literally all you're doing, Python is much nicer to work with.

The data structures in MATLAB are just a nightmare for general purpose programming, which makes things like loading and parsing data -- things that scientists often need to do -- just terrible. The fact that the notion of the "matrix" (i.e. a 2D array rather than a general ND array) is so deeply baked into everything is a huge headache (a scalar is a 1x1 matrix in MATLAB!).

I'm also surprised anyone is particularly bothered by the @ operator. The asterisk for matrix multiplication seems roughly equally unheard of in nicely typeset / handwritten math (you would just write e.g. Ab for a matrix-vector multiplication with no infix operator).
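For context, `@` is just ordinary operator overloading via `__matmul__` (PEP 465). A toy sketch with a made-up vector class, no numpy required:

```python
class Vec:
    """Minimal sketch: the '@' operator dispatches to __matmul__ (PEP 465)."""
    def __init__(self, xs):
        self.xs = list(xs)

    def __matmul__(self, other):
        # Dot product of two vectors of equal length.
        return sum(a * b for a, b in zip(self.xs, other.xs))

v = Vec([1, 2, 3])
w = Vec([4, 5, 6])
print(v @ w)  # 32
```

numpy simply implements `__matmul__` on its arrays, which is why `A @ B` reads like infix math rather than a function call.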


I shudder when I think of my Matlab days, but i’ve programmed long enough in enough other languages to know that we have better options than either Matlab or Numpy/Pandas with respect to API design. We can also have much better performance than Python with many other languages.


I don't know why you got downvoted, but I agree that matrix manipulation in numpy looks quite foreign to somebody new to Python/numpy. I had heard this complaint from several colleagues (mostly former Matlab users). Several of them have migrated to Julia, which does quite a good job of producing easily readable numerical code. For instance, the dot product can be written with the Unicode dot symbol (typed as \cdot<tab>).


> matrix manipulation in numpy looks quite foreign to somebody new to python/numpy

Not only to newcomers. I've been using python+numpy for more than 10 years, nearly every day, and it despairs me as much as the first day, if not more. I relish the few moments that I get to write numeric algorithms in any other language.


I think if you zoom out enough Python code looks a lot like pseudo code.


> Those goals are always going to run counter to performance

These days Julia is a very clear counterexample to this sort of claim, I don't think this is true at all.


Julia has a tradeoff w.r.t. startup performance doesn't it?

Also, doesn't Julia make pretty heavy use of dynamic dispatch, or am I mistaken about that?


Julia's startup time is not short, but it's getting significantly shorter with every release. Right now, the startup overhead is only 0.13 seconds for me.

Some packages still take a while to load, e.g. for me to load the plotting package and produce the first plot takes about 8 seconds (all subsequent plots in that session are fast). This is down from like 30 seconds last year.

One can also bundle hefty packages like Plots.jl into their Julia system image so they don't have to recompile all that machinery every time they restart Julia.


The startup performance issue is just a regular difficult issue, as far as I'm aware. I don't think it has much to do with the goal of "being readable pseudocode" or something. It's getting better with time too.

The "readable pseudocode" kind of code is exactly the sort of code in Julia that you almost always expect to be compiled down to native code quite efficiently. The kind of pseudocode I usually see is either straightforward loops iterating over something where the compiler can infer all the necessary information, or calls to library functions where somebody else has already made sure it's good. I use this a lot in my own code, and, like I said, I don't think there is a tradeoff.


FYI: Poetry (https://python-poetry.org) is actually close to something like npm.


So in my mind, the main things you would need to compete with npm would be:

1. near-universal adoption

2. "just works" experience: i.e. I can clone any random git repo, and run `npm start|build|whatever` and it's always going to work, without having to know anything about my environment or fiddling with configurations.

Does poetry do that?


Poetry wraps around pip, which does have universal adoption. That means you can use any package on PyPI through Poetry.

I don't recall it having custom commands but in my opinion this isn't the job of a package manager.


In most npm projects, you don't depend on C extensions. For python projects that don't depend on C extensions, you already have your "just works" experience with poetry.

If you really want a "just works" experience when dealing with C extensions, you should freeze your development environment one layer up (e.g. VMs, vagrant images, what have you) so that you can always successfully install and compile your C extensions because the underlying system is also kept under version control. And this is independent of whether or not you are working on a Poetry or an NPM based project.

It's not that hard. I suspect that you're just used to simpler conditions.


You'll only really get 2 when 1 is fulfilled.

From my experience, with repos that have Poetry set up, I do get a "it just works" experience that I've previously been lacking when working with python. AFAIK there is no "scripts" section (for e.g. "poetry start" like "npm start") in the config file, or at least no one is using them. You'll have to take a quick peek into the README, but the same is true for npm projects, as the script verbs aren't really standardized.


You can use a "scripts" section in pyproject.toml (it's what poetry reads, and increasingly more Python tools are leveraging it, which is good), but in general each repository/project offers what it wants to offer as commands, nothing in comparison with the "standards in use" for npm and similar. Personally I have used it to give easy access to the "main thing I want to execute" for my own weird projects.

I'm pretty happy using Poetry, and agree with your initial point, partially with the second. There can be some edges with packages involving "anything" binary (to be fair, that is to be expected), and a big issue (for me) is locking the resolution of some libraries, like boto (since botocore has hundreds of patch releases and the resolver can get pretty crazy unless you play a bit of manual bisection). But this only hits you on poetry lock or poetry add when developing, and only in some cases; I think it's a fair price to pay for a reproducible build.


> AFAIK there is no "scripts" section (for e.g. "poetry start" like "npm start") in the config file, or at least no one is using them.

There is a scripts section in pyproject.toml, which is leveraged by “poetry run”. This is different from npm, though, since these aren’t dev-environment scripts but the executable scripts that are installed with the project.
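For reference, it looks something like this in pyproject.toml (the package and module names here are made up):

```toml
[tool.poetry.scripts]
# `poetry run demo` (or just `demo` once installed) invokes demo.cli:main
demo = "demo.cli:main"
```

The value is a "module:function" path; Poetry generates a console-script entry point for it at install time, rather than running an arbitrary shell command the way npm scripts do.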


I usually put https://pypi.org/project/taskipy/ in my poetry dev dependencies to achieve this - then I can just run `poetry run task start`


For the case of Python packages which are primarily exposing a command line script, alluded to in the `start|build|whatever`, there is pipx https://github.com/pipxproject/pipx


In my experience, npm's "just works" promise breaks as soon as the repo is a year or two old.


1 - Nope

2 - Pretty much yeah


Except like pipenv it is dog slow and totally inscrutable when dependencies clash.


It seems a lot faster than pipenv. From the little I've tried, it seems a lot more sane - but I'm biased - I hated pipenv and never saw the point of it.


It used to be incredibly slow, but these days I find it's not so bad. It even has parallel pip installs now, which, while occasionally triggering some issues in projects with overcomplicated setup.py files, is usually a lot faster than a good old pip install -r requirements.txt.


So have you considered the possibility that it may not be the fault of these package managers, but that there may be a different problem underneath it all that a package manager cannot fix for you?


It's a very common misconception that readability is a tradeoff against performance.

The true underpinning of readability is language malleability, nothing more and nothing less. The whole "dynamic languages are automatically more readable" impression is a misleading consequence of the much more general claim "malleable languages are more readable", with a malleability = dynamism substitution.

Malleability is key because readability is really just languages being as close as possible to the problems they are used to solve. This means one of two things:

  1) Either the language comes pre-equipped with the concepts and semantics of the problem domain built into its fabric; this is the DSL approach (this will always fail if you aspire to be a "general purpose language": you simply can't hope to match the sheer number of contexts people want to use your language in. One radical conclusion is to abandon the "general purpose language" myth absolutely, just make all application development language-building, and focus on building the tools to make language building easy.),

  2) Or you make the language malleable and stretchy enough that a tiny handful of the rare programmer-domainExpert breed can construct the whole domain inside the language, _with_ _the_ _language_ (no transpiler, preprocessor, etc.; this would require the rare programmer-expert to be an even rarer programmer-expert-languageHacker), then cover up all the low level machinery with syntactic elements that mirror the domain vocabulary. 
That's it. Any language that allows you to do the above is a malleable language. Any malleable language is a candidate for being a readable language (and a horribly unreadable language, if you give in to irresponsible abstractions). Dynamic languages are malleable because they give you extremely powerful hooks into their semantics; they are very... well, dynamic. The details differ, but the two poster children are Python and Ruby, with a grab bag of features that range from operator overloading in Python and free-form syntax in Ruby to extremely dynamic dispatch and resolution rules in both. The last feature is a common theme in all dynamic languages, and it happens to be a performance killer.

But you absolutely don't have to be dynamic to be malleable; Lisps have been doing it since forever, and though Lisps are traditionally dynamic, it's macros that make them stretchy, not dynamism. And there are loads of other mechanisms that can make a language malleable without making it unpredictable. It can be as simple as Scala allowing Unicode in its source, and as complicated as Haskell making lazy evaluation the default so that control structures are just a library. Any abstraction with a "meta" flavor, with hooks extending into the language environment and doing various things depending on how you wield them, makes the language more malleable.

Malleability is allowing the language's source to take different shapes and semantics according to the whims of the programmer; performance comes from massaging the source until it fits comfortably into machine semantics without much runtime shenanigans. They are absolutely not in tension; making the code do everything at runtime is just the easy way out.
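To make the "powerful hooks" point concrete, here's a minimal Python sketch (class and field names are invented): __getattr__ lets an object synthesize attributes at lookup time, which is exactly why the interpreter can't resolve an attribute access to a fixed offset ahead of time.

```python
class Record:
    """Synthesizes accessor methods on the fly via __getattr__."""
    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        # Called only when normal lookup fails; the "attribute" need not
        # exist anywhere until the moment it is looked up.
        if name.startswith("get_"):
            key = name[4:]
            return lambda: self._fields[key]
        raise AttributeError(name)

r = Record(user="ada", score=42)
print(r.get_user())   # ada
print(r.get_score())  # 42
```

Since any attribute access may run arbitrary user code like this, every `a.b` in CPython has to go through the full lookup protocol at runtime, which is the performance cost being described.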


So much stuff just from the readme would introduce breaking changes to the Python ecosystem.

Part of the reason Instagram can get away with this is they likely have almost complete control of their dependencies and the like. But the changes they decide to make would not just work for everything.

That being said I think we could get a lot of neat PEPs from this.

EDIT: this is just my opinion, but in a world where we have type annotations, JITs feel like a massive step back. Stuff like mypyc could get us way further into high-performance stuff (and no black-box hand-waving for perf things).


> So much stuff just from the readme would introduce breaking changes to the Python ecosystem.

Being compatible with the rest of the Python ecosystem is the main reason why Cinder is built on top of CPython. Although yes, some features are indeed very experimental.

> in a world where we have type annotations, JITs feel like a massive step back. Stuff like mypyc could get us way further into high performance stuff

Ah, but that introduces a separate compilation step, which may not be tolerable in every situation.


Well, the compilation step is already present in the current py -> pyc phase; it's just a matter of "extending" it. Also, look at how Cython works.


> Well, the compilation step is alredy present in current py -> pyc phase

Yes, but developers don't have to ever interact with it.

> Also, look at how cython work

Cython works by adding a separate build step. Changing a Cython module requires you to recompile it, which is avoided with a JIT.


Why would developers have to interact with a mypyc step any more than the pyc step? Why is “developers might have to interact with it” some kind of non-starter, as though having a compile phase is a worse evil than a hyper-slow language?

FWIW, I think we could probably buy ourselves a lot of latitude to optimize CPython by designating a much smaller API surface (like HPy), and then optimizations largely won’t have to worry about breaking compatibility with C extensions (which seems to be the biggest reason CPython is unoptimized).

But in general I’ve lost faith in the maintainers’ leadership to drive through this kind of change (or similarly, to fix package management), so I’ve moved on to greener pastures (Go for the most part, with some Rust here and there) and everything is just so easy nowadays compared to my ~15 years as a Python developer.


> Why is “developers might have to interact with it” some kind of non-starter, as though having a compile phase is a worse evil than a hyper-slow language?

For big monoliths (like ours at IG), the server start-up can take more than 10sec, which is already super high for an "edit -> refresh" workflow. Introducing a Cython-like compilation step is really a major drawback for every single developer.

For smaller projects, Cython works extremely well (and we do use it for places where we need to interface with C/C++).


> For big monoliths (like ours at IG), the server start-up can take more than 10sec, which is already super high for a "edit -> refresh" workflow. Introducing a Cython like compilation step is really a major drawback for every single developer.

Then skip it for your dev builds.


Can you elaborate on what you mean by "skip" Cython compilation on dev builds? How would you then test changes to Cython code?


So we weren’t talking about Cython specifically, but something Cython-like, i.e., we’re not talking about Cython’s special syntax but rather ordinary Python. This is important because it means that dev builds execute against CPython directly (i.e., your code begins executing immediately) while production builds use our hypothetical AOT compiler.


Python has a lot of crazy metaprogramming capabilities. Ability to inspect stacks, override import, and more...

The ones they removed are not commonly used at all.
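As a small illustration of the stack-inspection point, the stdlib inspect module can walk the live call stack at runtime (the function names here are invented for the example):

```python
import inspect

def who_called_me():
    # inspect.stack()[0] is this frame; [1] is the caller's frame.
    return inspect.stack()[1].function

def outer():
    return who_called_me()

print(outer())  # outer
```

Features like this are great for debugging and instrumentation, but they also mean the interpreter must keep full frame objects around, which constrains how aggressively it can optimize.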


I used to work on a project that used gevent. I'd almost describe it as monkey-patched async I/O. It's scary-clever.

Maybe there could be a subset of Python that removes the hyper dynamic parts so it can be compiled to something faster. Maybe I'm just describing Go.


> Maybe there could be a subset of Python that removes the hyper dynamic parts so it can be compiled to something faster.

What you’re describing is essentially Static Python (mentioned in the readme)


Is inspecting stacks used for logging backtraces during instrumentation?


I’ve gotten some great use out of it, personally.


I don't see why a JIT cannot use type annotations to speed up the code.

But JIT also works on untyped or incompletely typed code.


According to the pypy people, the type annotations also don't provide the kind of information the JIT wants. The JIT cares about things like "this integer fits into a machine word" and "this parameter can be None but is almost always not None" so on. The type system doesn't concern itself with that.
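A tiny sketch of that point: `int` in Python is arbitrary precision, so an `int` annotation alone never tells the JIT that a value fits in a machine word (the function name is made up):

```python
def square(x: int) -> int:
    return x * x

# Both calls satisfy the annotation, but only the first result fits in a
# 64-bit machine word; the JIT can't specialize on `int` alone.
small = square(3)            # 9
big = square(2 ** 40)        # 2**80, far beyond 64 bits
print(big.bit_length())      # 81
```

So the JIT still has to observe actual values at runtime to decide whether machine-word arithmetic is safe.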


It’s definitely true that there’s stuff a JIT would love to know that the type system can’t tell you. But that doesn’t mean there isn’t useful information available in type annotations. In particular it is possible to speed up attribute access and method/function calls a lot when target types are statically known.

Our approach to “int fits in a machine word” in Static Python is to require machine ints to be explicitly annotated as such, and then you opt in to limited size ints with overflow etc, in exchange for getting really fast arithmetic.


I mean if you care about perf you can do it in one step. It’s not that huge of an ask, and engineering wise “compile some typed code” is a hell of a lot more straightforward than “try to guess what I should speculatively compile at runtime”

There was a piece by v8 devs a while back where they ended up turning down JIT aggressiveness cuz it ended up making normal web pages slower (except for like Google Docs).

This is just me backseat-BDFLing “why do the hard thing when we could do the easy thing” tho


Typing a large legacy codebase written in a highly dynamic language is not easy. Instagram is pretty large.

I have seen a very large codebase in a dynamic language at [redacted] that has been mostly converted to use a sophisticated type system. Certain core things have not been converted so far, though, but just marked with a lot of "proceed with caution" and "TODO" red flags. Making them typesafe in any conceivable way would break compatibility for large subsystems.

Sometimes a rewrite is the only way out of such a situation, but very few can afford it.


Instagram wrote https://github.com/Instagram/MonkeyType to help type their Python code


A couple of the folks who wrote MonkeyType (me and _carljm) also work on Cinder :)


> Typing a large legacy codebase written in a highly dynamic language is not as easy.

> very few can afford it.

Starting a project in a highly dynamic language is a conscious choice with well-known properties. The second part is interesting, however. Essentially you are saying that gains from development velocity do not offset losses from technical debt, which is a strange observation. I would say it sort of hints at time to MVP being an irrelevant business metric (startup runway/survivability aside) at best. This is contrary to typical startup truths.


At a certain scale, they do. Case in point: Twitter. They went through a painful transition but they did not have much choice.

But sometimes they don't. Case in point: YouTube, Instagram. Both of them are slowly migrating more and more Python code to different languages, but the amount of Python is still large, and AFAIK neither has plans to eschew it completely.

Time to MVP is a very relevant business metric. But the same thing that speeds you up in the beginning slows you down later on. If you architect your code past MVP to help replace the implementations of critical paths, it may help you down the line. (Micro)service architecture is one way to do that.


There's no requirement that type annotations be accurate.

With hidden classes and such it likely wouldn't matter much. You'd need them anyway for full dynamism
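Right: stock CPython never checks annotations at runtime, so they can be flat-out wrong and nothing happens. A hypothetical example:

```python
def double(x: int) -> int:
    # Nothing in CPython checks the annotation at runtime.
    return x + x

# A str argument "works" despite violating the annotation:
print(double("ha"))  # haha
print(double(2))     # 4
```

This is why Static Python (per the sibling comment) has to add its own runtime checks at the boundary before it can trust annotations for compilation.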


With Static Python we use type annotations in compilation, and we require them to be correct (they are runtime checked at boundaries with non-Static code and throw TypeError if wrong types are passed in.)

We haven’t gone the hidden classes route so far because it’s simpler to just look at attributes assigned in __init__ and annotated on the class and lock those into slots. If you’re writing typed Python you probably don’t do a lot of tacking extra ad hoc attributes onto instances, since type checkers don’t like that either.
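For readers unfamiliar with the slots idea, this is analogous to the stdlib __slots__ mechanism (sketched below with a made-up class); Static Python's implementation differs, but the effect is similar: a fixed attribute layout, no per-instance __dict__, and ad hoc attributes rejected.

```python
class Point:
    # Fixed attribute layout: lookups can go to a known slot, and
    # attributes outside the declared set are rejected.
    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
try:
    p.label = "origin"          # not in __slots__
except AttributeError as e:
    print("rejected:", e)
```

The trade-off is exactly the one described: you give up tacking arbitrary attributes onto instances, which typed code rarely does anyway.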


Our JIT can use type information when available to improve speed. This is what the Static Python project described in the README is all about.


> Please CPython, upstream this or something like it. There comes a point where this whole zeal about readability becomes idealistic and hurts real-world pragmatic goals. It's hurting the language and we can do better.

I don’t know where you are picking this up from. What’s hurting performance is not readability; it is the flexibility of what you can do (hence strict modules/Static Python in Cinder), which means it is really hard to perform optimizations, as no assumptions hold unless you turn off the more permissive language features.

Another one is backwards compatibility, and I might get some flak from people still complaining about it more than 10 years after python 3.0, but things are still 99.9% compatible with decades-old code in python 3.9, and you can’t throw that away.

Finally, the number of people dedicated to improving CPython performance in an upstreamable fashion is ridiculously low compared to other languages, and especially compared to the number of businesses using it. There are quite a few people working on performance, but not many of them appear to be doing it in the open, with the goal of upstreaming it in a non-breaking manner (kudos to the Instagram engineers, I guess).


> Finally, the number of people dedicated to improving CPython performance in an upstreamable fashion is ridiculously low compared to other languages

Yeah because Guido and the maintainers have proclaimed multiple times that they prefer a simple implementation for teachability, over performance minded design. That’s where this entire complaint is coming from.


This was a reasonable requirement for an academic language but it's utterly dogmatic for one of the most used languages in the world.


To clear some confusion...

Python's language readability is NOT the problem I allude to here.

The problem is CPython's core team shooting down any performance improvement for the sake of keeping the compiler source code readable.


I fear this will be another Unladen Swallow; it's not for lack of trying, but rather the resistance to change.


I honestly really hope this isn't the case again.

Python is falling behind fast. Look at how much PHP has improved in performance, and how Ruby is doing lately.

This resistance is biting back.


This is why I see Julia's uptake as positive: if that doesn't change the community's aversion to these attempts (rather than just keeping on writing C), nothing will.


That was the first thing I google'd when I saw this article: "unladen swallow"!

For those unfamiliar:

https://code.google.com/archive/p/unladen-swallow/

https://www.infoq.com/news/2011/03/unladen-swallow/

https://www.python.org/dev/peps/pep-3146/


I was doing a Hacker Rank puzzle yesterday in Python and after I optimised it, it still timed out on one input. Looked at the chat, and saw a comment "just run it with PyPy" and naturally it then passed all parts.

It's easy to forget that CPython is not that optimised generally, but as soon as you need to deal with algorithms it is often lacking.

I think even the fact that there are so many C implementations in libraries shows that the lack of optimisation for the sake of clean compiler code just manifests as readability/complexity issues elsewhere.


> It's easy to forget that CPython is not that optimised generally,

"Not that optimised" is actually an understatement. In my experience, CPU-heavy code typically becomes 10-100x faster if you port it from Python to a language like C (without even trying to be clever or using SIMD assembly).


Often you can put a "@numba.jit" decorator at the top of a calc-heavy function, and reclaim a good portion of C's performance.

It is so simple it is definitely worth trying.

I did it for a Bloom tree implementation, and got to within around 3x of the C code it replaced.
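For anyone wanting to try this, here's a sketch of the pattern (numba.jit is the real decorator; the no-op fallback and the function itself are just for illustration, so the snippet runs even without numba installed):

```python
try:
    from numba import jit            # the real decorator, when numba is installed
except ImportError:
    def jit(func):                   # no-op stand-in so the sketch runs anywhere
        return func

@jit
def harmonic(n):
    # Plain-Python hot loop; numba can compile this to native code.
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total

print(harmonic(2))  # 1.5
```

The appeal is that the decorated function is still ordinary Python, so you can delete the decorator at any time and nothing else changes.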


That would indeed be fantastic. V8 has had so much optimization whereas CPython has had very little of it, though Pypy deserves a mention here. They've been doing fantastic work, but I guess there's always the curse of the reference implementation being the popular one.


[deleted]


Nim has long had the best of Python’s readability mixed with C’s performance and low-level constructs. I urge you to check it out.

https://nim-lang.org/


I think GP is talking about the readability of CPython itself.


I disagree; it was the Nim syntax that kept me away from the language back in uni when I tried it out. If anything, C syntax is barely as easy to read.


I got it to work with the following, but it errored out on OpenSSL so I couldn't install pip. It builds fine with Docker Fedora 32. Thank you.

Platform: Fedora 32, Docker on Windows 10, version 2004

Steps to reproduce: install Docker from docker.com, then

   docker run -t -i fedora:32 bash
   # inside the container (git, gcc and make are not in the base image):
   yum install -y git gcc make zlib-devel openssl-devel
   git clone https://github.com/facebookincubator/cinder.git
   cd cinder
   ./configure
   make
   make altinstall

This error occurs when you do ./configure --enable-optimizations; without that flag, this version of Python compiles fine.

   >> Objects/accu.o
   Parser/listnode.c: In function ‘list1node’:
   Parser/listnode.c:66:1: error: ‘/cinder/Parser/listnode.gcda’ profile count data file not 
   found [-Werror=missing-profile]
   66 | }
      | ^
If you get this error it is because you used --enable-optimizations.

github.com/brianherman https://www.linkedin.com/in/brian-herman-092919208/

Edit: Removing github flavored markdown Edit: Answering own question Edit: More formatting


Pet peeve: people who put -Werror in the default build flags. Leave it off by default, turn it on in your CI!


For me, I'd rather get a bug report about a failure during compilation, than something more obscure and harder to find at runtime.


The parent proposed gating for warnings in the CI, not (by default) on the developer build.

I completely agree with this. As a dev I want to see warnings, but I don't always want to have everything perfect from the start. The famous example here is -Wunused-parameter. I DON'T want to fight this during development, when I often have prototyping stages where functions remain unimplemented intentionally. I very much want to avoid merging this into master though, so CI gating with -Werror it is.


Also: "I'm trying to build your code 5 years later with a version of GCC which has a load of new warnings you've never seen." Dealing with those is not my problem; I just want to run your tool.


> We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.

Nice. Looking forward to seeing that happen over the next few months.


You mean years.


No, I don't. I'm talking about when we start seeing the changes, not when they're done.


Was Cinder influenced by hhvm (Facebook's vm for php/hack)? A project that maintains a list of different JIT implementations for programming languages and compares them would be a great way to see what are the different approaches to implementing JITs and which language features make it hard to implement performant JITs.

As an aside it is great that the Cinder team is specifically calling out that Cinder is not intended to be used outside of FB. Many people have been burned by lack of community around hhvm.


Definitely influenced. There are people on our team who also worked on hhvm.


> A project that maintains a list of different JIT implementations for programming languages and compares them would be a great way to see what are the different approaches to implementing JITs and which language features make it hard to implement performant JITs.

SOM, for example, has many implementations with different approaches to compilation: http://som-st.github.io


I'm afraid there is very little documentation/text on modern production JITters. When I tried finding any text for my MSc I had little success. Does anyone have a suggestion about e.g. .NET 3-tier jitting or similar?


> When I tried finding any text for my MSc I had little success.

Yes you basically need to sit down with an expert to learn this stuff. It's famously under-documented and extremely hard to learn how it's done in practice on your own.


Here's some good documentation about v8's JIT: https://github.com/thlorenz/v8-perf/blob/master/compiler.md

Note: Never worked on v8, just liked the information here.


I wrote down "Survey of tiered compilation in JIT implementations" when I researched this: https://github.com/sanxiyn/blog/blob/master/posts/2020-01-03...


There is a lot of value in leveraging existing programming languages rather than rewriting in something else. My team rewrote large chunks of code from Python -> Go and it wasn't a pleasant experience. We were able to justify it, since it _inadvertently_ reduced our infra cost.

But if we could make Python a lot faster at the compiler level, it wouldn't break the dev experience or lengthen dev on-boarding time. That helps the team's productivity as a whole. I hope the CPython community is able to leverage a few things from Cinder; it would benefit the entire Python community.


If you don’t mind sharing, what was it about the experience that made it unpleasant?


Why didn't you profile the Python code and only rewrite the hot loops?


Not op but two reasons would be:

1. The computations are distributed enough that there is no one hot loop

2. Go is shit at integrating with other software without IPC; IPC could be infeasible or have too much overhead, and it also complicates building / bundling / shipping a lot


Thanks :

1. In distributed computing there is usually minimal communication, because inter-node communication is costly; the bottleneck can still be the hot loop on each node.

2. When optimizing the hot loop only, it can be quickly written in Cython or C, since the scope is usually a lot smaller, so no need to use Go.


> In distributed computing

I didn't mean distributed computing, I meant distributed through the program. Not all programs actually have a hot loop.

> When optimizing the hot loop only, it can be quickly written in Cython or C, since the scope is usually a lot smaller, so no need to use Go.

That assumes proficiency in C, and it still complicates building / bundling / shipping a lot, the first bit of native code into a Python codebase is a huge investment and a significant step back in convenience in many ways.


Did you rewrite it by hand or use a transpiler?


Mostly from scratch. I am yet to see a Python -> Go transpiler which generates human readable/maintainable code.


Try http://github.com/adsharma/py2many

tests/expected/*.go has generally human readable code.


> "If a call to an async function is immediately awaited, we immediately execute the called function up to its first await. If the called function reaches a return without needing to await, we will be able to return that value directly without ever even creating a coroutine object or deferring to the event loop. This is a significant (~5%) CPU optimization in our async-heavy workload."

Step 1: "Who needs Threads? Just async/await all the things!"

Step 2: Build complex logic to untangle your nonsensical use of async/await and force things to run synchronously in the backend without the user knowing. See? A 5% CPU optimization right there!


> Step 2: Build complex logic to untangle your nonsensical use of async/await and force things to run synchronously in the backend without the user knowing. See? A 5% CPU optimization right there!

Addressing your snark, I assume this optimization is targeted at places where you have a single "await" expression in the source that might call different functions at runtime, only some of which need to await.


Or there are just different code paths in the called function (eg a “cache hit” code path and a “cache miss” one), and only some of the code paths need to await.


Asynchronous execution is not a replacement for parallel execution. That is not the value proposition of asynchronous execution models.


This exact optimization has been implemented in C++ coroutines via the 'initial_suspend' coroutine traits hook.

...and those are stackless coroutines under AOT compiled conditions, so it only saves a malloc.


Never realized Carl Meyer was at Instagram, but this explains a lot now! His work on Django has always been superb


Thank you, that’s very kind of you both. To be clear, this is the work of a team, and many others have contributed a lot more than I have to it.


I thoroughly enjoy working with Carl.


Looking through the repo, one change of note that doesn't seem to be documented in the README [0] is that this has an optimization to reduce Copy on Writes in forked workers (under the Py_IMMORTAL_INSTANCES preprocessor guard) by not dirtying pages via refcount changes.

Here's the BPO issue [1] that tracks the upstreaming attempt.

[0] https://github.com/facebookincubator/cinder/blob/9de726349eb...

[1] https://bugs.python.org/issue40255


It’s also likely that this change, which is a big win for our prefork production environment, drags down our benchmark results. It makes every incref and decref more expensive.


Good call out! I’ll make sure it ends up in the readme.


From the readme:

> Cinder is not polished or documented for anyone else's use. We don't have the capacity to support Cinder as an independent open-source project

I'm quite surprised by that. I understand that FB makes tons of money, so I can't believe they don't have the capacity... Maybe they don't have the will (which I perfectly understand; it's just the wording).

Anyway, it's great they share.


> We don't have the capacity to support Cinder as an independent open-source project

Our team is smaller than you might think! You also left off the last half of that sentence, which is quite important: "nor any desire for it to become an alternative to CPython."

We'd love to see the ideas and implementations in Cinder end up in upstream CPython in some form or another. It'll make our lives easier in that we won't have to maintain as much forked code, and hopefully everyone else will benefit from a faster CPython.


> Our team is smaller than you might think!

I thought so but the other half of my brain was screaming for "FB has so much power, how come they don't invest more in open source".

But well, that's economics at play I guess. I'm an idealist after all... Thx for replying !


Instagram were heavily using Python (and Django) when acquired by Facebook, whereas Facebook proper were a PHP shop.

The most interesting aspect really is that both companies have run in to the limits of performance with dynamically typed languages and had to implement their own language runtimes, AOT compilers, and JIT's.

Static typing is the best way forward at scale.


Do we have proof that they had performance problems? At least in Python you can write a C extension for a problematic code path which would probably take much less time.

I think what they are looking for is to reduce the amount of machines they need. Which in the end is more money in their pockets. Making the website faster for the end user is a welcome side effect.


There are articles like https://instagram-engineering.com/dismissing-python-garbage-... which seem to confirm some problems. One of the people involved with this Cinder project was an original author of https://github.com/microsoft/Pyjion (it's now being revamped by a new dev.) Dino had deep .NET/CLR implementation experience and had been the lead for IronPython which I think spawned the interest in perf improvements through Cinder. I hope this effort is more successful for him.


Same, i think it's pretty awesome they're sharing something like this without feeling the obligation to make it a "proper" project.


> Maybe they don't have the will

Nothing is documented at Facebook. There is no career advantage to writing documentation.


I love how Facebook and Instagram never went the route of "full rewrite" for their apps as they scaled.

In my experience, "Language X is slow and we could save Y by switching to Z" is always a false promise. You can port the parts of the system that are costing a lot to other languages/frameworks to capture the bulk of the savings while keeping your developers happy working on familiar code. Or, if you're big enough like Facebook, you can go see why X is slow and whether it can be improved at the compiler level. Never disrupt the developers' flow (Twitter did this with their Scala craze back in the day).


If Twitter's port to the JVM didn't result in 10x fewer servers with 10x lower latency vs an identical Ruby implementation we wouldn't have done it. Sadly not much progress has been made on the RubyVM though Twitter did try with improvements to GC and other changes. We even tried JRuby as part of the evaluation.


> Sadly not much progress has been made on the RubyVM

Lots of progress has been made on CRuby. The official optcarrot benchmark (realistic-ish interpreter bound) is ~6x faster on Ruby 2.6 than Ruby 1.8.

CRuby performance running Rails improved almost as much. Zendesk got a 2-3x improvement from 1.8 to 1.9 and 1.9 to 2.6 doubles it again.


How does that compare to the same algorithms run on other languages these days?


Shopify is working on this! They have YJIT, led by maximecb


Wasn't there a time when jruby was faster than the standard ruby implementation? On one hand, that's absurd, but on the other, a lot of effort goes into Hotspot, and you'll get those benefits for free.


Not really, it can turn out good from the JVM stuff for specific cases here and there, but it's mostly interesting for not having a GVL, thus enabling full parallelism for threads.


Is TruffleRuby promising at all?


Enormously. The latest Railsbench results on TruffleRuby are ~3x CRuby.


Nice. 10 years later but glad that Ruby programmers might have an option that lets them program the way they want without having to choose a different language / platform for performance.


It is still nowhere near a level at which Twitter would have stayed with Ruby, though. But I believe TruffleRuby being 3x faster than current CRuby is only the beginning; the JVM got tons of improvement in the past 10 years as well, it is uncanny.

Although Dr. Chris Seaton mentioned it is not his goal to have Shopify Rails running on top of TruffleRuby. Not sure if he meant short term or long term. I think Dr. Stefan Marr will be joining him soon, and Dr. Maxime Chevalier-Boisvert is leading a team working on YJIT at Shopify. (Did Shopify just hire every known Dr. working on Ruby performance?)


Might as well grab Charles Nutter while they are at it.


I'm not sure many companies have the luxury of doing what FB did. I can't imagine a world where choosing to rewrite your app, or choosing to rewrite the entire language, where rewriting the entire language would be the cheaper solution.

Twitter did this with their Scala craze, but Twitter struggled to scale its Ruby app. Twitter didn't have nearly as deep pockets to write a performant alternative Ruby VM.


Joel Spolsky did it, city something?


I believe the product was called "CityDesk" and the in-house language that compiled to several different target languages was called Wasabi.

Edit: I was wrong, Fog Creek did have a "CityDesk" product at one point, but Wasabi was for on-prem FogBugz[1].

[1]: https://www.joelonsoftware.com/2006/09/01/wasabi/


> keeping your developers happy working on familiar code

That turns pretty quickly into "developers being unhappy that they have to maintain legacy code written for the old language/platform," though, especially with any sort of employee turnover.


It doesn't have to be this way. If you're going to go all-in on the monolith approach like FB and Instagram did for their product code (at least when I was there), you need to have a central team whose entire job is scaling the codebase. This means actively refactoring other teams' legacy code, making sure that tests are fast, etc.


Yeah I have seen this successfully done at a smallish (200 person) company. There was a "platform" team which was essentially an "un-suck the monolith team".

Refactor slowly over time, turn a ball of mud into cleanly defined domain boundaries, parallelize the tests, etc. etc.


Idk, I worked at IG for a year and wrote a lot of backend code in the Python webserver stack. I believe they would benefit massively by aggressively migrating services to the FB Hack stack.

A ton of work goes into making sure the Python code isn't slow, and there are complicated C++ services built as workarounds for the slowness of python.


Heresy! This is a Python thread, please summon the required positivity!


I disagree; this logic doesn't always work unless you have investor money lying around. For a bootstrapped business, it could mean living or dying. As an example, a client of mine was paying close to $5000 per month in server costs simply because of the scale of traffic they had on their site. By rewriting their Wordpress site in Elixir, I was able to bring their costs down to roughly $1000 per month. It's actually cheaper than what any of their competitors are paying in server costs as well.

This $4000 in savings actually allowed them to hire an additional full time developer to maintain their site. So, your logic only works for certain use cases, not all.


Saying that you disagree while saying at the same time that you rewrote it in Elixir makes me think that wasn't a good use of the money. They will have problems hiring for it, and it isn't even performant compared to other languages you could have rewritten it in if you really wanted to save money (e.g. Go).

Of course I don't have all the context, but, I cannot imagine any circumstance when rewriting some server side logic in elixir would be useful from a PoV of cost savings..


How do you know it wasn't a good use of the money? The client's site has been up for a year since then without any issues for 1/4th of the cost. Their investment in this stack wasn't much. Hiring wasn't a BIG problem, but it did take time. We used that savings to find a dev to maintain the site.

By raw numbers, Go or something else may be faster, for sure, but it would definitely extend the time to develop something like Wordpress, because the frameworks I've tried (e.g. Gorilla) need more effort compared to something like Phoenix, which I use. Plus, a lot of devs were already onto Ruby; it's easier to onboard them into Elixir than Go.

> Of course I don't have all the context, but, I cannot imagine any circumstance when rewriting some server side logic in elixir would be useful from a PoV of cost savings..

I think you may not have worked with a high-traffic site. Swapping languages alone gets you insane cost savings at that scale, because it quickly adds up: even if your new language is just 0.1 seconds faster per user, for a million users it adds up really fast. As I mentioned, by this switch alone we brought the cost down to 1/4th.

I believe GoLang or something else may have drastically reduced further, but it has the same downsides of Elixir (Harder to find talent for, learning curve). I'm a pure Elixir consultant, so I chose it. My client is more than happy with this outcome.


Yes, I kind of agree with everything you said actually! Apart from Elixir from a cost savings on infrastructure PoV, if that's how you wanted to frame it (since the context of the HN submission is about changing languages for better performance or not, and Elixir not being particularly fast).


How much did you charge to rewrite WordPress in elixir though?


I'm obviously not going to reveal actual numbers, but it's safe to say they spent less than what they would have spent staying on Wordpress for the remainder of the year. Time to completion was 3 months. No new "feature" development was blocked. Remember, this is a content website (Wordpress is mostly used for content AFAIK and less for building features).

The moment it was done, we just switched nameservers. It had no impact on client's workflow or day to day life.


If it was a content website on Wordpress, couldn't Cloudflare's $5/mo APO have done the job? It static caches the site at the edge. You'd get almost a 100% hit rate unless there are strange requirements.

edit: Adding link with info about Cloudflare's APO product https://support.cloudflare.com/hc/en-us/articles/36004982231...


APO is not a drop-in improvement; there are a lot of kinks, and you could spend as much in dev time getting it working right. YMMV.


It can be a drop in (it was for my sites), but it depends on the requirements


Good suggestion, but if your content site is a news site, then the homepage and certain category pages require by-the-second updates. Cloudflare unfortunately can't help, and all the requests are directly hitting your database at that point. That forces you to get an expensive MySQL instance to sustain that kind of traffic.


Cloudflare will automatically purge the cache when you update or publish an article. Also from my understanding of APO, the cache will get distributed to all their edge locations without each one needing to hit the origin server. So if you update an article it would only get fetched a few times max until it gets updated again. That seems pretty good. Also as far as MySQL, you could install a Redis cache in front. Personally we use Varnish and Memcached as additional layers in front of Cloudflare (it works out of the box with Cloudways)


yes it could, but it's not webscale!

gotta use aws


Well to be fair he is an Elixir consultant so he rewrote it in Elixir.


And how big was the risk to have the invested time wasted? How long did you block new feature developement?


Safe to say they would've gone out of business if they kept burning that kind of money for the remainder of the year. This was around covid era, they didn't have a solid revenue stream from the traffic they got.


How much of that traffic could have been avoided by a CDN, a caching plugin, Varnish, and php-fpm?


You can't cache everything on a news website, especially the homepage. The homepage requires constant updates. For the rest of the pages, I agree, and I do use caching.


Even a one second cache would cap your homepage to 1 req/s which can be cheaply and trivially handled by any stack.


Every single large news website is cached.


This is some of the original thinking around StackOverflow, if I remember correctly from the podcasts. Jeff was a C# dev and using what was familiar, but he also commented on how you could get good performance out of some crappy code and easily refactor later.

Their whole culture elevated performance and they handled a lot of traffic with relatively small amounts of hardware that they documented. I've got some familiarity with a Rails app where the devs seemed astonished at some of the details in their blogs about running SO.


One factor for Twitter was that Ruby was simply orders of magnitude worse for Twitter's usage than, I think, PHP or Python was for Facebook or Instagram. Or at least one never heard of those providing horror stories, where Twitter definitely had horror stories. In one of the few times I've been "close to the source", I heard Twitter's head of operations complaining that electricity had become their primary expense from starting X many Ruby instances per second. It was one instance per connection or something close, because Ruby was unstable and because Ruby never gave memory back to the system.


Yeah, it kind of reminds me of the "Python strict mode" they talked about before [1]. Not designed to be cool but to be useful.

[1] https://instagram-engineering.com/python-at-scale-strict-mod...


This is something that people have known for a long time: a lot of money has been spent on making compilers produce fast code, and for many intents and purposes we know what kind of reasoning we need to be able to do to get it. Strict modules are probably one of the simplest yet most effective optimizations you can make: being able to reason about where and what something is, without an expensive lookup every time it is used, is brain-dead simple to understand.

Yet many of the languages that evolved as "scripting languages" threw much of that knowledge out. In the middle of the 90s we knew how to make fast dynamic languages. Maybe not as crazy dynamic as something like ruby, but not far off. Or at least, they had the sense to codify it as a separate thing (CLOS in common lisp comes to mind).

Something like SBCL is more dynamic than you need(TM), yet produces code that is often at least an order of magnitude faster than python.


Yes, Static Python especially relies heavily on strict modules, since they enable us to perform module-local analysis, which enables some cool optimizations.


"Language X is slow and we could save Y switching to Z" is always a false promise.

The thing that keeps software interesting for me is that there are almost no absolute rules. So I'll agree that complete rewrites are usually a mistake and performance problems are often not in the language.

But I can't agree that rewrites are always a mistake. esbuild is a recent good example of how much difference switching to a faster language can make.


Did Twitter actually contribute to Scala core though? They definitely created and contributed to various Scala libraries but they didn't fork/augment the Scala compiler like Facebook did with HipHop.


They did contribute to the JVM but directly, not by forking it.


And are nowadays one of the reference users of GraalVM in production.


Oh right! I forgot they moved to that.


Not sure I can agree with that. Sinking resources into re-engineering a language infrastructure, mostly because the language/infra is not actually a good choice for the target problem, is an approach only very large engineering teams can tackle. As has already been stated (further down the thread), breaking down your code/monolith into manageable components and retargeting for performance in the key spots is almost always the right approach, unless of course you have the engineering resources and hubris of someone like FB.


Rewriting in full for the sake of a performance gain is indeed a questionable proposal; rewriting hot spots is the more sensible approach.

A better reason for rewriting, however, is to migrate to a more maintainable language with a focus on readability, from one which focuses on expressiveness (and is hence favored for rapid development), as the project matures.


I think while performance is not a proper justification, if you get other stuff in the bargain it can be really worth it. Rust, for example - the justification to switch is usually not just speed but safety and the safety enabling more optimizations, parallelism etc. If all you care about is speed you stay in C/C++.

Sometimes the choice is also made when you already have to pay a transitional cost - at a previous employer we had to rewrite a bunch of code to move from an old version of PHP to a newer one, and a strong argument was made that at that point we should just adopt a better language since we already had to pay that cost. I think eventually a lot of stuff transitioned to Haskell as part of that process, even if other stuff stayed in PHP.


Benchmarks: https://github.com/facebookincubator/cinder/blob/cinder/3.8/...

There are some perf hits, as well as all the improvements.


It’s worth noting that we’ve never targeted perf on any of these benchmarks, so these are just the results that fell out from optimizing for our production workload, which is a totally different beast.

Most of the hits are due to the default “JIT everything” behavior which is really bad if there are a lot of rarely called functions, but would be easy to fix. This is discussed in the README.


How does this compare to other interpreters?


We need some benchmarks to validate these bold claims.

Edit: found them at https://github.com/facebookincubator/cinder/blob/cinder/3.8/...


How to get it working with Docker on Windows 10. https://gist.github.com/brianherman/63b65cdd92675f4a83cec1c4...


This is cool, but at their scale I wonder if they wouldn't be better off rewriting most of their codebase, or at least a performance-critical subset, in something like Rust or Go.

It may seem like a lot of work, but making Python faster isn't exactly trivial either.

Also, I love Python, but using it for huge projects like this sounds dubious at best. I think the benefits of static typing really show when projects grow large.


I wonder if they are in touch with kmod or tried pyston: https://blog.pyston.org/.


Haven't tried Pyston, its revival as an active project happened well after Cinder was in production.


We’ve chatted with kmod a few times and let him know we were open sourcing Cinder. Hopefully the projects can learn from each other. As _carljm mentioned, Pyston was restarted way after Cinder was in production and we had already implemented a significant amount of the core JIT functionality.


What's the difference between this and PyPy? They're both JIT/performance oriented Python interpreters?

Good to see more work on this front! It's seemed crazy that Python can be this popular but still not be nearly as fast as it could be.


Cinder is a fork of CPython, and maintains compatibility with C extensions written for CPython. Which includes a lot of important/popular third-party packages these days.


I wonder if Instagram being a big Django site has driven Cinder development? What sort of impact does Cinder have on their Django workloads?

Regular models, views, ORM stuff can be fairly generic. Did they need to change Django to better benefit from Cinder?

Edit: intriguing… https://github.com/facebookincubator/cinder/blob/f60897df9f6...


this podcast goes more into the details about django at instagram. https://djangochat.com/episodes/django-instagram-carl-meyer-...

tldr: very old version with tons of custom stuff.


Regarding the Static Python subset, Microsoft also has one flavour of it.

https://makecode.com/python


Static Python is a bit different in that it's not a subset language. It makes a few semantic changes, most notably that classes are slotified, so you can't tack random extra attributes onto instances. If you're writing typed Python you probably don't do much of this anyway, since all type checkers will complain about it. But the approach is "normal dynamic Python with optimizations where allowed by types," not a subset.


How similar is it to RPython?


Unlike RPython, Static Python in cinder is not really a subset of Python, it can compile everything (although it will throw compile time errors if it sees mismatched types). If it cannot determine type information, it just assumes the type could be anything, and falls back to slower CPython behavior.


In many ways nice that they open their performance work, some of which could be upstreamed.

In some ways a bit cheeky that they took the 'dump it and see' approach rather than offering to work to upstream it, since it's kinda outsourcing the work of maintaining the performance improvements to the Python core devs rather than offering to put some of Facebook's considerable resources towards doing it themselves.


What more could we expect them to do? They are a for-profit company. Open-sourcing their entire interpreter stack, with years of proprietary development, is a bold move. It shows that they care about open source and want to give back to the community.

They could've kept these developments behind closed doors and nobody would have minded.

They also offered to provide support if anyone wants to contact them. That is top-class behaviour from Instagram; I personally couldn't expect more.


Not saying what they did was not good; it's certainly better than keeping things closed, but to answer these two parts:

> What more could we expect from them to do?

> That is top-class behaviour from instagram

To be considered top-class, I would expect them to go the extra mile and contribute to upstream. As many other companies have done over the past and continue to do.


We have contributed quite a few parts of cinder upstream already and will continue to do so. I’m not sure where you got the idea that this is just a dump and we expect core devs to do all the work of upstreaming.


I'm very happy to hear that _carljm. I was just replying to my GP's "they took the 'dump it and see' approach" and my Parent's "What more could we expect from them to do?"

If you're going that mile, then my hat is off to you and the team, that's how it's supposed to be done! :)


"Open it and see if there is interest" is the first step in "offer to work to upstream it." Not sure where you got the idea that it's only the former and not the latter.


>Is this supported?

>Short answer: no.

That's refreshingly honest


This is awesome, a lot of knowledge shared in that code, and lots to learn from I'm sure.


> Python can be ahead in the performance race. We just need to get real.

Completely agree, and Instagram did it, and they have been reaping the benefits for it. Now they've open-sourced the runtime, so we can all benefit now too.


For anyone curious about the detailed history of Python interpreters https://www.youtube.com/watch?v=NdJ9BxgRpOY


Why not PyPy, which is a real compiler, not an interpreter?


An answer from a Cinder developer here: https://news.ycombinator.com/item?id=27043486


in the source there are quite a lot of references to a GIL (global interpreter lock). Does Cinder offer any improvements in terms of locking/GIL ? I didn't see any mention in the documentation. https://github.com/facebookincubator/cinder/search?q=gil

correction: the documentation does mention the lock: https://github.com/facebookincubator/cinder/blob/f60897df9f6...


> correction: the documentation does mention the lock: https://github.com/facebookincubator/cinder/blob/f60897df9f6...

That's unchanged from the upstream CPython repo.


Today I learned that Jython and IronPython don't have a global lock like that. Interesting.


Really freaking cool

Did anyone find the benchmarks?


There'll be a talk on Cinder at (virtual) PyCon in a couple weeks, there are detailed benchmark numbers in the talk. We've talked about getting them into the repo too, might happen.


Benchmarks are now in a PERF file in the repo.


Are there performance comparisons with stock CPython and PyPy anywhere?


There are now benchmark comparisons with stock CPython in a PERF file in the repo.


A bit annoying they called it cinder when there is already libcinder.


tldr: Static Python plus Cinder JIT achieves 7x the performance of stock CPython


I wonder why Cinder and not building off of PyPy. Isn't there also a GraalVM implementation?


We tried PyPy without much success (and we really tried, six month project just to get it to run our codebase.) Our workload is very heavy on the CPython C API, which makes life difficult for PyPy or any other alternative Python implementation. PyPy is great for a lot of uses, just not a great fit for us.


Have you looked into the new hpy API? https://hpyproject.org/


Sorry to hear that. What were the sticking points?


It was a few years ago and I didn’t work on it directly, so my knowledge is a bit fuzzy, and I think the person who did work on it is no longer at the company. As best I recall the slog to get it working was mostly about C extensions, but I don’t know details. And then once it was working the memory use was worse and it didn’t show perf improvements. This is speculation but I would guess the design of the pypy jit isn’t a great fit for a prefork workload, where the hot paths don’t get hot until postfork, but it’s pretty inefficient to do identical jit compilation in every worker instead of once in the parent process.


This is meaningful for some big web workloads. By cranking up your Python codebase performance you can avoid expensive rewrites in harder to maintain languages.


Does this use PEP 523?


Why didn’t they simply invest in PyPy? Something that would continue to be maintained because of the passion and conviction around the project.


I think this section of the readme speaks to that:

Is this supported? Short answer: no.

We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.

Cinder is not polished or documented for anyone else's use. We don't have the capacity to support Cinder as an independent open-source project, nor any desire for it to become an alternative to CPython.

Facebook is basically just open-sourcing their changes to CPython, in order to make it easier to work with core CPython developers. For a company of Facebook's size, migrating to PyPy would probably have been a much bigger endeavor than simply forking CPython. Since the goal of this open-sourcing is to improve CPython rather than to attract people to a competitor, I would interpret this work as "Facebook investing in CPython" rather than "Facebook investing in a CPython alternative".


[flagged]


Please don't take HN threads into flamewar. We're trying to avoid that here.

That goes 2x for generic tangent flamewar (which this is), and 10x for generic ideological flamewar (which this arguably is). Please avoid!

Also, please avoid name-calling and fulmination generally. You can make your substantive points without any of the above!

All this is in the site guidelines: https://news.ycombinator.com/newsguidelines.html


What a ridiculous conclusion you came to based on a single line from that page. The very next line says:

> We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.

> Our goal in making this code available is a unified faster CPython.

Facebook is a Python Software Foundation sponsor[1], and they sponsor every PyCon. I don't know why you got the idea that they don't contribute back.

https://www.python.org/psf/sponsorship/sponsors/


Yeah but also... "If you want to upstream it, take it" is not really the same as whoever forked it working hard to send it back upstream patch by patch...


We've already upstreamed a lot of changes, including one that makes all coroutines faster in Python 3.10. The big-ticket items in Cinder would be big changes for CPython and need discussion about whether CPython even wants them. You don't just slap a C++ JIT into a bugs.python.org ticket. Opening Cinder is the first step in the conversation, and we're here for that work too. It's not just "if you want it, take it."


Ok great to hear that this is the case!

Reading the README was a bit unclear on this topic.

I also know that engineers do their best, while their employers may not always care about proper behavior when it comes to contributing back to FLOSS.


The PSF has not really sponsored CPython so far, and PyCons are for self promotion.

Often in OSS foundations the entire sponsorship money goes to people who don't deserve it.

Please do not attach any meaning to corporate sponsorship of foundations unless you can point to a specific project.

99% of CPython has been written by individuals without any compensation who don't get any credit. When a corporation gives money to non-productive bureaucrats everyone thinks they are helping open source. It is very sad.


Agreed with your comment. The PSF's public reports show that 75% of its expenses go to US PyCon.

https://www.python.org/psf/annual-report/2020/


> In 2019, PyCon US generated 63% of the PSF’s revenue.


Which makes it even more surprising where all that money goes. Hint: It is not CPython development.


63% of revenue and 75% of costs.


positive net revenue?


Instagram upstreams a lot of changes to cpython, and most of the IG backend runs on services outside of python/Django.

I think you have the entitlement characterization backward though. It is entitled to think people should do work for free so that you can use the output of their work. IG is saying "we built this stuff, we're making it open source". They are not asking for or expecting anything from the open source community or you, but you are asking for a lot (millions of dollars worth of work) from them.


Exactly - totally pathetic. Seriously - what are you contributing in terms of engineering budget? Nothing? Then stop bothering contributors to open source.


> I think you have the entitlement characterization backward though. It is entitled to think people should do work for free so that you can use the output of their work.

That depends on the license, though. It's not entitlement to think that someone that builds on something open must release it back if the license requires it. That's one of the main differences between the BSD license and some GPL variants.


This is typical of Facebook.

Years ago, Facebook maintained a Python SDK for their API. One day, with no warning, Facebook announced they would no longer support it because they didn't have the resources, and the repo was removed from their Github org, which caused a huge headache. IIRC, the community settled around someone's fork.

A few months later, Facebook was a major sponsor of PyCon and they set up a recruitment booth. "We'll take your developers, but we won't support your ecosystem." Really rubbed me the wrong way.


Not trying to drum up sympathy for Facebook here, but the team that owned that Python SDK was probably not at all related to the PyCon recruitment booth. Big organization; at least someone was trying to do the right thing.


CPython explicitly doesn't want these "improvements": it has declined performance-boosting pull requests when they add too much complexity to the code base. The two projects have very different goals.


This seems unnecessarily cynical. The large block of caveats seems mainly designed to set expectations for potential users about the (total lack of) support.

Using (and modifying!) a piece of software in accordance with its license is not appropriation, it's the point of open source! Users - even corporate users - are free to make changes to suit their needs and aren't required to seek anyone's permission to do so. Not all changes need to be contributed back to the parent project, especially if (as the sibling comment notes) the changes don't align with the goals and values of the parent.

That this hacked-up internal fork is even seeing the light of day is a Good Thing.


Open source was never meant for "corporate users". The spirit of open source was to create free alternatives to expensive software so that disadvantaged people could also participate in the digital revolution.

The movement was then appropriated by big corporations once they realized developers would do the R&D work for free (so corporations don't have to pay salaries and taxes on the work done), and that if any project became valuable they could just take it, use it for their own purposes, and make money without ever paying the original developers anything. Of course some companies felt guilty and offered jobs to those developers - but when you compare how much money they made on the software with the salaries they pay, it's plain simple exploitation. The whole open source movement as it stands is designed to exploit developers and is a means for companies to avoid paying for labour.


In these cases, I try to think "Don't let perfect be the enemy of the good".


It’s not obvious CPython wants any or most of this stuff.


Instagram is prematurely releasing this only because they heard that Tinder was about to release their own CPython fork.


Cinder is already something else: https://libcinder.org/

Is it that hard to google a name before choosing?


We've been calling it Cinder internally for years, long before we thought about opening it up. If we wanted to make it a big branded thing, we'd work on finding a unique name for that, but we aren't trying to do that, we're just opening the code to make it easier to share work with upstream and other people working on CPython performance. We hope libcinder keeps the mindshare around the "cinder" name.


Pretty sure they were calling it Cinder inside of Instagram long before open sourcing it.


Came here to say that. They did the same thing with an iOS app called Paper. The ethics are strong with this one.


Too bad it's destined to be abandonware. Just think of all that money and engineering time that could have been spent working on CPython upstream, rather than a fork that'll die when the company gets bored with it.

Aside from a faster CPython, I'd love to see a PEP that standardizes module naming conventions and inheriting+extending existing modules, rather than completely reinventing the wheel and making a totally new module just to add a single feature that some other module lacked. That plus the new regex extensions and Python will be almost as useful as Perl. :-)


The readme does say that the goal here is to either get this work upstreamed to CPython or to inspire/inform improvements to it, not to replace/supplement it.


But it's probably not going to go upstream. And probably very little of it will go into other projects. Every time one of these forks gets dropped on the internet, nobody does anything with it, because it goes out of date shortly after it's dropped. Probably a couple specific optimizations will get picked up from it and put upstream. Versus if development had started and stayed in upstream, you wouldn't have to force people to pick through a huge patchset looking for things they personally want upstreamed.


We have multiple CPython core developers at Facebook, and the Cinder team (among others) are heavily involved in discussions around CPython's direction. There are many pieces of Cinder that would never get accepted upstream, or would take a decade of PEPs and incremental improvements to get there. Cinder is both a proof of concept that these ideas can work in CPython, and something that we can use today, rather than waiting a decade to realize these benefits from a pure open source perspective.


My experience with Django was that the larger the project got, the more I wanted to bypass it and just work directly with the database. Instead of mixing python code into html, I just wanted to output quoted html from python. I got into weird issues with the parser. For instance, I wanted to output a dynamic style sheet, and it wasn't updating properly, probably because of nested includes.

It's also much nicer to reason about a function outputting a block of html code that represents some object, like a toolbar, than to have python code in the html doing for loops with html code inside.
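
For illustration, here is a minimal sketch (hypothetical names, plain stdlib) of the "toolbar as a function" style being described - a plain Python function that returns an HTML fragment, escaping any user-supplied text:

```python
from html import escape

def toolbar(links, logged_in=False):
    # Build the toolbar as a plain string. Escaping labels and URLs
    # guards against HTML injection from user-supplied values.
    items = "".join(
        f'<a href="{escape(url)}">{escape(label)}</a>'
        for label, url in links
    )
    auth = "<span>Log out</span>" if logged_in else "<span>Log in</span>"
    return f'<nav class="toolbar">{items}{auth}</nav>'
```

Called as, e.g., `toolbar([("Home", "/"), ("A&B", "/ab")], logged_in=True)`, it returns one self-contained `<nav>` block that the caller can drop into a page.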


I've heard criticisms of Django, but I've never heard "I'd rather produce my HTML without templates" used as one before.


It's probably because I needed a page to act differently if it was embedded or not, desktop or phone, and if the user was logged in or not. So in essence 2x2x2=8 different behaviours. Even things like links would need to change inside the page.

Easier, and shorter, to write a function that changes the output than to battle the templating language, or write and support 8 different views.


Some of that logic (logged in, or exposing stuff based on RBAC) is usually pretty easy in templates of any flavor. And that's usually where I have to revisit for changes, so maintenance shouldn't be bad.

Sometimes I do common things like stored markdown->html by injecting filters into the template context. So {{ comment | markdown }}.

For completely different presentations, you can have separate templates returned by the same endpoint. But it sounds like that actually should just be a separate endpoint altogether. You could use template inheritance for code reuse, but you can also walk into over-complicating that if those start diverging over time.

Handling screen variety is best done with CSS (and JS when you have to do something hacky). The HTML shouldn't need to change for that.

Even back to JSPs or early PHP, I learned the hard way to avoid sticking business logic in my "templates" or sticking HTML in my code. Either will work, but maintaining it sucks. And if it is a successful project and you need to change the design in year 3, it'll easily result in a nearly complete rewrite.

I know the cool thing to do now is even just expose the data via REST or GraphQL and do a SPA, but I don't do frontend work enough to get over that learning curve.


I don't get it... templates sound like what you want here. You want links to change based on some conditions? Why not use {{ generate_link|safe }} on one generic template?


That's what I ended up doing most of the time. Still ended up with a lot of if-then code in the html template, making it more verbose than it would have been had I just generated the html page inside a function.


You can often generate a structure such as nested dicts and lists of dicts to pass via the template context to achieve your desired layout. This lets you avoid complex logic via template tags but still have the benefits of the templating system.
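
A minimal sketch of that idea (all names hypothetical): do the per-link decisions in Python, pass the resulting structure through the template context, and let the template loop over `links` without any conditionals.

```python
def build_nav_context(user, pages):
    # Precompute the layout decisions in Python so the template just
    # iterates over `links` - no logged-in or device checks in markup.
    links = []
    for page in pages:
        if page.get("auth_required") and not user.get("logged_in"):
            continue  # hide auth-only pages from anonymous users
        links.append({
            "label": page["label"],
            "url": page["mobile_url"] if user.get("mobile") else page["url"],
        })
    return {"links": links}
```

The template side then reduces to a single `{% for link in links %}` loop over ready-made dicts.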


> I just generated the html page inside a function.

But that's also a template - just your own kind rather than the framework's.

I guess it's hard to really understand the tradeoffs here without concrete examples... which is not great for HN either.


You can just return a HttpResponse object with raw HTML. You can also execute SQL queries. It's all there.


That's what I ended up doing. Specifically, I needed to do this with a .css file that contained user-selected fonts.
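
As a sketch of that approach (hypothetical function and parameter names): render the stylesheet as a plain string, then in a Django view wrap it as `HttpResponse(css, content_type="text/css")` so the browser treats it as CSS.

```python
def user_stylesheet(font_family, base_size_px=16):
    # Build the CSS body as a plain string. In a Django view you would
    # return HttpResponse(css, content_type="text/css") from this.
    return (
        "body {\n"
        f"  font-family: {font_family}, sans-serif;\n"
        f"  font-size: {base_size_px}px;\n"
        "}\n"
    )
```

Because it's just a function, the output reflects the user's saved fonts on every request, sidestepping the template-caching issues mentioned upthread.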



