Indeed. Furthermore, this basically just benchmarks function-call overhead by using the worst possible implementation of fib().
Function calls are a well-known weak point of CPython, even among all its other performance weak points.
It's hard to express how utterly uninteresting and useless TFA is, and if its author is surprised by the result… really the only component this tells us about is the author.
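For context, TFA's benchmark is presumably something like this naive doubly recursive fib (a sketch, not the exact script); fib(35) makes roughly 30 million calls, so nearly all of the measured time is call overhead:

    import time

    def fib(n):
        # the worst possible implementation: exponential recursion
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    start = time.time()
    fib(35)
    print((time.time() - start) * 1000, "ms")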
> Would be much more interesting to compare to pypy.
I'm not sure it is more interesting at all, let alone much more, but here are the results on my (obviously much slower than TFA's) machine:
    > python3.9 --version
    Python 3.9.1
    > python3.9 fib.py
    8555.904865264893 ms
    > pypy37 --version
    Python 3.7.9 (7e6e2bb30ac5fbdbd443619cae28c51d5c162a02, Jan 15 2021, 06:03:20)
    [PyPy 7.3.3-beta0 with GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
    > pypy37 fib.py
    715.0719165802002 ms
    > node --version
    v14.15.4
    > node fib.js
    247.19056797027588 ms
(can I note that the 12-decimal precision of the script's output is hilarious? Because clearly, when you're benching fib(35), you need that femtosecond-scale precision)
It does seem a bit harsh, but I would put it like this: If you need to care about performance, then this is an extremely basic difference between the two language implementations that should not surprise you. If you don’t yet, then this is the very beginning of your education in how to care about performance—welcome to the next level!
Oh, I think it is quite a reasonable comment. The work being done in the benchmark isn't interesting, and the benchmark itself is short.
If you want to get ultimate performance from Python then write a C function...
If you want to inform me about runtime performance then show me how the language runtimes are spending cycles. If you wish to convince me about a language being great then tell me about the engineering effort to create and then run something in production.
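A minimal sketch of the "write a C function" route, via the standard-library ctypes module (the file and symbol names here are made up for illustration):

    import ctypes

    # hypothetical shared library built from a trivial C file, e.g.
    #   long fib(long n) { long a = 0, b = 1;
    #                      while (n-- > 0) { long t = a + b; a = b; b = t; }
    #                      return a; }
    #   cc -O2 -shared -fPIC fib.c -o libfib.so
    lib = ctypes.CDLL("./libfib.so")
    lib.fib.argtypes = [ctypes.c_long]
    lib.fib.restype = ctypes.c_long
    print(lib.fib(35))  # 9227465, computed at native speed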
It's a two-sentence post, the latter of which seems to be trying to dispel the idea that it's about what you're implying it is. Maybe it's uninteresting to experts in the area who consider this common knowledge, but that doesn't make it a failure of the author to do something worthwhile.
These details aren't important. It's like benchmarking C++ against Node and then complaining about implementation details. Node should be, and definitively is, faster than Python in practically every benchmark.
I still prefer Python over Node, but I can't deny the reality.
Doesn't really matter; I ran it on 3.9 and it's still slow. Between the kind of language Python is and the optimisations CPython allows itself, there is simply no way it could be competitive.
Numba doesn't really qualify as a JIT if what it's actually doing is compiling the function, caching the result to disk, and then reading it back on future runs for faster execution... that's just a compiler.
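For reference, the behavior being described is Numba's cache flag (njit and cache=True are real Numba API; the function is just an example):

    from numba import njit

    @njit(cache=True)  # compiled to machine code on first call, then the
    def fib(n):        # result is cached to disk and reloaded on later runs
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    fib(35)  # first run compiles; subsequent runs skip compilation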
> An interpreter with a JIT is obviously faster than one without.
And why doesn't python's default runtime environment obviously come with a JIT then? I think they should absolutely go for it, given the huge user base of python.
Reasons like "but named functions can dynamically change" are not applicable, since JS has those same properties and can do it. They could start with optimizing the case where you call the same function over and over in a for loop.
An alternative python interpreter is also not the solution, normally what you have is the main standard python interpreter, and that is the one that should be fast, period.
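To make the "named functions can dynamically change" point concrete, here's a sketch of why a Python JIT would need guards on call sites, just as JS engines do:

    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def hot_loop():
        for _ in range(1000):
            fib(20)  # a JIT would want to specialise/inline this call site...

    # ...but the global name can be rebound at any moment, so compiled
    # code needs a cheap guard that deoptimises when this happens:
    fib = lambda n: 0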
> And why doesn't python's default runtime environment come with JIT? I think they should absolutely go for it, ensure the default python you get when you run python has a JIT, given the huge user base of python.
1. because CPython aims to be relatively simple and straightforward by choice
2. because the "huge user base" comes in large parts from the deep and extensive C API, which is absolute hell on a JIT
3. because most of the userbase would not give a shit anyway; it has not exactly migrated en masse to pypy. Much of the userbase sees and uses Python as a glue language: an interface to optimised C routines that isn't a pain in the ass to develop in.
> because CPython aims to be relatively simple and straightforward by choice
Sacrificing performance for core interpreter developer convenience may have been the right choice when Python was getting started; it's no longer the right choice today. Today it's short-sighted.
> because the "huge user base" comes in large parts from the deep and extensive C API, which is absolute hell on a JIT
We can have both a JIT and a "deep and extensive" (or more importantly, stable) native API, as demonstrated by node.
> because most of the userbase would not give a shit anyway
Actually, a lot of the userbase doesn't use libraries with native extensions, is painfully aware of Python's performance issues, and is intensely interested in addressing them.
> We can have both a JIT and a "deep and extensive" (or more importantly, stable) native API, as demonstrated by node.
Yes, but not that native API. Designing a native API that doesn't create huge problems later is difficult. The JVM, .NET and V8 guys managed it (mostly) but the scripting languages generally didn't. Their API is just literally the entire internals of the interpreter.
Figuring out how to JIT code in the presence of native extensions that expect the implementation to work in exactly the same way it always worked is a research problem. The only people who have got close to solving it are the GraalVM guys. They do it by virtualising the interpreter API and also JIT-compiling the C code! They run LLVM bitcode on the same engine that runs the scripting engine.
> 1. because CPython aims to be relatively simple and straightforward by choice
It's a painful choice. Having to use numpy for everything a loop could normally do but is too slow for, or being discouraged from writing functions because function calls are so slow, makes an otherwise elegant language less so.
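A sketch of that trade-off, using a trivial sum of squares:

    import numpy as np

    # the "elegant" version: a plain loop, millions of bytecode dispatches
    total = 0
    for v in range(1_000_000):
        total += v * v

    # the fast version: push the whole loop into numpy's C routines
    arr = np.arange(1_000_000)
    total = int((arr * arr).sum())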
Armin Ronacher, the author of Flask, actually has a good talk about this. The gist of it is that the way Python's internals leak into the language makes it very difficult to build a performant JIT that wouldn't break a large amount of userspace code.
Python lets you do _far_ more shenanigans than JavaScript does, and a lot of large libraries depend on some of that behavior. Breaking it would probably cause a new 2 -> 3 situation.
> a lot of large libraries depend on some of that behavior.
Armin makes great points about path dependence of API design and how the CPython API leaks into the Python language spec. But the features being discussed are actually obscure (example: slots) or intended for debugging (example: frame introspection), and most libraries don't have a good reason to use them. We're stuck in a loop: people talk about how Python is special and can't use a JIT because its internals are not JIT-friendly, so we don't have a JIT, so implementers continue to make choices that are not JIT-friendly - not because they want to, but because they have no guidance.
The JIT doesn't have to be amazing on day 1. What it does have to do is show a commitment and a path to performant code, and illuminate situations where optimizations turn off. There's nothing fundamental in Python's design that prevents a JIT from working; a small number of rarely used dynamic features (that most people don't know about and don't know that they can negatively affect performance) should not be used to hold up interpreter design.
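For the record, here is what the two cited features look like; both are real CPython APIs:

    import sys

    class Point:
        __slots__ = ("x", "y")  # fixed attribute layout, no per-instance dict

    def caller_locals():
        # frame introspection: reach up the stack and read the caller's
        # locals; this forces the interpreter to materialise frame
        # objects that a JIT would otherwise want to optimise away
        return sys._getframe(1).f_locals

    def f():
        secret = 42
        return caller_locals()

    print(f())  # {'secret': 42}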
GraalPython is on its way to solving this, by co-JIT-compiling both Python and the code of the native extensions simultaneously. However Python is a large language and ecosystem, so it'll take a while for the implementation to mature.
A lot of people have responded "yes", but no one's taught you how yet, so here's some example code that you can paste into your browser's Devtools or Node's REPL:
    class Dog {
      speak() {
        console.log("bark!");
      }
    }

    class Cat {
      speak() {
        console.log("meow!");
      }
    }

    animal = new Dog();
    animal.speak();                               // bark!
    Object.setPrototypeOf(animal, Cat.prototype); // swap the prototype at runtime
    animal.speak();                               // meow!
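For comparison, the Python spelling of the same trick, which works in CPython today:

    class Dog:
        def speak(self):
            print("bark!")

    class Cat:
        def speak(self):
            print("meow!")

    animal = Dog()
    animal.speak()          # bark!
    animal.__class__ = Cat  # reassign the instance's type at runtime
    animal.speak()          # meow!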
It's noteworthy to the extent that the CPython development team continues to refuse to implement a JIT runtime, to the detriment of the Python community. PyPy is not the default Python interpreter, and most Python libraries don't target compatibility with it.
How does the JIT compilation help in terms of CPU bound work? Is node somehow able to automatically parallelize this code? I know cpython is limited to a single core unless you specifically use multiprocessing. Or is this related to something else?
Overhead. Each operation translates to Python bytecode, and the Python interpreter performs a full pass through its core dispatch loop for each bytecode instruction.
The humble

    a + b

is

    LOAD_FAST a
    LOAD_FAST b
    BINARY_ADD
each of which gets painstakingly executed by the corresponding completely static handler, which yields something along the lines of:
fetch the bytecode
jump to the handler
access the function locals
push the value for `a` (which TBF is just an offset into an array) onto the stack
increment the bytecode index
fetch the bytecode
jump to the handler
access the function locals
push the value for `b` onto the stack
increment the bytecode index
fetch the bytecode
jump to the handler
pop both values off the stack
dereference the type of `a`
look for the pointer to the add method
check if it's set
call it with `a` and `b`
which performs various runtime typechecks (e.g. are both parameters objects and integers) and does the actual addition
push the result back onto the stack
increment the bytecode index
Assuming a hot loop, a JIT might literally just emit an assembly-level
    add r10, r11
or whatever register it allocated to those locals.
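You can see the bytecode for yourself with the standard-library dis module:

    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # on CPython 3.9 this prints something like:
    #   0 LOAD_FAST     0 (a)
    #   2 LOAD_FAST     1 (b)
    #   4 BINARY_ADD
    #   6 RETURN_VALUE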
Another component is that these are likely comparing apples and pears: CPython uses infinite-precision integer arithmetic, and because it has no JIT it has no way to even remotely optimise any of that away. Infinite-precision arithmetic is pretty expensive, as it requires lots of overflow checking.
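A quick demonstration of that behavior, in plain Python:

    a = 2 ** 62
    b = a + a              # 2**63: already past the signed 64-bit range
    print(b)               # 9223372036854775808, still exact
    print(b.bit_length())  # 64; the int silently grew, so every single
                           # addition must check for this possibility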
Python 3 uses interpreted bytecode; it's faster than running an interpreter over an AST but much slower than raw machine code. For most things people use Python for this is fine, especially since there are many extensions written in C for CPU-intensive tasks.
Crossing the interpreter-native-code domain is expensive, primarily because of poor cache usage.
You want either fully JIT-compiled code or, conversely, as much of the interpreter's functionality and libraries as possible written in native code, with the API styled so that loops don't cross the boundary.
JITs work based on assumptions that types/values will stay constant. So, here it probably assumes that it will always be working with integers. So it will be much more efficient in cases like this as it is pretty much pure, simple computation. At worst there could be one deoptimisation when it moves from 32-bit to 64-bit integers, if v8 uses 32-bit integers first.
So it can emit extremely efficient instructions based on this assumption, while CPython struggles along with infinite precision numbers.
Function calls will have a much lower overhead also since it will just be a single `call` instruction.
Agreed. The speed difference between Node and Python is well known; v8 is one of the fastest things around. However, it's more than just the JIT: there are business reasons why Node is so fast. The amount of resources Google has thrown at developing v8 means that pretty much nothing will surpass it in speed any time soon.
The fact this is on someone's blog and posted to the front page means that a lot of people didn't know this. Well, guess what: for those of you who don't know... here's another fun fact: C++ is about 10x faster than Node, which makes it about 200x faster than Python.
I'm not sure this is entirely noteworthy unless you somehow think CPython has a JIT. Would be much more interesting to compare to pypy.