Indeed. Furthermore, this basically just benchmarks function-call overhead by using the worst possible implementation of fib().
Function calls are a well-known weak point of CPython, even among all its other performance weak points.
It's hard to express how utterly uninteresting and useless TFA is, and if its author is surprised by the result… really the only component this tells us about is the author.
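For context, TFA's benchmark is presumably something like this naive doubly recursive fib (a sketch, not the exact script); fib(35) makes roughly 30 million calls, so nearly all of the measured time is call overhead:

    import time

    def fib(n):
        # the worst possible implementation: exponential recursion
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    start = time.time()
    fib(35)
    print((time.time() - start) * 1000, "ms")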
> Would be much more interesting to compare to pypy.
I'm not sure it is more interesting at all, let alone much more, but here are the results on my (obviously much slower than TFA's) machine:
    > python3.9 --version
    Python 3.9.1
    > python3.9 fib.py
    8555.904865264893 ms
    > pypy37 --version
    Python 3.7.9 (7e6e2bb30ac5fbdbd443619cae28c51d5c162a02, Jan 15 2021, 06:03:20)
    [PyPy 7.3.3-beta0 with GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
    > pypy37 fib.py
    715.0719165802002 ms
    > node --version
    v14.15.4
    > node fib.js
    247.19056797027588 ms
(can I note that the 12-decimal precision of the script's output is hilarious? Because clearly, when you're benching fib(35), you need that femtosecond-scale precision)
It does seem a bit harsh, but I would put it like this: If you need to care about performance, then this is an extremely basic difference between the two language implementations that should not surprise you. If you don’t yet, then this is the very beginning of your education in how to care about performance—welcome to the next level!
Oh, I think it is quite a reasonable comment. The work being done in the benchmark isn't interesting, and the benchmark itself is short.
If you want to get ultimate performance from Python then write a C function...
If you want to inform me about runtime performance then show me how the language runtimes are spending cycles. If you wish to convince me about a language being great then tell me about the engineering effort to create and then run something in production.
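A minimal sketch of the "write a C function" route, via the standard-library ctypes module (the file and symbol names here are made up for illustration):

    import ctypes

    # hypothetical shared library built from a trivial C file, e.g.
    #   long fib(long n) { long a = 0, b = 1;
    #                      while (n-- > 0) { long t = a + b; a = b; b = t; }
    #                      return a; }
    #   cc -O2 -shared -fPIC fib.c -o libfib.so
    lib = ctypes.CDLL("./libfib.so")
    lib.fib.argtypes = [ctypes.c_long]
    lib.fib.restype = ctypes.c_long
    print(lib.fib(35))  # 9227465, computed at native speed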
It's a two-sentence post, the latter of which seems to be trying to dispel the idea that it's about what you're implying it is. Maybe it's uninteresting to experts in the area who consider this common knowledge, but that doesn't make it a failure of the author to do something worthwhile.
These details aren't important. It's like benchmarking C++ against Node and then complaining about implementation details. Node should be, and definitively is, faster than Python in practically every benchmark.
I still prefer Python over Node, but I can't deny the reality.
Doesn't really matter; I ran it on 3.9 and it's still slow. Between the kind of language Python is and the optimisations CPython allows itself, there is simply no way it could be competitive.
Numba doesn't really qualify as a JIT if what it's actually doing is compiling the function, caching the result to disk, and then reading it back on future runs for faster execution... that's just a compiler.
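For reference, the behavior being described is Numba's cache flag (njit and cache=True are real Numba API; the function is just an example):

    from numba import njit

    @njit(cache=True)  # compiled to machine code on first call, then the
    def fib(n):        # result is cached to disk and reloaded on later runs
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    fib(35)  # first run compiles; subsequent runs skip compilation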
> An interpreter with a JIT is obviously faster than one without.
And why doesn't python's default runtime environment obviously come with a JIT then? I think they should absolutely go for it, given the huge user base of python.
Reasons like "but named functions can dynamically change" are not applicable, since JS has those same properties and can do it. They could start with optimizing the case where you call the same function over and over in a for loop.
An alternative python interpreter is also not the solution, normally what you have is the main standard python interpreter, and that is the one that should be fast, period.
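To make the "named functions can dynamically change" point concrete, here's a sketch of why a Python JIT would need guards on call sites, just as JS engines do:

    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def hot_loop():
        for _ in range(1000):
            fib(20)  # a JIT would want to specialise/inline this call site...

    # ...but the global name can be rebound at any moment, so compiled
    # code needs a cheap guard that deoptimises when this happens:
    fib = lambda n: 0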
> And why doesn't python's default runtime environment come with JIT? I think they should absolutely go for it, ensure the default python you get when you run python has a JIT, given the huge user base of python.
1. because CPython aims to be relatively simple and straightforward by choice
2. because the "huge user base" comes in large parts from the deep and extensive C API, which is absolute hell on a JIT
3. because most of the userbase would not give a shit anyway; it has not exactly migrated en masse to pypy. Much of the userbase sees and uses Python as a glue language: an interface to optimised C routines that isn't a pain in the ass to develop in.
> because CPython aims to be relatively simple and straightforward by choice
Sacrificing performance for core interpreter developer convenience may have been the right choice when Python was getting started; it's no longer the right choice today. Today it's short-sighted.
> because the "huge user base" comes in large parts from the deep and extensive C API, which is absolute hell on a JIT
We can have both a JIT and a "deep and extensive" (or more importantly, stable) native API, as demonstrated by node.
> because most of the userbase would not give a shit anyway
Actually, a lot of the userbase doesn't use libraries with native extensions, is painfully aware of Python's performance issues, and is intensely interested in addressing them.
> We can have both a JIT and a "deep and extensive" (or more importantly, stable) native API, as demonstrated by node.
Yes, but not that native API. Designing a native API that doesn't create huge problems later is difficult. The JVM, .NET and V8 guys managed it (mostly) but the scripting languages generally didn't. Their API is just literally the entire internals of the interpreter.
Figuring out how to JIT code in the presence of native extensions that expect the implementation to work in exactly the same way it always worked is a research problem. The only people who have got close to solving it are the GraalVM guys. They do it by virtualising the interpreter API and also JIT-compiling the C code! They run LLVM bitcode on the same engine that runs the scripting engine.
> 1. because CPython aims to be relatively simple and straightforward by choice
It's a painful choice. Having to use numpy for everything a loop could normally do but is too slow for, or being discouraged from writing functions because function calls are so slow, makes an otherwise elegant language less so.
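A sketch of that trade-off, using a trivial sum of squares:

    import numpy as np

    # the "elegant" version: a plain loop, millions of bytecode dispatches
    total = 0
    for v in range(1_000_000):
        total += v * v

    # the fast version: push the whole loop into numpy's C routines
    arr = np.arange(1_000_000)
    total = int((arr * arr).sum())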
Armin Ronacher, the author of Flask, actually has a good talk about this. The gist of it is that the way Python's internals leak into the language makes it very difficult to build a performant JIT that wouldn't break a large amount of userspace code.
Python lets you do _far_ more shenanigans than JavaScript does, and a lot of large libraries depend on some of that behavior. Breaking it would probably cause a new 2 -> 3 situation.
> a lot of large libraries depend on some of that behavior.
Armin makes great points about path dependence of API design and how the CPython API leaks into the Python language spec. But the features being discussed are actually obscure (example: slots) or intended for debugging (example: frame introspection), and most libraries don't have a good reason to use them. We're stuck in a loop: people talk about how Python is special and can't use a JIT because its internals are not JIT-friendly, so we don't have a JIT, so implementers continue to make choices that are not JIT-friendly - not because they want to, but because they have no guidance.
The JIT doesn't have to be amazing on day 1. What it does have to do is show a commitment and a path to performant code, and illuminate situations where optimizations turn off. There's nothing fundamental in Python's design that prevents a JIT from working; a small number of rarely used dynamic features (that most people don't know about and don't know that they can negatively affect performance) should not be used to hold up interpreter design.
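For the record, here is what the two cited features look like; both are real CPython APIs:

    import sys

    class Point:
        __slots__ = ("x", "y")  # fixed attribute layout, no per-instance dict

    def caller_locals():
        # frame introspection: reach up the stack and read the caller's
        # locals; this forces the interpreter to materialise frame
        # objects that a JIT would otherwise want to optimise away
        return sys._getframe(1).f_locals

    def f():
        secret = 42
        return caller_locals()

    print(f())  # {'secret': 42}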
GraalPython is on its way to solving this, by co-JIT-compiling both Python and the code of the native extensions simultaneously. However Python is a large language and ecosystem, so it'll take a while for the implementation to mature.
A lot of people have responded "yes", but no one's taught you how yet, so here's some example code that you can paste into your browser's Devtools or Node's REPL:
    class Dog {
      speak() {
        console.log("bark!");
      }
    }

    class Cat {
      speak() {
        console.log("meow!");
      }
    }

    animal = new Dog();
    animal.speak();                               // bark!
    Object.setPrototypeOf(animal, Cat.prototype); // swap the prototype at runtime
    animal.speak();                               // meow!
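For comparison, the Python spelling of the same trick, which works in CPython today:

    class Dog:
        def speak(self):
            print("bark!")

    class Cat:
        def speak(self):
            print("meow!")

    animal = Dog()
    animal.speak()          # bark!
    animal.__class__ = Cat  # reassign the instance's type at runtime
    animal.speak()          # meow!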
It's noteworthy to the extent that the CPython development team continues to refuse to implement a JIT runtime, to the detriment of the Python community. PyPy is not the default Python interpreter, and most Python libraries don't target compatibility with it.
How does the JIT compilation help in terms of CPU bound work? Is node somehow able to automatically parallelize this code? I know cpython is limited to a single core unless you specifically use multiprocessing. Or is this related to something else?
Overhead. Each operation translates to Python bytecode, and the Python interpreter performs a full pass through its core dispatch loop for each bytecode instruction.
The humble

    a + b

is

    LOAD_FAST a
    LOAD_FAST b
    BINARY_ADD
each of which gets painstakingly executed by the corresponding completely static handler, which yields something along the lines of:
fetch the bytecode
jump to the handler
access the function locals
push the value for `a` (which TBF is just an offset into an array) onto the stack
increment the bytecode index
fetch the bytecode
jump to the handler
access the function locals
push the value for `b` onto the stack
increment the bytecode index
fetch the bytecode
jump to the handler
pop both values off the stack
dereference the type of `a`
look for the pointer to the add method
check if it's set
call it with `a` and `b`
which performs various runtime typechecks (e.g. are both parameters objects and integers) and does the actual addition
push the result back onto the stack
increment the bytecode index
Assuming a hot loop, a JIT might literally just emit an assembly-level
    add r10, r11
or whatever register it allocated to those locals.
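You can see the bytecode for yourself with the standard-library dis module:

    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # on CPython 3.9 this prints something like:
    #   0 LOAD_FAST     0 (a)
    #   2 LOAD_FAST     1 (b)
    #   4 BINARY_ADD
    #   6 RETURN_VALUE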
Another component is that these are likely comparing apples and pears: CPython uses infinite-precision integer arithmetic, and because it has no JIT it has no way to even remotely optimise any of that away. Infinite-precision arithmetic is pretty expensive, as it requires lots of overflow checking.
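A quick demonstration of that behavior, in plain Python:

    a = 2 ** 62
    b = a + a              # 2**63: already past the signed 64-bit range
    print(b)               # 9223372036854775808, still exact
    print(b.bit_length())  # 64; the int silently grew, so every single
                           # addition must check for this possibility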
Python 3 uses interpreted bytecode; it's faster than running an interpreter over an AST but much slower than raw machine code. For most things people use Python for this is fine, especially since there are many extensions written in C for CPU-intensive tasks.
Crossing the interpreter-native-code domain is expensive, primarily because of poor cache usage.
You want either fully JIT-compiled code or, conversely, as much of the interpreter's functionality and libraries as possible written in native code, with the API styled so that loops don't cross the boundary.
JITs work based on assumptions that types/values will stay constant. So, here it probably assumes that it will always be working with integers. So it will be much more efficient in cases like this as it is pretty much pure, simple computation. At worst there could be one deoptimisation when it moves from 32-bit to 64-bit integers, if v8 uses 32-bit integers first.
So it can emit extremely efficient instructions based on this assumption, while CPython struggles along with infinite precision numbers.
Function calls will have a much lower overhead also since it will just be a single `call` instruction.
Agreed. The speed difference between Node and Python is well known; v8 is one of the fastest things around. However, it's more than just the JIT: there are business reasons why Node is so fast. The amount of resources Google has thrown at developing v8 means that pretty much nothing will surpass it in speed any time soon.
The fact this is on someone's blog and posted to the front page means that a lot of people didn't know this. Well, guess what: for those of you who don't know... here's another fun fact: C++ is about 10x faster than Node, which makes it about 200x faster than Python.
I'm not sure this is entirely noteworthy unless you somehow think CPython has a JIT. Would be much more interesting to compare to pypy.