
>... is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things.

This is very true, especially when actual work is involved.

Remember, the kernel uses the exact same mechanism to have a process wait on a synchronous read/write as it does for a process issuing epoll_wait. Furthermore, isolating tasks into their own processes (or, sigh, threads) allows the kernel scheduler to make much better decisions, such as scheduling fairness and QoS to keep the system responsive under load surges.
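
To make that concrete, here's a minimal sketch of the "just start more workers" model (toy code, with a made-up port and worker count, not from any article): every worker blocks in accept()/recv(), the kernel decides who runs next, and there's no user-space event loop anywhere.

    import os
    import socket

    HOST, PORT, WORKERS = "127.0.0.1", 8080, 4   # assumed toy values

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((HOST, PORT))
    sock.listen(128)

    for _ in range(WORKERS):
        if os.fork() == 0:                  # child: a plain synchronous worker
            while True:
                conn, _ = sock.accept()     # blocks; the kernel wakes one worker
                conn.recv(1024)             # blocks in the same wait mechanism epoll uses
                conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\n")
                conn.close()

    os.wait()                               # parent just waits on its children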

Now, async might be more efficient if you serve extreme numbers of concurrent requests from a single thread and your request processing is so simple that the scheduling cost becomes a significant portion of the processing time.

... but if your request processing happens in Python, that's not the case. Your own scheduler implementation (your event loop) will likely also end up eating some resources (remember, you're not bypassing anything, just duplicating functionality), and is very unlikely to be as smart or as fair as that of the kernel. It's probably also entirely unable to do parallel processing.

And this is all before we get into the details of how you easily end up fighting against the scheduler...



Yeah except nodejs will beat flask in this same exact benchmark. Explain that.


CPython doesn't have a JIT, while node.js does. If you want to compare apples to apples, try looking at Flask running on PyPy.


Edit: after reading the article, I guess it's safe to say that everything below is false :)

---

I'd guess the c++ event loop is more important than the jit?

Maybe a better comparison is quart (with eg uvicorn)

https://pgjones.gitlab.io/quart/

https://www.uvicorn.org/

Or Sanic / uvloop?

https://sanicframework.org/

https://github.com/MagicStack/uvloop


Plain sanic runs much faster than the uvicorn-ASGI-sanic stack used in the benchmark, and the ASGI API in the middle is probably degrading other async frameworks' performance too. But then this benchmark also has other major issues, like using HTTP/1.0 without keep-alive in its Nginx proxy_pass config (keep-alive again has a huge effect on performance, and would be enabled on real performance-critical servers). https://sanic.readthedocs.io/en/latest/sanic/nginx.html
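
For reference, the fix is roughly this (the directive names are real nginx directives; the upstream name, port and keepalive pool size here are just placeholders): nginx speaks HTTP/1.0 to upstreams by default, so you have to opt in to 1.1 and clear the Connection header before upstream connections get reused.

    upstream app_server {
        server 127.0.0.1:8000;
        keepalive 64;                        # pool of idle connections to the upstream
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app_server;
            proxy_http_version 1.1;          # default is 1.0, which forbids keep-alive
            proxy_set_header Connection "";  # don't forward "Connection: close"
        }
    }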


Interesting, thank you. I wasn't aware nginx was so conservative by default.

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#pr...


You're not completely off. There might be issues with async/await overhead that would be solved by a JIT, but also, if you're using asyncio, the first _sensible_ choice to make would be to swap out the default event loop for one explicitly designed to be performant, such as uvloop's, because asyncio.SelectorEventLoop is designed to be straightforward, not fast.
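
For what it's worth, the swap is nearly a one-liner. A minimal sketch, assuming uvloop is installed (uvloop.install() is its documented entry point):

    import asyncio
    import uvloop

    uvloop.install()             # swap the default asyncio policy for uvloop's libuv-based loop

    async def main():
        await asyncio.sleep(0)   # any asyncio code below now runs on the uvloop event loop

    asyncio.run(main())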

There's also the major issue of backpressure handling, but that's a whole other story, and not unique to Python.

My major issue with the post I replied to is that there are a bunch of confounding issues that make the comparison given meaningless.


The database is the bottleneck. JIT or even C++ shouldn't even be a factor here. Something is wrong with the Python implementation of async/await.


If I/O-bound tasks are the problem, that would tend to indicate an issue with the I/O event loop, not with Python and its async/await implementation. If the default asyncio.SelectorEventLoop is too slow for you, you can subclass asyncio.AbstractEventLoop and implement your own, such as building one on top of libuv. And somebody's already done that: https://github.com/MagicStack/uvloop

Moreover, even if there's _still_ a discrepancy, unless you're profiling things, the discussion is moot. This isn't to say that there aren't problems (there almost certainly are), but that you should get as close as possible to an apples-to-apples comparison first.


When I talk about async/await, I'm talking about everything that encompasses supporting that syntax. This includes the I/O event loop.

So really we're in agreement. You're talking about reimplementing python specific things to make it more performant, and that is exactly another way of saying that the problem is python specific.


No, we're not in agreement. You're confounding a bunch of independent things, and that is what I object to.

It's neither fair nor correct to mush together CPython's async/await implementation with the implementation of asyncio.SelectorEventLoop. They are two different things and entirely independent of one another.

Moreover, it's neither fair nor correct to compare asyncio.SelectorEventLoop with the event loop of node.js, because the former is written in pure Python (with performance only tangentially in mind) whereas the latter is written in C (libuv). That's why I pointed you to uvloop, which is an implementation of asyncio.AbstractEventLoop built on top of libuv. If you want to even start with a comparison, you need to eliminate that confounding variable.

Finally, the implementation matters. node.js uses a JIT, while CPython does not, giving them _much_ different performance characteristics. If you want to eliminate that confounding variable, you need to use a Python implementation with a JIT, such as PyPy.

Do those two things, and then you'll be able to do a fair comparison between Python and node.js.


Except the problem here is that those tests were bottlenecked by IO. Whether you're testing C++, PyPy, libuv, or whatever, it doesn't matter.

All that matters is the concurrency model, because the application he's running is barely doing anything except IO; anything outside of IO becomes negligible, since after enough requests those sync worker processes will all be spending the majority of their time blocked on an IO request.

The basic essence of the original claim is that async is not necessarily better than sync for all cases of high-IO tasks. I bring up node as a counterexample because that async model IS faster for THIS same case. And bringing up node is 100% relevant because IO is the bottleneck, so it doesn't really matter how much faster node is executing, as IO should be taking most of the time.

Clearly and logically, the async concurrency model is better for these types of tasks, so IF tests indicate otherwise for PYTHON, then there's something up with Python specifically.

You're right, we are in disagreement. I didn't realize you completely failed to understand what's going on and felt the need to do an apples-to-apples comparison when such a comparison is not needed at all.


No, I understand. I just think that your comparison with _node.js_ when there are a bunch of confounding variables is nonsense. Get rid of those and then we can look at why "nodejs will beat flask in this same exact benchmark".


> I just think that your comparison with _node.js_ when there are a bunch of confounding variables is nonsense

And I'm saying all those confounding variables you're talking about are negligible and irrelevant.

Why? Because the benchmark test in the article is a test where every single task is 99% bound by IO.

What each task does is make a database call AND NOTHING ELSE. Therefore you can safely say that for either a Python or a Node request, less than 1% of a single task will be spent on processing while 99% of the task is spent on IO.

You're talking about scales on the order of 0.01% vs. 0.0001%. Sure maybe node is 100x faster, but it's STILL NEGLIGIBLE compared to IO.
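
To put rough numbers on that (made-up, purely illustrative figures, not measurements from the article):

    # Hypothetical back-of-envelope numbers:
    db_io_ms    = 5.0      # assumed database round-trip per request
    cpu_py_ms   = 0.05     # assumed per-request CPU work in Python
    cpu_node_ms = 0.0005   # assume node does the same work 100x faster

    print(db_io_ms + cpu_py_ms)    # 5.05 ms per request
    print(db_io_ms + cpu_node_ms)  # 5.0005 ms per request -- about 1% apart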

It is _NOT_ nonsense.

You do not need an apples-to-apples comparison to come to the conclusion that the problem is specific to the Python implementation. There ARE NO confounding variables.


> And I'm saying all those confounding variables you're talking about are negligible and irrelevant.

No, you're asserting something without actual evidence, and the article itself doesn't actually state that either: it contains no breakdown of where the time is spent. You're assuming the issue lies in one place (Python's async/await implementation) when there are a bunch of possible contributing factors _which have not been ruled out_.

Unless you've actually profiled the thing and shown where the time is used, all your assertions are nonsense.

Show me actual numbers. Prove there are no confounding variables. You made an assertion that demands evidence and provided none.


>Unless you've actually profiled the thing and shown where the time is used, all your assertions are nonsense.

It's data science that is causing this data-driven attitude to invade people's minds. Do you not realize that logic and assumptions take a big role in drawing conclusions WITHOUT data? In fact, if you're a developer, you know about a way to DERIVE performance WITHOUT a single data point or benchmark or profile. You know about this method, you just haven't been able to see the connections, and your model of how this world works (data-driven conclusions only) is highly flawed.

I can look at two algorithms and I can derive with logic alone which one is O(N) and which one is O(N^2). There is ZERO need to run a benchmark. The entire theory of complexity is a mathematical theory used to assist us in arriving at PERFORMANCE conclusions WITHOUT EVIDENCE/BENCHMARKS.
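
For instance, take a toy pair of functions (hypothetical, just to illustrate the point): the quadratic behaviour is visible from the nested loop alone, no benchmark required.

    def has_duplicate_quadratic(xs):      # O(N^2): nested loops over xs
        return any(x == y
                   for i, x in enumerate(xs)
                   for y in xs[i + 1:])

    def has_duplicate_linear(xs):         # O(N): one pass with a set
        seen = set()
        for x in xs:
            if x in seen:
                return True
            seen.add(x)
        return False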

Another thing you have to realize is the importance of assumptions. Things like 1 + 1 = 2 will always remain true, and a profile or benchmark run on a specific task is an accurate observation of THAT task. These are both reasonable assumptions to make about the universe. They are also the same assumptions YOU are making every time you ask for EVIDENCE and benchmarks.

What you aren't seeing is this: the assumptions I AM making are EXACTLY THE SAME kind: reasonable.

>you're asserting something without actual evidence, and the article itself doesn't actually state that either: it contains no breakdown of where the time is spent

Let's take it from the top shall we.

I am making the assumption that tasks done in parallel ARE faster than tasks done sequentially.

The author specifically stated that he made a server where each request fetches a row from the database. And he is saying that his benchmark consisted of thousands of concurrent requests.

I am also making the assumption that for thousands of requests and thousands of database calls, MOST of the time is spent on IO. It's similar to deriving O(N) from a for loop: I observe the type of test the author is running and I make a logical conclusion about WHAT SHOULD be happening. Now you may ask: why is it a reasonable assumption that IO takes up most of the time of a single request? Because all of web development is predicated on this assumption. It's the entire reason why we use inefficient languages like Python, Node or Java to run our web apps instead of C++: because the database is the bottleneck. It doesn't matter if you use Python or Ruby or C++, the server will always be waiting on the db. It's also a reasonable assumption given my experience working with Python and Node and databases. Databases are the bottleneck.

Given this highly reasonable assumption, and in the same vein as using complexity theory to derive performance speed, it is highly reasonable for me to say that the problem IS PYTHON SPECIFIC. No evidence NEEDED. 1 + 1 = 2. I don't need to put that into my calculator 100 times to get 100 data points for some type of data driven conclusion. It's assumed and it's a highly reasonable assumption. So reasonable that only an idiot would try to verify 1 + 1 = 2 using statistics and experiments.

Look, you want data and no assumptions? First you need to get rid of the assumption that a profiler and a benchmark are accurate and truthful. Profile the profiler itself. But then you're making another assumption: the profiler that profiled the profiler is accurate. So you need to get me data on that as well. You see where this is going?

There is ZERO way to make any conclusion about anything without making an assumption. And even with an assumption, the scientific method HAS NO way of proving anything to be true. Science functions on the assumption that probability theory is an accurate description of events that happen in the real world, AND even under this assumption there is no way to sample all possible EVENTS for a given experiment, so we can only verify causality and correlations to a certain degree.

The truth is blurry, and humans navigate the world using assumptions, logic and data. To intelligently navigate the world you need to know when to make assumptions, when to use logic, and when data-driven tests are most appropriate. Don't be an idiot and think that everything on the face of the earth needs to be verified with statistics, data and A/B tests. That type of thinking is pure garbage, and it is the same misguided logic that is driving your argument with me.


Buddy, you can make all the "logical arguments" you want, but if you can't back them up with evidence, you're just making guesses.


Node.js is faster than Python as a general rule, anyway. As I understand it, Node.js compiles JavaScript, while Python interprets Python code.

I do a lot of Django and Node.js, and Django is great for sketching an app out, but I've noticed that rewriting endpoints in Node.js, directly accessing Postgres, gets much better performance.

Just my 2c


CPython, the reference implementation, interprets Python. PyPy interprets and JIT-compiles Python, and more exotic things like Cython and Grumpy statically compile Python (often through another, intermediate language like C or Go).

Node.js, using V8, interprets and JIT compiles JavaScript.

Although note that, while Node.js is fast relative to Python, it's still pretty slow. If you're writing web stuff, I'd recommend Go instead for good performance from casually written code.


The comparison of Django against no-ORM is a bit weird, given that rewriting your endpoint in Python without Django or an ORM would also have produced better results, I suppose.


Right, but this test focused on concurrent IO. The bottleneck is not the interpreter but the concurrency model. It doesn't matter if you coded it in C++; the JIT shouldn't even be a factor here, because the bottleneck is IO and therefore ONLY the concurrency model should be a factor. You should only see differences in speed based on which model is used. All else is negligible.

So you have two implementations of async that are both bottlenecked by IO. One is implemented in node. The other in python.

The node implementation behaves as expected in accordance with theory, meaning that for thousands of IO-bound tasks it performs faster than a fixed number of sync worker threads (say 5 threads).

This makes sense, right? Given thousands of IO-bound tasks, eventually all 5 threads must be doing IO and therefore blocked on every task, while the single-threaded async model is always context switching whenever it encounters an IO task, so it is never blocked and always doing something...
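
Here's a toy simulation of that reasoning (assumed numbers, with the database stubbed out by asyncio.sleep): one async thread overlaps all the waits, while 5 blocking workers can overlap at most 5 of them at a time.

    import asyncio, time

    N_REQUESTS, IO_MS, SYNC_WORKERS = 1000, 10, 5    # assumed toy numbers

    async def fake_db_call():
        await asyncio.sleep(IO_MS / 1000)            # stands in for the DB round-trip

    async def main():
        start = time.perf_counter()
        await asyncio.gather(*(fake_db_call() for _ in range(N_REQUESTS)))
        print(f"async, 1 thread: {time.perf_counter() - start:.2f}s")   # roughly 0.01-0.05s
        # 5 blocking workers can overlap at most 5 waits at a time:
        print(f"sync, {SYNC_WORKERS} workers, lower bound: "
              f"{N_REQUESTS * IO_MS / 1000 / SYNC_WORKERS:.2f}s")       # 2.00s

    asyncio.run(main())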

Meanwhile, the Python async implementation doesn't perform in accordance with theory. 5 async workers are slower than 5 sync workers on IO-bound tasks. 5 sync workers should eventually be entirely blocked by IO, and the 5 async workers should never be blocked, ever... Why is the Python implementation slower? The answer is obvious:

It's Python specific. It's Python that is the problem.


JIT compiler.


Bottleneck is IO. Concurrency model should be the limiting factor here.

NodeJS is faster than flask because of the concurrency model and NOT because of the JIT.

The python async implementation being slower than the python sync implementation means one thing: Something is up with python.

The poster implies that, given the concurrency model, the outcome of these tests is expected.

The reality is, these results are NOT expected. Something is going on specifically with the python implementation.


You mean express.js?


NodeJS primitives are enough to produce the same functionality as flask without the need for an extra framework.



