12 requests per second: A realistic look at Python web frameworks (suade.org)
503 points by gilad on Feb 19, 2021 | 237 comments



My experience doing perf optimizations in real world systems with many many people writing code to the same app is a lot of inefficiencies happen due to over fetching data, inefficiencies caused by naively using the ORM without understanding the underlying cost of the query, and lack of actual profiling to find where the actual bottlenecks are (usually people writing dumb code without realizing it's expensive).

Sure, the framework matters at very large scale and the benefits from optimizing the framework become large when you're doing millions of requests a second over many thousands of servers because it can help reduce baseline cost of running the service.

But I agree with the author's main point, which seems to be that framework performance is pretty meaningless when comparing frameworks if you're just starting on a new project. Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can worry about optimizing it then.


There is lots of truth to this. Some ORMs like Django perform joins in very unsuspecting ways.

A simple example is, say, foreign keys. Trying to access the foreign key of an object by doing `book.user.id` does an additional query for the user table to get the ID. It's less known that the id is immediately available by just doing `book.user_id` instead.
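
To make the difference concrete, here is a minimal sketch in plain Python with sqlite3. It is not Django's actual implementation; the `Book`, `User`, and `run_query` names are made up for illustration, and the lazy `user` property only imitates the behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
    INSERT INTO user VALUES (7, 'alice');
    INSERT INTO book VALUES (1, 'Dune', 7);
""")

query_log = []  # every SELECT issued through run_query ends up here


def run_query(sql, params=()):
    """Hypothetical helper: run one SELECT and remember that it happened."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchone()


class User:
    def __init__(self, id, name):
        self.id, self.name = id, name


class Book:
    def __init__(self, id, title, user_id):
        self.id, self.title = id, title
        self.user_id = user_id  # raw FK column, already loaded with the book row
        self._user = None

    @property
    def user(self):
        # Lazy load: the first access fetches the whole related row.
        if self._user is None:
            row = run_query("SELECT id, name FROM user WHERE id = ?", (self.user_id,))
            self._user = User(*row)
        return self._user


book = Book(*run_query("SELECT id, title, user_id FROM book WHERE id = ?", (1,)))
queries_after_load = len(query_log)

fk_value = book.user_id          # no extra query, the value came with the book
queries_after_fk = len(query_log)

related_id = book.user.id        # same value, but costs a second SELECT
queries_after_related = len(query_log)
```

Both expressions yield the same number; only the second one grows `query_log`.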

I've spent time optimising things like text searches down from 2000+ queries to about 4, and one of the more noticeable things to me isn't actually the number of joins, but rather the SELECTs that take place. Many of these ORMs do a SELECT * unless you explicitly tell them otherwise, and when dealing with large-ish datasets or models that have large text fields this translates into significant time spent serialising these attributes. So you can optimise the query and still have it take a long time until you realise that limiting the columns in the initial `SELECT` is probably more efficient than limiting the number of joins.
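
A rough way to see the effect, sketched with sqlite3 rather than an ORM. The `article` table, its 10,000-character bodies, and the `payload_size` helper are assumptions chosen only to make the gap visible; real numbers depend on your schema and driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO article (title, body) VALUES (?, ?)",
    [(f"title {i}", "x" * 10_000) for i in range(200)],
)


def payload_size(rows):
    """Rough count of characters the application has to materialise."""
    return sum(len(str(value)) for row in rows for value in row)


# What a default "give me the objects" call tends to turn into.
wide = conn.execute("SELECT * FROM article").fetchall()

# What a listing page that only shows titles actually needs.
narrow = conn.execute("SELECT id, title FROM article").fetchall()

wide_size = payload_size(wide)
narrow_size = payload_size(narrow)
```

Same number of rows and zero joins in both cases, yet the wide result carries the large text column for every row.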


The most insidious part about misusing ORMs is it's often not visible for a while. Modern DBMSs on modern hardware are crazy fast, so when you have only a few tens or hundreds of thousands of rows in your table, those inefficient and pointless ORM queries are just not noticeable because you still get sub-second response times. As your database grows, the site begins to get slower and slower, but it's hard to distinguish between the real problem and "I guess we're just handling more requests per second".

I personally love tools like Miniprofiler [1] for this (though maybe there's something better today, it's been a while since I've worked on that type of thing). It's a constant and accessible way to keep an eye on what goes into each request, and I've caught many of those bad queries before they were problems by using it (eg: "WTF, why did it take 9 queries and 250ms to grab what looks to be a single row from a single table?!").

[1] https://miniprofiler.com/


To be fair this is a problem inherent to databases in general. You can have hand written queries that perform badly due to structure or query frequency as well which are not apparent until the dataset grows. The ORM should make it easier to rectify such situations (eg drop in an eager loading directive) vs having to restructure hand-written routines for similar effects.


Indeed, even with a query analyzer you might see, say, table scans instead of index scans, just because the DB realizes that scanning the 100 rows you've got is faster than trying to use an index.

So without a large number of rows it can be hard to know what it will actually do.


We have everything hooked into lightstep. Makes it extremely easy to track down problematic operations.


I encountered this a few times and started adding tests that assert each handler only executes the expected number of queries (and no more). If the application code is modified such that this N+1 query pattern occurs the test will immediately fail and you go optimise the query, problem solved.

https://docs.djangoproject.com/en/dev/topics/testing/tools/#...
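
Outside Django the same guard can be approximated by hand. The following is a sketch using sqlite3's `set_trace_callback`; `assert_num_selects` and the two `list_books_*` functions are hypothetical names, and only SELECT statements are counted.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def assert_num_selects(conn, expected):
    """Fail if the wrapped block runs a different number of SELECTs."""
    seen = []

    def trace(sql):
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    conn.set_trace_callback(trace)
    try:
        yield seen
    finally:
        conn.set_trace_callback(None)
    assert len(seen) == expected, f"expected {expected} SELECTs, got {len(seen)}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'a'), (2, 'b');
    INSERT INTO book VALUES (1, 'x', 1), (2, 'y', 2), (3, 'z', 1);
""")


def list_books_joined(conn):
    # One query, regardless of how many books there are.
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


def list_books_n_plus_one(conn):
    # One query for the books, then one more per book for its author.
    out = []
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    for title, author_id in books:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        out.append((title, name))
    return out


with assert_num_selects(conn, 1):
    joined = list_books_joined(conn)

try:
    with assert_num_selects(conn, 1):
        list_books_n_plus_one(conn)
    regression_caught = False
except AssertionError:
    regression_caught = True
```

The second block trips the assertion because the loop issues one SELECT per book on top of the initial one.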


Or the person who changed the code then disables the test or sets N to 100,000,000 or something equally pleasurable to debug.


> A simple example is, say, foreign keys. Trying to access the foreign key of an object by doing `book.user.id` does an additional query for the user table to get the ID. It's less known that the id is immediately available by just doing `book.user_id` instead.

Hibernate (on Java) at least optimizes this specific use-case. At first, accessing a lazy-loaded property-object will give you a "proxy" and you can access the ID without incurring a database load (since it knows that anyway). And when doing a query, the object won't be joined when requesting book.user.id unless it needs to be (like you have some other WHERE clause that requires an actual join on that row).


> ...in very unsuspecting ways

> Trying to access the foreign key of an object by doing `book.user.id` does an additional query for the user table to get the ID. It's less known that the id is immediately available by just doing `book.user_id` instead.

But that's not really unsuspecting. `book.user` is asking for the user table, `book.user_id` is not. Those two things are not identical even though they return the same value.


IIRC spark can optimize it away, and in OO there shouldn't be user_id


> in OO there shouldn't be user_id

I don't think that's right. There is a user_id column in the book table, so why shouldn't there be book.user_id?


depends if you're doing active record (which I am not considering as typical ORM) or data mapper like hibernate where your entity is POJO with id and annotations but typically no foreign keys (just references to other model classes)

hibernate is then the "magic" environment where it just works

I realize OO is out of fashion now but it's still true and it still works and I've been in lots of projects where ORM was useful


> A simple example is, say, foreign keys. Trying to access the foreign key of an object by doing `book.user.id` does an additional query for the user table to get the ID. It's less known that the id is immediately available by just doing `book.user_id` instead.

Hmm.. Sounds like a bug. Why is this not the same value for a foreign key?


Well, it is the same value, but the ORM doesn't handle book.user.id any differently than it does book.user.name where it isn't the same value, and thus the only option is to fetch the second table. So it's not a bug, it's really just the ORM being consistent in how it handles queries, thus missing out on a possible optimization in this special case, where a simpler query could have given the same result.


No, that is clearly a bug. The ORM already has the value of book.id, that's how it knows how to fetch the right book. Performing extra queries is just poor implementation.


The programmer also already had the value in book.user_id but still chose to ask the ORM to fetch all of .user so they could get .id from there instead. They might afterward call .name on it as well, and there would be no further queries, because the ORM has already been asked to fetch all fields of .user, so fetching all of .user might in fact have been sensible. The query builder cannot know whether all of it will be needed, because Python is not a compiled language: when it executes the book.user.id lookup, there is no way to tell in advance that no further fields of .user will be needed. So it cannot conclude that it should skip what it has been asked to do (fetch the entire object) and instead return only .id, which is available in a different way. So yes, this is suboptimal usage, but only the programmer can know that, so it falls to them to optimize if they want to.


Perhaps I'm a bit odd, but when I'm going to lean on an ORM to do things I expect it to actually do them. I expect that foo.user_id does not exist, because that representation has been transformed into an object. foo.user.id should be the only viable reference to the id. foo.user.id should return the value it already knows; any other property access I would expect to do the equivalent of `select * from ...` if the object has not previously been populated.

Now perhaps some ORMs prefer to be thinner, to provide more footguns via a leaky abstraction that mixes implementation details with the object mapping. I don't think those are good implementations.


> Now perhaps some ORMs prefer to be thinner, to provide more footguns via a leaky abstraction that mixes implementation details with the object mapping. I don't think those are good implementations.

To me, it seems like the (hypothetical?) implementation you're talking about is much more leaky and footgun-y than the more straightforward ("thinner", in your words) version. In order for foo.user.id to not execute a new query, foo.user would have to return some sort of proxy object that only fetched the user row when you tried to access a field that hasn't been loaded. That's way more magic than the more obvious solution—which is to load the row when you access the related object—and could easily cause more problems than it solves in the long run when you need to debug very specific queries.

Furthermore, how is going out of your way to hide a field that exists in the database (user_id) not the definition of a leaky abstraction? What purpose does it serve to direct you through an unnecessary layer if all you need is the ID?


> I expect that foo.user_id does not exist, because that representation has been transformed into an object.

Or you can just consider user_id to be a reference pointer that is part of foo, while user.id is an attribute of user. Totally different things and I am glad that the distinction is there.


I’m sure there are valid engineering reasons to do it this way. One that comes to mind is the memory footprint of allocating the objects associated with foreign key references.


Wow! Who would have thought that using an ORM would create so much uncertainty in a codebase.


Fancier ORMs will return a proxy for foo.user that doesn’t issue a query until the user asks for a property on it. And it can return user.id without querying.
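
A toy version of that proxy idea, in plain Python. This is only a sketch of the technique, not how Hibernate or any particular ORM implements it; `LazyUserProxy` and `fake_loader` are invented names, and the loader returns canned data instead of hitting a database.

```python
class LazyUserProxy:
    """Stands in for a related row: knows the key, loads the rest on demand."""

    def __init__(self, user_id, loader):
        self.id = user_id        # already known from the foreign key column
        self._loader = loader    # hypothetical callable: id -> dict of columns
        self._row = None

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__, i.e. anything but id.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._row is None:
            self._row = self._loader(self.id)
        try:
            return self._row[name]
        except KeyError:
            raise AttributeError(name) from None


load_calls = []


def fake_loader(user_id):
    """Pretend database fetch that records each time it is called."""
    load_calls.append(user_id)
    return {"name": "alice", "email": "alice@example.com"}


user = LazyUserProxy(7, fake_loader)

id_without_load = user.id            # answered from the foreign key
calls_after_id = len(load_calls)

name = user.name                     # first real field access triggers the load
email = user.email                   # served from the row that is now cached
calls_after_fields = len(load_calls)
```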


But it doesn't have book.user itself. Unless you want Django to construct some empty proxy object representing book.user for this one particular optimization.

If it's a bug, sounds like a "wontfix" to me.


It's not a bug, it is the same value. Only instead of using `first_table.foreign_id` to fetch the entire record from `second_table` only to use `second_table.id`, if you only need the identifier itself you already have it in `first_table`.

A similar concept called covered queries exists whereby you index a table by foreign key, and a few additional columns that you do not expect to use in join conditions, but you do expect to frequently retrieve. Depending on your database, requests for only columns in the index (some being the join condition, some subsequent columns being in the set of popular additional columns) means faster access to those popular columns only. In the context of ORMs, you would need to do something to avoid a default behavior of "select *" in order to exploit this index.
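
As an illustration, SQLite reports in its query plan when an index covers a query. This sketch assumes a made-up `book` table and index name; other databases expose the same idea through their own EXPLAIN output, and the exact wording of the plan varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        title TEXT,
        body TEXT
    );
    -- Foreign key first, then a popular column we expect to read often.
    CREATE INDEX book_user_title ON book (user_id, title);
""")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)


# Only columns that live in the index: the table itself is never touched.
covered_plan = plan("SELECT user_id, title FROM book WHERE user_id = ?")

# SELECT * also needs body, which is not in the index.
star_plan = plan("SELECT * FROM book WHERE user_id = ?")
```

Printing the two plans should show a covering index for the first query only, which is why an ORM's default `select *` gives the benefit away.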


> It's not a bug, it is the same value.

Not necessarily, it can be overridden: "id" is only the default for models that haven't explicitly been given a field with the "primary_key" kwarg (common on legacy tables where the primary key column might for example be "user_id").

The alias guaranteed to be the same value is ".pk", and I'm not sure what django does if you try to create a column named "pk" that isn't the primary key.


If your tables don't use id, or are doing something strange, YOU wrote that code. You should know it's going to be a thing you need to deal with, because YOU did something unusual.


Note that I gave the example of "legacy" tables. On long-lasting codebases there's a high likelihood that "you" didn't write that code and aren't aware of everything Django's doing under the hood.


Because when you are accessing .user you are asking for all its properties, not just the id.

Django does provide relatively easy ways to get over the N+1 issue, though. If you do Book.objects.select_related('user'), only one query is made.


`QuerySet.select_related()` and `QuerySet.prefetch_related()` are the bread and butter of Django query optimisation. I think most of the time that I've noticed a performance issue in our code, it's been easily fixed with one of those.
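
For anyone who hasn't looked under the hood: `select_related()` follows the relation with a SQL join, while `prefetch_related()` runs a separate query per relation and does the joining in Python. A rough hand-written equivalent of the two strategies, using sqlite3 and made-up `user`/`book` tables (this is not the SQL Django generates verbatim):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO book VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
""")

# select_related style: one JOIN, each book row carries its user's columns.
joined = conn.execute(
    "SELECT book.title, user.name FROM book "
    "JOIN user ON user.id = book.user_id ORDER BY book.id"
).fetchall()

# prefetch_related style: one query per table, stitched together in Python.
users = conn.execute("SELECT id, name FROM user ORDER BY id").fetchall()
user_ids = [user_id for user_id, _ in users]
placeholders = ", ".join("?" for _ in user_ids)

books_by_user = defaultdict(list)
for title, user_id in conn.execute(
    f"SELECT title, user_id FROM book WHERE user_id IN ({placeholders}) ORDER BY id",
    user_ids,
):
    books_by_user[user_id].append(title)

prefetched = {name: books_by_user[user_id] for user_id, name in users}
```

Either way the number of queries stays fixed (one or two) instead of growing with the number of rows.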


Django's ORM gets a lot of flak, but I don't remember the last time I had complex queries that I could not do with it.

You still need to understand a minimum of SQL and databases, and usually those that complain about the ORM are the ones that expect it to be a "sufficiently advanced compiler", but it has matured so much that nowadays the developers consider it a *bug* every time the answer to "How do I do this query X?" involves something along the lines of "use .extra or raw SQL".


This is true, though to be fair to the critics, the syntax through which you express these complex queries is often clunky and unintuitive. For example, I need to re-read the documentation every time I use the annotation API because it's generally not obvious how to use it, and I've run into a few edge cases where you need extra code/syntax just to deal with its nuances and ambiguities.

Even though Django has come a long way, I greatly prefer ORMs like SQLAlchemy and Ecto that map more closely to the SQL query I'm trying to write.


It would return the same value, but the approach to obtain it would be different.

`user` would be a property defined as a User object on the Book model, so accessing `book.user` will cause the framework to fetch the entire user model (even if we then only fetch the id).

On the other hand, `book.user_id` is the auto-generated database column, generated to make the above property definition possible. But since this `user_id` is directly defined on the book object, there is no need to query the user table.


If you're selecting tons of data when you SELECT * you might also have a god object. I prefer to have my model be a bit more split up by use rather than being full of random stuff. E.g. a customer_address table rather than stuffing all that data into customer, even if they only have a single address.


I'm pretty sure Django has an active record ORM, which are generally a bit rubbish in terms of performance. A unit of work ORM such as SQLAlchemy seems to generate much better queries.


I increasingly lean towards plain SQL over ORMs. It requires greater familiarity with SQL but I prefer that over greater familiarity with ORM-specific syntax that doesn’t translate across frameworks or languages. In addition, you can prototype new queries and profile existing queries in the database and copy-paste directly into your code.


I favor a hybrid approach: use the Django ORM to define models, do migrations, auto generate the admin, etc. but don’t be shy about using the extension points (extra, raw, cursors) to put in an optimized query for a hotspot. You can get pretty far using the ORM but it’s really valuable to be able to be comfortable dropping down for things like reports or bulk processing.


I use the ORM for Django's auth system alone. I write all other REST API queries in plain SQL.


I agree with you. ORMs are in the end "converting" the complexity of SQL capabilities into objects. It is perfectly normal that in the best case they get at least as complex as SQL, with features hidden behind annotations, or else end up less efficient.

SQL is readable (at least far more than 20m of lasagna of boilerplate objects decorated with tons of annotations googled from the internet where no one really knows what they do) and you KNOW that if you have optimized the database structure (and filesystem, and network,... :D) and the SQL statement you will get peak performance, while with ORMs you are on a constant hunt for what else you can turn on, and they are far too huge to read their code.

And they are becoming quite absurd after they "mature" and begin adding corner cases that no one has thought about when they were a starting project.


In the java-world there are libraries like JDBI, which makes it possible to write interfaces with SQL-annotations and have serialization, connection setup etc. done for you:

    public interface UserDao {
        @SqlUpdate("CREATE TABLE user (id INTEGER PRIMARY KEY, name VARCHAR)")
        void createTable();

        @SqlUpdate("INSERT INTO user(id, name) VALUES (?, ?)")
        void insertPositional(int id, String name);

        @SqlQuery("SELECT * FROM user ORDER BY name")
        @RegisterBeanMapper(User.class)
        List<User> listUsers();
    }

This is great because it's explicit. No hidden queries.


Another great option in Java is jOOQ, which lets you write type-safe and potentially composable queries such as:

  context
    .update(User.USER)
    .set(User.USER.NAME, userName)
    .where(User.USER.ID.eq(userId))
    .execute()


jOOQ and its DSL are good; however, IMO raw SQL (via `context.fetchInto` and its variants) is more readable than the DSL when dealing with complex queries.


I’ve noticed this in other contexts too.

For small queries with straightforward joins, a query builder is nice and readable.

But for larger, more complex queries, I found putting the query into its own file was best for readability.


Why not just create views and query those with jOOQ, then?


Hi @lukaseder, thank you for creating jOOQ. I don't have much experience with views, and some so-called `Best Practice` guides forbid using them because they're hard to maintain. What do you think about that?


In Python this would look like sqlalchemy's query layer (which is great).


Yeah in my last job, I used my manager power to overrule on using an ORM. In every company I have ever been in, some major outage has occurred due to a seemingly innocuous change causing a DB operation to go from O(N) to O(N^2)+ and it's really hard to track down because things seem to work fine in a dev environment, but then can't scale.

And to me, SQL is the easiest language to learn and read of all, and while I understand doing basic marshalling of SQL records into objects is tedious, it's really not hard at all, and it just saves so so much heartache down the road.

The exception might be very basic CRUD apps that are meant to be used by third parties and want to support multiple backends like mysql/postgres/whatever. There might be other exceptions as well where you are just trying to prototype/find market fit, but for a typical project that you know is going to be used in a real way, the risks just don't outweigh the benefits IMHO.

Some of the team griped a bit, but I absolutely think it was the right move in hindsight. At least once a year I would hear about an outage due to an ORM gone wild, and these outages were usually prolonged by the fact that things function, and then there is finger pointing between the DB/DBAs and the app developers, etc.


for me the value of an ORM is not so much query synthesis as serialisation of object fields to db columns and vice versa


Same here.

I'd love to have a tool that just generates an object type for a given SQL query's result rows, and a function signature for its query parameters.


F# has this + static SQL query string checking at compile time using type providers


"just generates an object type for a given SQL query's result rows"

For a similar purpose in Python I generally use psycopg with namedtuple resultsets; the namedtuple 'records' do what I need for the returned data and are reasonably efficient.
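
Roughly what that looks like, sketched with the standard library's sqlite3 standing in for psycopg so it runs without a server. `namedtuple_factory` is a hand-rolled row factory written for this example, not a psycopg API.

```python
import sqlite3
from collections import namedtuple


def namedtuple_factory(cursor, row):
    """Build a namedtuple from the column names of the current query."""
    # Creating the type per row keeps the sketch short; cache it if speed matters.
    fields = [column[0] for column in cursor.description]
    return namedtuple("Row", fields)(*row)


conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'alice')")

# Field names follow the query, including aliases, with no model class needed.
record = conn.execute("SELECT id, name AS user_name FROM user").fetchone()
```

`record.id` and `record.user_name` then work as attributes, and the record still unpacks like a plain tuple.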


Just hacked that together for TypeScript over this weekend (: https://news.ycombinator.com/item?id=26211286


Wow, that was fast! I'm impressed.

If you want to keep going forward with it, I'd recommend working on making it clearer what the examples do, and looking at sqlc for Go for inspiration. Of the existing versions of this people replied with, that one had the clearest API, or at least clearest explanation on its landing page.


sqlc can do this: https://sqlc.dev/


This is the nicest one of these from the ones posted, I think it's the model to copy.


This!

We don't need ORM, we need OQM (object-query mapping)!


That’s a huge dependency to take on just for mapping and serializing, something that most languages provide in a standard library or base functionality of the language itself.


I like ORMs for doing the things for me like making objects and converting dates and other data to DB ready form and vice versa. I do not like them querying. The premise with ORMs is that you should be able to query pretty much anything easily. The reality is that almost all of the time you are doing something you don't have many distinct queries. You have a few reads and writes that you have to do. Things that are easily done in SQL with full expressiveness and efficiency.

So now my pattern is to use for example ActiveModel in Ruby for the models, but not ActiveRecord for the persistence part.


IMO both are required. I'm lucky in that I did a lot of plain SQL early on, and found ORMs later, but I think ORMs do cut out a lot of time for quick-and-dirty queries that end up not being the bottleneck. The problem arises once you find a bottleneck, you won't know how to optimize it if you haven't done a bit of SQL mucking about earlier.

Also, the big thing is you won't know how to translate to other ORMs if you don't know SQL. Eg once you know that you want an index, it's a matter of a web search to find out what the syntax is in your ORM. But if you just started with ORM, you might not realize that kind of thing is part of how it works.


Yeah, ORMs are specifically made for insert-heavy operations or very basic mapping of rows to objects. Analytical queries and the rest should be done with SQL.


I keep seeing some variation of: “ORMs are really intended for x” or “ORMs work best when y is true”. At some point we should take an honest look to determine if ORMs provide enough benefit to justify their existence. “I can talk to the database without knowing SQL” doesn’t cut it.


Yes they do? They are basically available in every language, in some ecosystems there are multiple alternatives as well and there would not be that big of a market unless it is useful.


if that's your bag then you can still totally do that with an ORM, hibernate for example lets you just write a whole query in raw SQL while still getting all the benefits of eliding a bunch of boilerplate field copying, having "active" objects with ORM-level update/transaction management, etc.

plus it means you don't have to write all the dumb "select * from books where bookName = :bookName" code that obviously can be handled trivially by the ORM. You just use SQL the places where it makes sense.

the ORM hate always strikes me as a little misplaced because of this - you can always write SQL where it's appropriate, in any decent ORM. And you can write bad queries in SQL too. Obviously very complex queries are maybe better reserved for raw SQL but it seems like a lot of the hate comes from maybe less experienced engineers getting in over their head with complex work items because ORMs "make it easy", and that is going to happen with raw SQL too if you throw those same engineers at those work items.

ORMs aren't inherently that heavyweight, I see people complaining here about SQLAlchemy and as a Java developer I don't have performance concerns about hibernate. That sounds to me like a Python problem and a "this specific ORM isn't performant" problem, not ORMs being bad as a whole. And if you really want a "just load the data for me and do nothing else that incurs a performance hit" approach then you can use stateless objects and it's just a wrapper around the DB to load and transform the data for you and/or do a raw, whole-object update back to the DB.


I have never seen anyone, even the ORM "experts" in my jobs, save time with an ORM compared to what a junior engineer could do with plain SQL.

The reason is that sure, for your first 10 basic select queries, the ORM saved you half an hour. Then you got to that complicated join and had to resort to looking up arcane syntaxes and prototyping attribute quirks for an hour, when the junior guy got the whole thing written in 15 minutes of trial/stackoverflow/error in a SQL prompt.

Then, even when the "expert" did get it working, guess who is going to be the one looking at it again when trying to figure out production support issues? The junior guy, who is now clueless and has to spend two hours to figure out what this crazy ORM mess does here. The better alternative was just to have the SQL there ready to go so it is well understood and can be run against the production database or a test database to reproduce the issue. No questions whatsoever.

I have seen this over and over again, more than a statistically relevant number of times.


> The reason is that sure, for your first 10 basic select queries, the ORM saved you half an hour

Maybe we work in very different fields, but "basic select queries" makes up 90% of what I need to fetch from the database.

If I'm working on a forum and I want to load user 123, with all their posts, all the awards each post has, and the count of friends the user has, with Eloquent (Laravel's ORM), I could do:

`User::with('posts.awards')->withCount('friends')->findOrFail(123);`

That would return me a User model, with a collection "posts" containing a list of Post models, each with a collection "awards" of Award models, and a field `friends_count` with the number of friends. It would run three queries: one to fetch the user, one to fetch the posts, and one to fetch the awards. Depending on how I have configured my models, I can have things like dates automatically hydrated to DateTime objects.

Compare that to plain SQL queries; I would have to fetch the users, including manually writing the subquery for the friend count. Once I had those users I would then have to fetch the posts, and then again for the awards. If I want them in a hierarchy like the ORM example gives me, I then need to loop through each set of records and manually stitch them together. Not difficult, but super tedious.
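
For a sense of what that hand-stitching involves, here is a sketch in Python with sqlite3 (not PHP, and not what Eloquent runs internally). The table layout, column names, and `load_user` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    CREATE TABLE awards (id INTEGER PRIMARY KEY, post_id INTEGER, label TEXT);
    INSERT INTO users VALUES (123, 'alice'), (124, 'bob'), (125, 'carol');
    INSERT INTO friends VALUES (123, 124), (123, 125);
    INSERT INTO posts VALUES (1, 123, 'first'), (2, 123, 'second');
    INSERT INTO awards VALUES (1, 1, 'gold'), (2, 1, 'silver');
""")


def load_user(conn, user_id):
    # Query 1: the user plus a hand-written subquery for the friend count.
    row = conn.execute(
        "SELECT id, name, "
        "(SELECT COUNT(*) FROM friends WHERE friends.user_id = users.id) "
        "FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"user {user_id} not found")
    user = {"id": row[0], "name": row[1], "friends_count": row[2], "posts": []}

    # Query 2: all posts of that user, indexed by id for the next step.
    posts = {}
    for post_id, title in conn.execute(
        "SELECT id, title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
    ):
        posts[post_id] = {"id": post_id, "title": title, "awards": []}
        user["posts"].append(posts[post_id])

    # Query 3: all awards for those posts, attached to their parent post.
    if posts:
        marks = ", ".join("?" for _ in posts)
        for post_id, label in conn.execute(
            f"SELECT post_id, label FROM awards WHERE post_id IN ({marks}) ORDER BY id",
            list(posts),
        ):
            posts[post_id]["awards"].append(label)
    return user


user = load_user(conn, 123)
```

Three queries, same as the ORM version, but the nesting has to be assembled by hand.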

Sure a complicated join is better done with as little magic as possible, but Eloquent exposes functions for adding subselects, joins, etc. in a way that just reads like SQL (and maps 1:1 underneath).


I do see the larger point though that ORM usage could still translate into problems elsewhere when you don't pull data you should, or you pull data you shouldn't, or pull data in a way you shouldn't.

But the thing is, a lot of things aren't really on the hot path and optimizing them isn't worth a ton of time. Like yeah so what if this web request that only gets used 2% of the time makes 5 extra database requests that it shouldn't, on a single data item. Not gonna tank the overall program.

I guess I'd accept that it's important to be aware of what you're pulling, regardless of whether that's automagically when a proxy object sees it needs to be lazy loaded, or explicitly in a query. Throwing junior developers on performance-critical paths is going to be a problem anywhere and on any DB access layer.


We're gradually removing our ORM usage to avoid active objects. They look too much like VSTs, but subtly make foreign network calls when you least expect it, which makes it very hard to write effective, isolated tests or replace the database layer with something else, like a service call or a cache lookup.


I’m sorry if it sounds harsh, but that is the mistake of the developers then. I think you would not allow anyone to write production code without knowing the language; it should be the same way with most libraries, especially ones having a big reach, like ORMs. Nonetheless, I have seen it countless times in the team I worked in, so it is unfortunately really frequent.


Plain SQL with a “hydration” system to objects is quite powerful. DataMapper, I believe is the term of art?


https://github.com/charettes/django-seal Can help with these kinds of issues though I haven’t yet used it myself (I should!)


SQLAlchemy expression language is cool stuff. not an ORM but also no string-templating ...


> (usually people writing dumb code without realizing it's expensive)

Some years ago, one morning I gave a co-worker a recommendation on how to improve a loop that was unnecessarily hitting database through the Django ORM. He committed the fix that afternoon. Barely an hour later I accidentally reintroduced the exact same slowdown in the exact same loop when adding a different piece of data to it.

Soooo yeah, ORMs can be so simplistic it's too easy to do by accident even if you know exactly what's going on under the hood.


That's a nice benefit of using async ORMs (not yet available in django), the db calls are explicit!


You can also commit to always use .values() in django. That makes it error out when you access non-prefetched entities


also, values is awesome for returning your models as dicts when that's all you need most of the time, avoiding all the overhead of objects


Something about select_related? Please do share.


It wasn't anything that fancy, it's just the solution was something you usually try not to do so it just wasn't coming to mind for him. The data being looped over came from solr, and some of the fields were primary keys used in lookup tables in the database, for getting translated text. Instead of doing the lookup inline, load the entire table into a python dict before the loop (<100 rows for each of these tables) and do the lookup from the dict in the loop.
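
In sketch form, with sqlite3 in place of the Django ORM and a plain list in place of the solr results. The `category_label` table and the `search_hits` data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_label (id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO category_label VALUES (1, 'Fiction'), (2, 'History');
""")

# Stand-in for the search results; category_id is a key into the lookup table.
search_hits = [
    {"title": "Dune", "category_id": 1},
    {"title": "SPQR", "category_id": 2},
    {"title": "Emma", "category_id": 1},
]

# One query before the loop instead of one query per hit.
labels = dict(conn.execute("SELECT id, label FROM category_label"))

# Inside the loop the lookup is a dict access, not a database round trip.
rendered = [(hit["title"], labels[hit["category_id"]]) for hit in search_hits]
```

This only makes sense because the lookup table is tiny; for a large table you would fetch just the keys you need.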

And like I said above, usually you don't just select out the entire contents of a table and handle it in the application, so I reflexively did the wrong thing as well, because of how easy it was to do with Django's ORM.


My guess is accessing a related field within a loop causing a database request per iteration, e.g.

``` [book.author.name for book in Book.objects.all()] ```


That’d be my guess: prefetch_related is great but you need to guard it with something like an assertNumQueries test to avoid accidental regressions.


Maybe I spent too much time with Django already, but if I see anyone doing anything but Book.objects.values_list('author__name', flat=True) for this type of expression, I would mark it as a 3x WTF? in the code review.


As written, it's obvious you should be doing something else like `values()` or `values_list()`. You're much more likely to fall victim to this anti-pattern if it's done within a standard for-loop that has a bunch of other stuff going on. I just wrote it as a list comprehension to avoid having to muck about with formatting on my phone.


I’d also allow prefetch_related if you’re using more than the most trivial data - no point in duplicating logic you have in your models if you have a method which generates something like a name, URL, etc. based on multiple fields.


Funnily enough, I recently optimized some code along these lines.

The way I sped it up was to call `.values()` on the query, which serializes the data into a dict and prevented me from accidentally making subsequent calls.

PS: Indent by 4 spaces for code formatting.


s/ident/indent/


Thanks :).


ORMs give you the affordance to write code faster. That may save time, but instead of saving time, you can also reinvest it in better quality and performance. That's up to you and your team.

For instance, Django has prefetch_related and select_related. At almost every Django conference, there's a talk on this topic because it's so important and very underused/overlooked. But these are provided methods of the ORM.

Aside from that, there are wonderful introspection tools such as django-debug-toolbar to view the raw SQL and its performance.

It can be argued that if a solution written in Django hasn't had its database performance introspected with for instance django-debug-toolbar, then the solution isn't done. This is a small step with big rewards.

This introspection can easily identify where raw SQL is useful. But apply it late in process: As a project matures, the costs of converting some queries into raw/hybrid SQL are lower, as the statement is less likely to change. But keep these SQL statements in the models and managers, don't let them spill into views, template tags etc.


I'm increasingly banging on the drum that web frameworks shouldn't be measured in requests per second but seconds per request. It sounds really impressive to go from 100,000 to 500,000 requests per second. It's somewhat less impressive if you consider that's going from 10 microseconds to 2 microseconds, and that your real, non-benchmark handler is likely in the dozens of milliseconds.
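The conversion is easy to sketch. The 20 ms handler time below is an assumed figure picked from the "dozens of milliseconds" range, and treating latency as the inverse of throughput assumes requests are handled one at a time on a single core.

```python
def seconds_per_request(requests_per_second):
    """Invert throughput into per-request framework overhead, in seconds."""
    return 1.0 / requests_per_second


HANDLER_SECONDS = 0.020  # assumed: 20 ms of real work per request

slow = seconds_per_request(100_000)  # framework overhead at 100k req/s
fast = seconds_per_request(500_000)  # framework overhead at 500k req/s

total_slow = HANDLER_SECONDS + slow
total_fast = HANDLER_SECONDS + fast
# Fraction of end-to-end latency saved by the 5x faster framework.
improvement = (total_slow - total_fast) / total_slow

print(f"{slow * 1e6:.0f} us -> {fast * 1e6:.0f} us of framework overhead")
print(f"end-to-end latency improves by {improvement:.3%}")
```

Under those assumptions the fivefold throughput gain shaves well under a tenth of a percent off what the user actually waits for.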

I've got a couple of web handlers that after quite a bit of work I can legitimately claim will run on the microsecond timeframe... but they're the exception. Generally even a single DB hit across the network, even on the same system, is going to blow right past the web framework's time you're using.

On that note, using this sort of metric, japronto's claimed results smell funny. Even with a 4GHz processor, getting 1,214,440 requests per second on a single core is ~3300 cycles per request. That's less than one cycle per byte in the HTTP request for a reasonable request (with no blocking on any sort of memory request), and that's not counting the TCP itself, any response, or the overhead of switching back and forth between C and Python. I can't see how this is possible without a huge degree of corner cutting; just validating that what you've received is a legal HTTP request, correctly encoded, decoding the fields, etc. is going to eat into that pretty fast, even with all the SSE instructions you may be able to throw at it. (And to emphasize, I'm not saying this is "impossible", just that it requires a lot of corner cutting. I've also got a "web server" out in the wild that handles "web requests" blazingly fast... because it basically ignores the entire web request and shovels out a hard-coded response. Very fast. Not a very good web server.)
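For anyone who wants to redo the cycle budget with their own numbers, a tiny sketch; the clock speed and request rate are the ones quoted above, and `cycle_budget` is just an illustrative helper name.

```python
def cycle_budget(clock_hz, requests_per_second):
    """CPU cycles available per request on one core at the given rate."""
    return clock_hz / requests_per_second


CLOCK_HZ = 4_000_000_000   # 4 GHz, as quoted above
CLAIMED_RPS = 1_214_440    # the claimed single-core figure

budget = cycle_budget(CLOCK_HZ, CLAIMED_RPS)
print(f"{budget:.0f} cycles per request")
```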


"Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can work about optimizing it then."

This seems like a false dichotomy. Avoiding obvious performance mistakes such as the ones you mentioned does not require additional focus that would detract from general building. It just requires that you know what you are doing.

If you are the type of person who makes said mistakes, it's unlikely you would ever go back and fix them by "focusing" on performance because the issue is simply that you don't know what you don't know. Likely someone else will come along in future and point out your mistakes to you.

Optimization that actually hinders you from building and requires focus is at the very margins and almost no one is going to those levels in typical "application" code.


With Symfony back in the day, you could turn on a little toolbar that would show in the top of your rendered HTML. You could see how many SQL queries were run, what the statements were, how long they took. I think there was some other tracing information as well. This strikes me as a basic and necessary kind of testing to do when developing with an ORM. Perhaps more systematically you could get ORM business into Zipkin or Jaeger, and then have some kind of staging vs. prod or canary vs. prod statistical comparison to see if you’re about to release something dumb. Or maybe simpler to keep generated SQL in unit test assertions. You wouldn’t have to write it, but you would have to read and update it on changes, so you could notice if you were winding up with N+1 queries or a ridiculous join.


> But I agree with the author's main point which seems to be that framework performance is pretty meaningless when comparing frameworks if you're just starting on a new project. Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can work about optimizing it then.

It feels like sensible advice, but in my experience "optimization", if it is possible at all, can only get you so far before you need a costly refactoring or rewrite.

As projects can be very different in context, it is all about what makes a minimal implementation "viable".


> My experience doing perf optimizations... without realizing it's expensive).

Completely agree, and this has been my experience as well. To this I'll also add inadequate thought put into data modelling. One would have to think less about query performance or the cost of overfetching if data is modelled around the needs of the system it would serve, instead of just modelling real-life entities and their relationships as is, straight onto the database.


A humble request to folks making benchmark or other graphs - please understand that thin coloured lines are not easy to visually parse .. even for folks like me who aren't totally colour blind but have partial red-green colour blindness. At least, the lines can be made thicker so it is easier to make out the colours. Even better, label the lines with an arrow and what they represent.


Absolutely! I have mild red/green issues and could barely tell those lines apart.


Related to ORMs/queries/performance, I have found the following combination really good:

* aiosql[0] to write raw SQL queries and having them available as python functions (discussed in [1])

* asyncpg[2] if you are using Postgres

* Map asyncpg/aiosql results to Pydantic[3] models

* FastAPI[4]

Pydantic models become the "source of truth" inside the app, they are designed as a copy of the DB schema, then functions receive and return Pydantic models in most cases.

This stack also makes me think better about my queries and the DB design. I try to make sure each endpoint makes only a couple of queries. Each query may have multiple CTEs, but it's still only a single round-trip. That also makes you think about what to prefetch or not, maybe I want to also get the data to return if the request is OK and avoid another query.
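As a rough illustration of the "one round trip with CTEs" idea, here is a sketch using the standard-library `sqlite3` in place of Postgres and asyncpg. The `users`/`orders` schema and the `fetch_profile` function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0);
    """
)

# The CTE computes the aggregates; the outer query joins them to the user row.
PROFILE_SQL = """
WITH order_stats AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total) AS spent
    FROM orders
    GROUP BY user_id
)
SELECT u.id, u.name,
       COALESCE(s.order_count, 0) AS order_count,
       COALESCE(s.spent, 0.0) AS spent
FROM users AS u
LEFT JOIN order_stats AS s ON s.user_id = u.id
WHERE u.id = ?
"""


def fetch_profile(user_id):
    # One round trip returns the user plus everything the endpoint needs.
    return conn.execute(PROFILE_SQL, (user_id,)).fetchone()
```

The alternative of fetching the user first and the order stats second costs an extra network hop per request, which is usually the more expensive part.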

[0] https://github.com/nackjicholson/aiosql [1] https://news.ycombinator.com/item?id=24130712 [2] https://github.com/MagicStack/asyncpg [3] https://pydantic-docs.helpmanual.io/ [4] https://fastapi.tiangolo.com/


The asyncpg library is honestly incredible. I wrote a backfill script that would: 1. dump the rows of a postgres table matching a query (usually a range on the index with a filter or two on other columns) 2. Do some very basic transformation on the rows (few replaces with small regex) 3. Take each transformed row and dump into a rabbitmq queue.

I was using aio-pika for the rabbit queue and asyncpg and was getting a consistent 25k messages/sec for like 200 lines of code.


This sounds like a really cool stack. I'm testing the waters with FastAPI and SQLAlchemy right now, but SQLAlchemy feels like it just gets in the way.

Do you have an example project which uses all of these I could look at?


I was writing a cookiecutter template for this kind of stack but never finished it. After using similar stacks for a couple of projects I am going to get back to it soon with many improvements and publish it.


Is there any automatic glue between SQL and Pydantic, or is each mapping hand-written?


It would be cool to teach the python static type checkers such as pyright and mypy to do the same validation as pydantic, marshmallow.

Then you could use dataclasses and map them to the database via sqlalchemy.

https://github.com/adsharma/dataclasses-sql

Couple of other techniques to speedup python:

* Transpile python to another language (py2many)

* Compile a large graphql like query to a single query plan in python which can be accelerated. (Fquery)

Both projects on my github.


I always thought it would be a nice alternative to an ORM: having a tool that takes a marshmallow/pydantic/whatever model, optionally passing it additional db specific options, then it generates a bunch of sql files you can call with aiosql. The whole thing would then let you optionally get the result wrapped in a model if you need to, with ORM like helpers for common CRUD things.

That would have the benefit of the standardized api of an ORM and the flexibility of SQL, without the coupling.


Absolutely! I think it should be doable, FastAPI and other libraries do heavy Pydantic model inspection, and some do automatic code generation. I should explore this a bit more.


It also has some nice properties:

- because you have all your queries written in the same place, when you create a db migration, you know where to look for code to update, and can even perform some automatically.

- sql queries could be turned into a stored procedure with a simple marking, without changing any code.

Of course the downside is that dynamically exploring data is not as easy as with an ORM.


Do you need to define your models more than once with these? I'm looking for a single source solution and haven't quite found it yet.


I only define them once, but I define all the database schema by hand (with an SQL script). I’d love to have something that translates Pydantic to an SQL schema definition.

Each asyncpg result has a dictionary-like interface, so I can convert it to a Pydantic model easily.
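The conversion step looks roughly like the sketch below. It is a stand-in, not the real stack: `sqlite3.Row` plays the part of an asyncpg record, a frozen dataclass with a hand-rolled check plays the part of a Pydantic model, and the `users` table and `get_user` function are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    email: str

    def __post_init__(self):
        # Hand-rolled check standing in for Pydantic's validation.
        if "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann@example.com')")


def get_user(user_id):
    row = conn.execute(
        "SELECT id, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    # dict(row) works because the row exposes keys() and item access.
    return User(**dict(row)) if row is not None else None
```

Because the column names in the SELECT match the model's field names, the mapping stays a one-liner; a mismatch fails loudly at construction time instead of leaking a half-filled object.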


Thank you for sharing this stack! I'm a Pythonista at heart, recently was trying RxDB + TypeScript, and I was thinking hmm I'll bet I could do something with postgres and Pydantic.


We do something like this, and we also took some inspiration from Hasura. Asyncpg is so fast and ergonomic.


That sounds great! I’ve always kept an eye on how Hasura does things. Do you have any open projects using this stack?


Don't forget that you're paying a huge price using the sqlalchemy orm - https://docs.sqlalchemy.org/en/13/faq/performance.html

If I know an endpoint is going to be hit hard, I forgo trying to use the ORM (except to maybe get the table name from the model obj so some soul can trace its usage here in the future) and directly do an engine.execute(<raw query>). Makes a huge difference. Next optimization I do is create stored procedures on the database. Only then I start thinking about changing the framework itself.

For folks like me who want to get prototypes off the ground in hours, flask and fastapi are a godsend, and if that means I have to worry about serving thousands of requests a second soon, that's a happy problem for sure.


You can also use SQLAlchemy Core, which is an intermediate between the full-blown ORM and running actual strings of SQL. I've had a great experience with Core - I can easily have it output essentially the exact SQL I'd write by hand, but I get many benefits (like the ability to compose queries) that are nicer than dealing with raw SQL.


Definitely agree, I'm happy to write raw SQL, but SQLAlchemy Core is even better than that because of composability.


This is what I like most about SQLAlchemy, I can use the ORM most of the time, but in cases where it needs optimizations or aggregation queries I can drop down or refactor to core, while still using the same model/table definitions, ability to combine or filter queries dynamically and leverage IDE features like refactoring.


I'll happily forget that because it's such a small microscopic price that it's moot. You're way better off optimizing the actual query being made, which SQLAlchemy is great at because it doesn't hide the SQL from you. Don't use engine.execute(<raw query>), use SQLAlchemy Core if your endpoint is getting hammered.


To be clear, this is FUD. If you know how to make SQA emit the right SQL, the performance is basically the same as psycopg2 + your custom code, usually better. I've written many high volume SQA services and never once saw SQA per se as the bottleneck.


ORMs aren't inherently that heavyweight, as a Java developer I don't have performance concerns about hibernate.

That sounds to me like a Python problem and a "this specific ORM isn't performant" problem, not ORMs being bad as a whole. Python has never been the fastest language (it's far slower than, say, Java) and the GIL really prevents applications from scaling well without multiple instances.

And if you really want a "just load the data for me and do nothing else that incurs a performance hit" approach then you can use stateless objects and the ORM truly becomes just a wrapper around the DB to load and transform the data into an object for you and/or do a raw, whole-object update back to the DB.


Totally agree with you, only want to add that unfortunately the team doesn’t really know the given ORM they use, and I have seen some utterly stupid uses of eg. Hibernate (like eager fetching basically everything there is for queries where it is not required, or simply not knowing anything about the boundaries where an object is “attached” or not). Which one might argue that it is a defect in the tool, but I doubt you would blame an airplane for crashing when the “pilot” is not trained to drive it.


It's not just sqlalchemy but yes it is definitely a python problem. Problem or feature is up for discussion of course. The ability to mutate basically anything is powerful and you gain some tangible benefits from it.


For me, FastAPI is the sweet spot between performance and quick development.


Use of ORMs is often a performance choke point. Raw DB queries are often much, much faster. Almost always, the more you abstract, the worse you perform. It's great as a developer but not so great as a user.


I honestly would rather just read a SQL query. Almost every developer is familiar with SQL so you can immediately know what is happening vs if you are looking at a code base with an ORM you're not familiar with.


I haven't touched an ORM in over 6 years, but unless they've improved since then, I honestly can't think of a single reason why anyone would choose to use one.

They're clunky monstrosities that act only as guard-rails for inexperienced developers. Far better to invest a few days (which is realistically all you need) to improve their SQL skills and/or code-review practices.


Yes, things like setString(1, “asd”) and conversely getInt(2) are so beautiful and will never introduce mistakes /s

ORMs are there for mapping objects and well, relations. Most ORMs provide additional features, but at the core they can, and for more complicated queries they should be used with native SQL queries. They are made for OLTP not for OLAP


Like there wouldn't be anything in between /s

There are "simple ORMs" that only map results of SQL queries to objects. They do not provide a magic query API - which is the source of most problems. I don't do Python, but for .NET there is Dapper https://github.com/StackExchange/Dapper, you can have a look what I mean. You write the SQL query, explicitly execute it, the library maps the results of that query into objects (it's C#, so you have to declare the class. In Python I'd imagine it would create the object for you)


+1 for Dapper. It does have some limited "magic" query building features, but only for straightforward CRUD operations that don't involve joins. And that's a Good Thing (TM).


> I honestly can't think of a single reason why anyone would choose to use one.

In the case of SQLAlchemy (referenced in the article), many people use it as a database connection abstraction rather than an ORM. It's kinda the equivalent of "ODBC" for certain Python libraries like Pandas.

For instance, in Pandas you can write your dataframe to the database by going `df.to_sql(tblname, sqlalchemycon)` where sqlalchemycon is the connection instance to any database that SQLAlchemy supports.

https://pandas.pydata.org/pandas-docs/version/0.23.4/generat...

I use SQLAlchemy for this purpose alone and write straight up SQL. I've never used the ORM parts of SQLAlchemy.


I am by no means an experienced developer--but the issue I always seem to have without using an ORM is that there are string literals containing different bits of SQL scattered all throughout my code. In addition to being hard to refactor when the database model changes, I think there is a fairly high performance cost to doing so many string concatenations every time the code runs. Is there a better way to manage the SQL command strings when you are sending the queries by hand?


That was my biggest issue with writing my own SQL, until I started using PyCharm. A while ago they integrated DataGrip (I think it is only available in the pro version), which makes the IDE also understand SQL code[1].

If you connect the IDE to a database it starts to recognize the SQL to the point it behaves like rest of the code (you have autocomplete etc). I am starting to think that this is the correct approach and ORMs were just a hack trying to achieve that.

[1] https://youtu.be/2bpmfjtoVVU?t=2831


A lot of times you don't need to change the SQL string at all. You just bind different parameter values before sending the same SQL string to the DB server over and over again.

The exceptions are SQL elements that cannot be (easily, or at all) parametrized, such as the column lists in SELECT, ORDER BY, GROUP BY, or changing the WHERE "shape" and so on.

In my experience, this tends to be a minority of queries, although an important minority.
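A small sketch of both cases using the standard-library `sqlite3`: the value is bound as a parameter, while the ORDER BY column, which cannot be bound, is picked from an allow-list. The `potato` table and `heavier_than` function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE potato (id INTEGER PRIMARY KEY, name TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO potato (name, weight) VALUES (?, ?)",
    [("russet", 0.30), ("fingerling", 0.05), ("yukon", 0.20)],
)

# Identifiers cannot be bound as parameters, so they come from an allow-list.
SORTABLE_COLUMNS = {"name", "weight"}


def heavier_than(min_weight, order_by="name"):
    if order_by not in SORTABLE_COLUMNS:
        raise ValueError(f"cannot sort by {order_by!r}")
    # The value is bound with '?'; only the vetted column name is interpolated.
    sql = f"SELECT name FROM potato WHERE weight > ? ORDER BY {order_by}"
    return [name for (name,) in conn.execute(sql, (min_weight,))]
```

The SQL text only varies across the handful of allowed columns, so the set of distinct statements the database sees stays small and no user input is ever concatenated into it.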

----

P.S. If string concatenations are your bottleneck, then your database is screaming fast! The real-life bottlenecks are usually in excessive database round-trips and unoptimized query plans, and are orders-of-magnitude larger.


I haven't done DB stuff in a while as I've mostly been frontend, but I reckon the way I'd lay it out is in the same way that I have a "clients" or "services" folder (or repo) which contains things that return Promise<Data> (and I don't have to care whether their source is HTTP, Firebase, or anything really). I would probably do the same with my back-end application (or lambda). Directories (or repo) full of services which are sets of high level calls (e.g. getPotatoes()) which are async functions that return data. Inside would be (probably) SQL.

> I think there is a fairly high performance cost to doing so many string concatenations every time the code runs

String concatenation is extremely cheap compared to any sort of IO or computation.


I disagree; not using ORMs isn't going to magically make developers write better queries, why not spend those few days training them to use the ORM better? Would you rather have raw SQL strewn about the codebase and have to worry about input validation and data (de)serialization every single place? Maybe it's ok for toy apps, but I wouldn't want developers bringing their own different styles of writing SQL all over a project. An ORM helps standardise this stuff


Good article, but I can't help but notice a gaping hole in the benchmark -- why was there no attempt to run gunicorn in multi-threaded mode?

The article has a link to https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a..., but failed to mention the key takeaway from the article:

> threaded code got the job done much faster than asyncio in every case


The article explicitly says they were testing single threaded use cases.


But why? The results would be a lot more useful if they include a single-process multi-threaded setup in the benchmark.


In my benchmark testing, SSL appears to be the bottleneck; e.g., Apache vs. Nginx does not really matter. I assume the benchmarks above 10,000 RPS are not using SSL and regular HTTP? How are people doing benchmarks at 10k-100k RPS?


SSL handshaking is definitely a bottleneck. I can’t speak for the benchmarks but with SSL typically this scales via using persistent connections (keep alive), and SSL session resumption (much lower cost to setup vs a full handshake).

In the end you want to measure perf with and without SSL so you can identify the real bottleneck. Otherwise you might just benchmark your SSL implementations handshake performance and not what you really want.

As an example, I found that Kubernetes nginx-ingress can’t cache SSL sessions on the upstream side (nginx to your app). So request bursts can really hurt your application unless your pool has enough open connections to handle the burst (keep alive and keep alive requests). Without benchmarking my app in different ways I wouldn’t have figured this out as easily and might have just assumed my app was slow.


HTTP/1.1 Keep-Alive helps a lot. HTTP/2 is even better. You only do the SSL handshake once.
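A back-of-the-envelope model of why connection reuse matters. The 30 ms handshake and 5 ms request figures are made-up inputs rather than measurements, and the model ignores session resumption and pipelining.

```python
def mean_latency_ms(handshake_ms, request_ms, requests_per_connection):
    """Average per-request latency when one handshake is shared by N requests."""
    return request_ms + handshake_ms / requests_per_connection


HANDSHAKE_MS = 30.0  # assumed cost of a full TLS handshake
REQUEST_MS = 5.0     # assumed cost of serving one request

no_reuse = mean_latency_ms(HANDSHAKE_MS, REQUEST_MS, 1)
keep_alive = mean_latency_ms(HANDSHAKE_MS, REQUEST_MS, 100)

print(f"new connection per request: {no_reuse:.1f} ms")
print(f"100 requests per connection: {keep_alive:.1f} ms")
```

A benchmark that opens a fresh connection for every request is therefore mostly measuring the handshake term, not the application.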


Flipping the HTTP/2 switch was amazing for me -- I have a page that for reasons needs to load ~500 small images. Initially I was worried about having to figure out a sprite-based method to compile the images (which is not ideal because there are many permutations of which 500 images), but when I turned on HTTP/2, the overhead just disappeared. The images load instantly. I'm nowhere near the multi-k RPS metrics as above, but it was night and day for me even for individual requests.


Now web developers just need to realize that much of the value in using CDNs is gone. In earlier years it made sense to spread content over multiple domains, like a CDN domain for static content. Now, due to HTTP/2, as much as possible should be served from the same domain to get the full HTTP/2 effect.


Where is SSL the bottleneck? Wondering if terminating earlier and just relying on HTTP after would help.


Signing, verification, and key exchange are quite expensive. ECC doesn't help server-side; it mostly reduces client-side verification costs. Session caching can be an important optimization that can significantly reduce those costs, but scaling session caching has its own problems.

But IME real-world bottlenecks have more to do with overall architecture. People tend to heavily focus on technical details, such as concurrency architecture--the how. But the biggest opportunities for improved performance usually involve functional aspects of an architecture--the what. (Note that these aren't fixed categories; they're relative positions. A technical detail often becomes a functional model as development progresses.)

12 RPS is a long way from implicating SSL. If you get to the point where SSL is an identifiable bottleneck, you've either made a series of tremendously good decisions or exceptionally poor decisions.


> Signing, verification, and key exchange are quite expensive. ECC doesn't help server-side; it mostly reduces client-side verification costs.

I don't think that's right. Cloudflare's blog [1] says they can do about 9.5x the handshakes/sec with ECDSA at 256-bits vs RSA at 2048. Verification of ECDSA signatures is somewhat slower, but it's usually an acceptable tradeoff to make clients do a bit more work so that servers do a lot less.

I agree though, at 12 RPS, TLS isn't the bottleneck.

[1] https://blog.cloudflare.com/ecdsa-the-digital-signature-algo...


Ah, you're right. 1) I got it backwards: verification is much faster with RSA, so RSA is better for clients. 2) I was testing libressl (macOS, OpenBSD), where signing speeds between rsa2048 and ecdsap256 are nearly identical (Core i5, M1, AMD GX-412T), whereas with OpenSSL (AMD EPYC) ecdsap256 is faster (30x advantage to ecdsap256 actually, as compared to 4x verification advantage to RSA). Though, the magnitudes here seem to be sensitive to optimization effort.


HTTP/3 is 0-RTT. You only need to connect once, also UDP


As a Django shop, we’ve always hoped PyPy would one day be suitable for our production deployments but in the end with various issues we were never able to make the switch.

And then Pyston was re-released...and changed everything. It was drop in compatible for us and we saw a 50% drop in latencies.

Source availability aside, I suggest anyone running CPython in prod take a look.


Can you tell us more about that?

I remember pyston v1 from Dropbox. You are speaking about v2, which is a binary package (closed-source at the moment)?


Yeah it's a closed source binary.

We're very happy with it. Great compatibility, no horrendous warm up times, and very meaningful speedups.

Not much more to say. It's the same, just a bit faster.


When you start hitting bottlenecks in your python web framework it's probably time to switch to a faster language, not another framework in python.

You're probably done rapid prototyping by this point anyway.


I don't think there are many gains from rewriting everything in a faster language. Unless they are a very very successful company with billions of customers, it's often cheaper to scale horizontally.


Depends on if you identify it early enough really and even then halving your app tier costs could be sizable for any company. If you can get some productivity benefits at the same time from say static typing then there's even more reason to switch.


There's still a good reason to pick fast frameworks in a slow language: you can delay the inevitable for a bit, probably enough time for you to work on a rewrite or whatever.


Well if you convert everything from ORM to raw SQL, it will then be easier to extract all of that SQL and use it in a different web framework once you've measured and confirmed that your bottleneck is servicing requests in Python.


Maybe I am missing something, but why wasn't sanic tested with pypy? I expect that this combination would outperform everything else.


Was wondering the same thing. PyPy can do wonders for other asyncio-based frameworks.


Why Python at all? About 10 years ago I liked Python a lot (and still like it in principle) and felt very productive compared to, say, Java. Java was full of inconvenience, XML, bloated frameworks and all that. But today you can use Kotlin, that is in my opinion even nicer than Python, with performant frameworks (e. g. Quarkus or Ktor) on the super fast JVM.

I don't want to start a language war, but maybe Python is not the first choice for their requirements.


I can code comfortably in Python, Java, JavaScript and to some extent C/C++. In the last 4 years I have been using mostly Python for various reasons (Machine Learning, OS automation, web scraping ...). Compared to the other languages, Python feels lighter and faster to write to the extent that it rarely interrupts my flow of thoughts. Now and then I have to code in JavaScript (frontend), C/C++ (embedded / low level optimizations) and Java (maintenance) and my feelings are:

- JavaScript still feels messy

- C/C++ is complex, but it is often offset by the complexity/needs of the project (e.g. in embedded)

- Java has become kind of bloated with all the new stuff

So as others have already mentioned, development productivity is in many cases far more important than speed of code. In my ~20y long career I have rarely seen a project failure due to runtime performance. Most of them failed due to speed/agility of development iterations and also project/product management issues (bad fit, unrealistic project plan, lack of focus and customer feedback).

That said, if I need to look for another language due to performance, that would probably be Rust.


I agree with your general take on developer productivity, but I don't feel that modern JS is significantly messier than Python, at least not to a level where it significantly impacts productivity (I'd rather avoid a debate on the abyssal depths of the language, eg, type coercion)

I feel about the same amount of grievances with both. For instance I dislike Python's async and functional semantics (list(map(lambda n...). But it has a much better standard library overall, a lot of the scripting syntax semantics (file opening, requests, etc.) feel cleaner, etc.

I'm more versed in JS but as the knowledge curves converge in months or years to come I don't think I'll be significantly more productive in Python.

I know you were expressing an entirely subjective opinion, probably contingent on the amount of day-to-day practice you have with each language (you say you code JS in the front-end now and then... I write a lot of it), but I still wanted to offer my counterargument.


> list(map(lambda n...

List comprehensions are much better for this. Functional doesn't mean you have to use a function call. If you can use the paradigm with literal syntax, just do so.
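A quick side-by-side of the two spellings, with made-up data:

```python
numbers = [1, 2, 3, 4]

# Function-call style: map plus a lambda, wrapped in list().
squares_via_map = list(map(lambda n: n * n, numbers))

# Literal syntax: the same transformation as a list comprehension.
squares_via_comprehension = [n * n for n in numbers]

# Filtering folds into the comprehension instead of needing filter().
even_squares = [n * n for n in numbers if n % 2 == 0]
```

Both forms are equally "functional" in that neither mutates `numbers`; the comprehension just drops the lambda and the extra `list()` call.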


I never said functional means you necessarily have to use a function call, and I also understand list comprehensions are semantically functional, but sometimes you do need to use a function call.


They don't really compose nicely. (At least from my point of view, as I prefer fluent interfaces, eg. those you usually find in Rust/Scala/Java.)


Thanks for the JS perspective. Since my frontend work mostly involves hacking together prototypes and trying to integrate some JS libs in as quickly as possible, I guess I am kind of biased.


JS feels very messy to me too. I think the main reason is it’s so variable - things like modules - whereas in Python everything is mostly just so consistently... pythonic.


I also notice I spend less time thinking "how do I do X in this language" with Python and just do it. Maybe not the best implementation of X, but that doesn't really matter because I implemented X, Y and Z while someone else got bogged down in their "better" language and barely got X across the finish line.


I agree - if you have any chance at the slightest volume (and based on this we're talking 100 concurrent users, not a million) then you are handicapping yourself if you're stuck on a single thread.

Personally I would suggest Elixir (and the Phoenix web framework). It's fully 'parallel by default', will happily saturate all your cores and serve tens of thousands at the same time. The language is simple, the framework is good, although obviously not as many people know Elixir as Python.


I feel like when you need to create value and make a product, better to do it in a language you know. Product value is not directly proportional to product performance or raw HTTP request serving time.

I for instance know a bit of F#/C# and Java, but would probably pick Python to make a new product just to remove that mental barrier of not having my lack of language knowledge in the way of things


Might as well refer to TechEmpower benchmarks.

https://www.techempower.com/benchmarks/


just.js (#9) looks interesting

https://github.com/just-js/just

It seems to be a much tinier JavaScript runtime than Node.js (still using v8), but linux only

The benchmark is probably unrealistically optimized code but even so, it implies Node.js itself has a large performance overhead


If you're not paying attention it would look that way, yes.

If you pay attention you'll notice that just.js is using postgres as their DB, while all of the node benchmarks are handicapped by either using mongoose/mongodb or MySQL.

There is no node benchmark with postgres, but all of the fastest benchmarks used it.

"lithium" is a good example to show how much of an impact switching to postgres has. All of the 4 lithium benchmarks are identical except in what DB they use.

The results are: lithium-postgres-batch (#2 - 659850), lithium-postgres-beta (#13 - 398773), lithium-postgres (#14 - 398258), lithium (#45 - 271989). The last result is MySQL.


Somehow just.js is faster than Rust and C++ in 20-queries benchmark, they're all using postgres. https://www.techempower.com/benchmarks/#section=data-r20&hw=...


It's likely because of this (custom Postgres client, request pipelining and not doing sync/commit after each query):

> [Multiple Queries] This is the first test where Just(js) has quite a big lead. This is likely due to the fact it is using a custom postgres client written in Javascript and taking full advantage of pipelining of requests. It also avoids sending a Sync/Commit on every query. As far as I am aware this is within the rules but will be happy to make changes to sync on every query if it is not.

https://just.billywhizz.io/blog/on-javascript-performance-01...
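
To make the pipelining point concrete, here is a rough back-of-the-envelope model (not the just.js client; the round-trip and per-query costs are made-up numbers, and `sequential_time` / `pipelined_time` are names invented for this sketch). It only shows why avoiding a network round trip per query matters more as the query count grows.

```python
def sequential_time(n_queries: int, rtt_ms: float, server_ms: float) -> float:
    """Each query waits for the previous response, so we pay one round trip per query."""
    return n_queries * (rtt_ms + server_ms)


def pipelined_time(n_queries: int, rtt_ms: float, server_ms: float) -> float:
    """All queries are sent back to back, so we pay roughly one round trip in total."""
    return rtt_ms + n_queries * server_ms


# Hypothetical numbers: 20 queries, 0.2 ms round trip, 0.05 ms of server work each.
n, rtt, work = 20, 0.2, 0.05
print(f"sequential: {sequential_time(n, rtt, work):.2f} ms")
print(f"pipelined:  {pipelined_time(n, rtt, work):.2f} ms")
```

Real clients and servers overlap work in messier ways, so treat this as an upper bound on the benefit rather than a prediction.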


What makes postgres special in these cases?


I would like to know as well. All I know is that for this specific workload there appears to be a strong correlation between the chosen DB and performance. Depending on the benchmark you're looking at on that site, the top 30-60 results are exclusively postgres.

Further, for every framework/language that was tested with both MySQL and postgres (there's quite a few of them), the postgres one always ranks higher.


"In this test, the framework's ORM is used to fetch all rows from a database table containing an unknown number of Unix fortune cookie messages (the table has 12 rows, but the code cannot have foreknowledge of the table's size)."

This is the "workload".


> the top 30-60 results are exclusively postgres.

Perhaps it's because pg had better async drivers?


Likely it's the better/faster support for multi-step transactions. (That's a big point for PG, since at least 2004)



We did an evaluation for our API. The API accepts an image upload, passes it onto the backend for processing and returns a ~2k json lump in return.

Long story short, fastapi was much much faster than anything else for us. It also felt a bit like flask. The integration with pydantic for validating dataclasses on the fly was also great.


I would question choosing Python for large server projects because the performance ceiling is so low. At least with the "middle tier" performance languages such as Java / C# you are unlikely to require a complete language switch as the project scales.


With cloud and infinite horizontal scaling...hitting that performance ceiling is an indication of a poor design.


This is false, since it is not always easy (or even possible) to horizontally scale a work-load.


If you've made it impossible or just difficult to scale horizontally in the server world, yeah, you've got a bad design.


Sorry to keep coming back, I don't want to start a flame war here. Difficulty of horizontal scaling is a property of two things:

  1. how your solution is designed (your point, and I agree this is often done poorly)  
  2. the problem / work-load you are trying to scale (my point). 
It might be that your problem does not "shard" very easily. You cannot fix this with solution architecture, at least not easily. Horizontal scaling of a relational database is very difficult, for example.

Edit:

Another example. Can you rewrite NGINX in a slower language and use horizontal scaling to fix it? Of course not, because that horizontal scaling would itself leverage NGINX (or something like it)!


There are lots of ways to speed up Python if need be, and you can keep much of the work already done. Cython is one of them.


I inherited a flask queue worker, and it suffers from some major problems (like 12 req/second when it's not discarding items from the queue). I am primarily a javascript programmer so I'm a little bit out of my element.

I am tempted to refactor the worker to use async features, and that would require factoring out uWSGI, which is fine, I only added it last week. The article states that Vibora is a drop in replacement for flask, but I guess I'm a bit skeptical, as I can't find much information outside of Vibora having a similar api. For a web service with basically one endpoint, I could refactor to another implementation fairly easily, I'm just looking for the right direction.

I thought maybe I should refactor the arch to either batch requests to the worker, or to use async. Anyone have a feeling where I should go? I am just getting started researching this, but any advice would be appreciated.

Edit: at least quart has a migration page.. probably will just try it out, what can I lose? https://pgjones.gitlab.io/quart/how_to_guides/flask_migratio...

Second edit: Also might try out polyrand's stack in the comments.
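
For the batching option, a minimal sketch of what I have in mind, assuming the worker can handle a list of items at once. `batched` and `process_batch` are hypothetical names, and the doubling is only a stand-in for the real work.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_batch(batch: List[int]) -> List[int]:
    """Hypothetical stand-in for one call to the worker with a whole batch."""
    return [item * 2 for item in batch]


# Seven queued items handled in three calls instead of seven.
results = [process_batch(batch) for batch in batched(range(7), 3)]
print(results)
```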



The fact that you are using an offset of 50000 and complaining it slows everything down says a lot about the benchmarks. Top it all off with an ORM query with prefetch-all, the GIL, and the shared CPU (I am guessing) that you ran the benchmark on. You see where this is headed?
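
For anyone wondering what the alternative to a large offset looks like, here is a small sketch using the stdlib `sqlite3` module and a made-up `items` table (not the article's schema). With OFFSET the database still has to walk past the skipped rows; with keyset pagination it can seek straight to the last id it returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    ((i, f"item-{i}") for i in range(1, 1001)),
)


def page_by_offset(conn: sqlite3.Connection, offset: int, limit: int) -> list:
    """OFFSET pagination: rows before `offset` are scanned and thrown away."""
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def page_by_keyset(conn: sqlite3.Connection, last_id: int, limit: int) -> list:
    """Keyset pagination: start right after the last id the client has seen."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()


# Same page either way, because ids here are dense and start at 1.
print(page_by_offset(conn, 500, 3))
print(page_by_keyset(conn, 500, 3))
```

The two only line up this neatly because the ids have no gaps; in a real table the client would pass back the last id it actually received.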


I have great experiences with Falcon for backend REST APIs, and it is supposed to be great in terms of requests per second.

How does it compare to Sanic?


I'm a big Falcon user as well. Got into it because it was easier to get up and running without bells and whistles vs the more "popular" frameworks.


> I have great experiences with Falcon

Then keep using Falcon! OP’s core thesis is that developer comfort matters more when choosing an API framework because the performance bottleneck is usually found elsewhere. The API itself should be fast enough with some combination of Gunicorn, gevent, PyPy, and horizontal scaling.


I just wanted to know how it would compare with Sanic, because I never used it.


The biggest difference is that Falcon is synchronous while Sanic is asynchronous. With Sanic, you would explicitly specify async/await for asynchronous operations and use asynchronous libraries for I/O. Switching to Sanic could also affect how you deploy to production. Both are plenty fast.
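
A rough illustration of the sync/async difference using only the standard library (this is neither Falcon nor Sanic code; `fake_io` is a stand-in for a database or HTTP call). The async version overlaps the waits, while the sync version pays for each one in turn.

```python
import asyncio
import time


async def fake_io(delay: float) -> float:
    """Stand-in for an awaitable I/O call such as a DB query."""
    await asyncio.sleep(delay)
    return delay


async def handle_concurrently(n: int, delay: float) -> float:
    """Run n simulated I/O calls at once and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_io(delay) for _ in range(n)))
    return time.perf_counter() - start


def handle_sequentially(n: int, delay: float) -> float:
    """Run n blocking waits one after another and return the elapsed wall time."""
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(delay)
    return time.perf_counter() - start


sync_elapsed = handle_sequentially(5, 0.05)
async_elapsed = asyncio.run(handle_concurrently(5, 0.05))
print(f"sync: {sync_elapsed:.3f}s, async: {async_elapsed:.3f}s")
```

In practice a synchronous framework gets its concurrency from multiple workers or threads instead, which is part of why both end up plenty fast.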

FastAPI [1] is also worth considering if you’re looking into asynchronous API frameworks. It comes with nice features for specifying API schemas.

[1]: https://fastapi.tiangolo.com/


I prefer async big time. We once had to implement async routines while using flask, where the server would return a 200 but keep processing the request, and the actual result would be sent by email. It was hellish and inefficient to build. In hindsight it would have been better to use a queue service and a consumer and decouple the whole process, even if it meant increased infrastructure and maintenance complexity.
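
Roughly the shape I mean, sketched with an in-process `queue.Queue` and a thread standing in for a real queue service and consumer. All names here are made up, and the "processing" is a placeholder for the slow work plus sending the email.

```python
import queue
import threading
from typing import List


def run_demo(payloads: List[str]) -> List[str]:
    """Enqueue payloads from a fake request handler and process them in a consumer."""
    jobs: queue.Queue = queue.Queue()
    processed: List[str] = []

    def consumer() -> None:
        # Pull jobs until the None sentinel arrives.
        while True:
            job = jobs.get()
            try:
                if job is None:
                    return
                processed.append(f"processed {job}")  # placeholder for slow work + email
            finally:
                jobs.task_done()

    def handle_request(payload: str) -> dict:
        # The handler only enqueues and answers immediately.
        jobs.put(payload)
        return {"status": "queued"}

    worker = threading.Thread(target=consumer, daemon=True)
    worker.start()
    for payload in payloads:
        handle_request(payload)
    jobs.put(None)  # tell the consumer to stop
    jobs.join()
    worker.join()
    return processed


print(run_demo(["report-1", "report-2"]))
```

With a real broker the consumer would live in its own process, so a web server restart would not lose in-flight jobs.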


C#/ASP.NET is the fastest web framework now:

https://www.techempower.com/benchmarks/#section=test&runid=8...

7.000.000 requests per second

Even GO can only achieve 4.500.000 requests per second despite being a low-level language, as opposed to high-level C#.


That 7 million requests per second is achieved by writing a hard coded plain text HTTP response string directly to the client... it is so far disconnected from any real world use case that the number is basically meaningless.


It's still 8th in the composite benchmark. And the criticism you're leveling would affect the entire benchmark design, rather than a particular framework score, no?

[1] https://www.techempower.com/benchmarks/#section=data-r20&hw=...


> And the criticism you're leveling would affect the entire benchmark design, rather than a particular framework score, no?

Indeed. The problem is that many of the scores in the top 100 are really misleading because no one building a web app would implement things that way. There is still some value in the lower down benchmarks but you have to basically read the underlying source to determine if the implementation is remotely realistic or not. For starters I would ignore anything classified as "platform" which is described as:

    Platform, meaning a raw server (not actually a framework at all). Good luck! You're going to need it.
For C# in particular I would only consider the mvc variants as realistic.

Edit: I looked into the "asp.net core" composite score a bit more, it looks like those benchmarks are based on the aforementioned "platform" implementations for each test. I actually think this score is even more misleading than the individual benchmarks. At least the individual benchmarks show you the difference between "aspcore" (platform), "aspcore-mw" (middleware-only), and "aspcore-mvc" (full framework with routing).

Here are recalculated composite scores based on the more realistic implementations (aspcore-mvc, aspcore-mvc-ado-pg, aspcore-mvc-dap-pg, aspcore-mvc-ef-pg):

    ASP.NET Core MVC with ADO.NET (raw SQL): 3029
    ASP.NET Core MVC with Dapper: 2591
    ASP.NET Core MVC with Entity Framework: 2195
Compared to Flask's 468 or Django's 280 it's still significantly faster, but not to the same extreme you might think at first glance at the chart.
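
For reference, the ratios work out as follows. This is just arithmetic on the composite scores quoted above; `speedups` is a throwaway helper.

```python
ASPNET_MVC = {
    "ADO.NET (raw SQL)": 3029,
    "Dapper": 2591,
    "Entity Framework": 2195,
}
PYTHON = {"Flask": 468, "Django": 280}


def speedups(candidates: dict, baselines: dict) -> dict:
    """Return candidate/baseline score ratios, rounded to one decimal place."""
    return {
        (name, base): round(score / base_score, 1)
        for name, score in candidates.items()
        for base, base_score in baselines.items()
    }


for (name, base), ratio in speedups(ASPNET_MVC, PYTHON).items():
    print(f"ASP.NET Core MVC with {name} vs {base}: {ratio}x")
```

So roughly 5x to 11x on the composite, depending on which pair you compare.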


Yea I've seen that they do provide ORM benchmarks in the detailed tests. I still find it very impressive - as a full-featured framework it ranks well above the rest of its peers, despite the ORM.

I am writing Django backends nowadays, and like I've mentioned elsewhere, I love its full-batteries approach and maturity, but I'm not very happy about its meager async capabilities, and I think I'd prefer something with strong typing...

I've always been put off by Microsoft's lock-in but it seems that's changed so at least I'd put the .net as a contender for side project in the near future.


Yes, but getting a good ranking in a useless benchmark is not very meaningful. It's cool, but I wouldn't use it to make any serious decision.


C# is only at number 3 in your list. Both Java and Rust are above it in the list.

It's also a very "artificial" benchmark and real world code will give different results (if you have static content, just put it in a CDN and don't worry)

Other benchmarks from the same site:

- JSON Serialization: C# is number 34

- Single query: C# is number 23

- Fortunes: C# is number 7


It needs to be emphasized how artificial these benchmarks really are. Here is the source for the Fortunes C# benchmark:

https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

There's no routing or templating, it just writes a bunch of strings. No one would build an actual web app this way.

The only C# benchmarks that are remotely realistic are the mvc variants, starting with aspcore-mvc-ado-pg at number 79.


https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Proj...

Routing is required, but generally the rules allow things to be "reasonable" and "acceptable", which lets all these weird implementations through.

Honestly, they should remove the "Implementation approach" column, because basically every implementation is marked as "realistic", making it meaningless.


The "routing" implementation is more hard coded string matching:

https://github.com/TechEmpower/FrameworkBenchmarks/blob/5b0e...

I mean I guess technically that's routing but it's not remotely realistic. Only the mvc variants use the actual framework's routing system.


It ranks 8 in the composite (which adds a layer of human opinion, since they weigh each of the benchmarks...)

Java and Rust are both above, but my takeaway is that it seems to be by far the best-performing batteries-included framework according to the benchmark. It leaves the likes of Rails, Laravel, Django, Phoenix, Spring or Nest.js in the dust.

Is there something in the benchmark that favors .net core above all the others?

I'm genuinely asking, I've never even tried .net but I like full-batteries frameworks and this catches my attention.

[1] https://www.techempower.com/benchmarks/#section=data-r20&hw=...


The #1 project on that list appears to be implemented in rust? What does GO being "low-level" have to do with performance of serving requests? I'd imagine its bottlenecks would be due to something fairly arbitrary, like its garbage collection or how it represents strings or something.


Why does anyone need a framework to print a zillion Hello Worlds to the client?

It's more interesting to see results of high-load DB tests, for example:

https://www.techempower.com/benchmarks/#section=data-r20&hw=...


Depends on your use case. I've written quite a few backends that didn't use a database, and there are a number of cases for the pure HTTP benchmark: pass-through proxies with injected behaviour, in-memory key-value data storage APIs, memory-mapped files, etc. IMO not every shop offloads the state to its DB. Once you add the database you are really testing the DB driver, and that adds a lot more variance to the test. Maybe it's just the DB driver for that particular DB?

A fast "Hello World" benchmark implies that the HTTP/transport layers of the framework are very fast. That's your base and the lower bound to your best performance potential. As a real world example, if ASP.NET Core has the best request/response benchmark and it's better than, say, nginx (a popular reverse proxy), it might be better to have all your gateways use a reverse proxy built on it instead. Over your whole network, depending on your scale, that could be a big cost and latency saving measure. I wouldn't be surprised if Microsoft or its community have started writing one.


Hmm, no, it has the same speed as gnet (Go): https://www.techempower.com/benchmarks/#hw=ph&test=plaintext 7M req/sec for both of them.

( all those benchmarks are sort of useless anyway )


Go being low-level has nothing to do with performance. They deliberately keep feature set on this level, they're not making some kind of trade-off.

Also, high performance C# is so low level that you might as well write C++. Or you think you will use EF, LINQ and have 7 millions rps?


> Or you think you will use EF, LINQ and have 7 millions rps?

I'm imagining not, but it's still comparatively faster than most, if not all, "full-batteries frameworks", right?

I'm assuming the use of ORMs and such is more or less uniform in the comparison (eg, if they don't use EF for .net, they don't use Django ORM either)

The overall performance of .net across these benchmarks really catches my attention like it does GP...

I'm looking for a full-batteries framework based on a strongly typed language, and never in my life did I think I'd say this, but it might be time to give .net / C# a whirl?

I've heard really good things about the dev experience from people here on HN, F# is a really cool bonus, and the fact that it looks at least comparatively performant could be the icing on the cake.


To me the big selling point is a framework where you can stay high level if you want and optimise a particular route when the rare but exceptional case arises, without splitting the process or resorting to C++ interop (with its own performance issues), and without the risk of needing a rewrite for performance. In a previous life I got some significant latency benefits by doing just that. I think it's probably only gotten better with .NET 5.


The important thing to remember is that unless you're running a massive service, requests per second is less important than seconds per request.

Getting an API hit from 300ms to 70ms, and proper frontend caching is far more valuable than concurrency (if you can afford to throw servers at it) because it actually affects user performance.
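
A quick way to see how the two numbers relate, assuming a fixed pool of workers that each handle one request at a time (the pool size of 8 is made up):

```python
def max_throughput(workers: int, seconds_per_request: float) -> float:
    """Upper bound on requests/second for `workers` handling one request each at a time."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return workers / seconds_per_request


WORKERS = 8  # hypothetical pool size
for latency in (0.300, 0.070):
    rps = max_throughput(WORKERS, latency)
    print(f"{latency * 1000:.0f} ms per request -> at most {rps:.1f} req/s")
```

So cutting latency from 300ms to 70ms also raises the ceiling on requests per second by about 4x on the same hardware, on top of the user-facing win.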


Since I've been a developer there have been two changes that I feel have given major performance improvements and made backend framework improvements much less significant (at least in the apps I develop): CDNs and client side rendering (which means more, smaller requests for data that are better suited to being served via a CDN).

Using (for example) AWS Cloudfront was a gamechanger in how I design webapps and view performance. Being able to 'slice and dice' what requests get SSL terminated at the CDN, cached fairly locally, served from an Amazon managed webserver, or sent to our app server, increased our performance 10 fold.

That approach isn't always practical, but I find that it's now much easier to choose the backend for developer performance and doubling the server CPU/memory is quicker and cheaper when needed.


Not that it matters any more, but a colleague mentions Flask was originally a joke of what not to do:

https://lucumr.pocoo.org/2010/4/3/april-1st-post-mortem/

Flask author reflects on that here:

http://mitsuhiko.pocoo.org/flask-pycon-2011.pdf

Quite relevant to the conclusion in the article.


And here I was living under the assumption that psycopg2 was the only option, and probably the biggest reason I was not using pypy. Gotta take a look at pg8000.

In general, I've always liked the idea of pypy, so I'll try to use it more, and not just for performance. Will also donate when I can.


I always assumed Python could scale because of Reddit: https://github.com/reddit-archive/reddit

Not quite sure if their current site's code is opensource... anyone know?


Any language can scale. Pretty much every language out there can handle scale. It's usually bad algorithms or poorly used external services, be it an API or a DB, that hurt performance.


Facebook uses/used PHP, that scales too. If you make it stateless and use it only as a glue. Dropbox uses Python too. Eve Online used Stackless Python.

Of course Reddit/FB/EVE/Dropbox shards everything, there's no global state to manage via Python. The state lives in the data store layer.

And for that there are these monstrous/elegant things like Vitess, that YouTube used (uses?): https://vitess.io/docs/overview/architecture/ which is basically a sharding/routing layer on top of independent MySQL instances.
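
The basic idea of a routing layer, as a toy sketch (this is not how Vitess actually maps keys to shards; the shard names and `shard_for` are invented for illustration): hash the key, pick a shard, and the application code stays stateless.

```python
import hashlib
from typing import Sequence

SHARDS = ["mysql-0", "mysql-1", "mysql-2", "mysql-3"]  # hypothetical instances


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Map a key to one shard using a stable hash, so every app server agrees."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for user in ("user:1", "user:2", "user:3"):
    print(user, "->", shard_for(user))
```

Plain modulo hashing like this reshuffles most keys when the shard count changes, which is one reason real systems use something more elaborate.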



TLDR - pypy is awesome. Don't use frameworks. Use pypy.

Please donate. Pypy needs funds - https://opencollective.com/pypy

Pypy doesn't get a fraction of the funding that python does.


Interesting. I never heard of Japronto before. For the people working with Python: why Flask instead of Japronto?


The ecosystem.

You got loads of libraries built around flask, and a lot of doc and tutorials, not to mention how much battle tested it is.

Those can matter much more than hypothetical perfs in a synthetic benchmark.


I get much higher request throughput in my Tornado applications, with very low response latencies, strange.


This benchmark was run on a laptop, which has a very small number of cores compared to the servers that usually run such apps. The author doesn’t mention any attempt to tweak the number of workers, which would make sense in this case. Given that they did notice at some point that CPU usage is lower than expected, I am surprised that they did not try it.


Where is japronto on the techempower? It's not even on there.


My god, the CSS and styling on that page is absolutely abysmal


Why is it so? I've got 100K requests per sec with PHP easily [1]

[1] https://github.com/gotzmann/comet


"Blazing fast with 100K HTTP requests per second and ~0.1 ms latency on commodity cloud hardware"

Ok what's the secret sauce and downsides? Because as far as I know there's no way to cheat with PHP like C# did here:

https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...


The most important part is Workerman and its efficient network architecture based on libevent. The other part is efficient DB drivers. That's why Comet ranks much higher than Go / NodeJS / Python frameworks in the high-load DB test:

https://www.techempower.com/benchmarks/#section=data-r20&hw=...


The TL;DR you should be looking for:

> all of this emphasises the fact that unless you have some super-niche use-case in mind, it's actually a better idea to choose your framework based upon ergonomics and features, rather than speed.


I think it’s been bog standard practice to run flask via uwsgi or gunicorn with async workers and use multiple process based workers per deployed server unit (eg per pod in Kubernetes).

What matters is that the cumulative latency & throughput solve your problem, not how fast you can make one singular async worker thread.

I figure most people running complex web services in production would just do an eye roll at this post. Nobody's going to switch to PyPy for any of this.

My team at work runs several complex ML workloads, and we use the exact same container pattern for every service running gunicorn to spawn X async workers per pod and then scale pods per service to meet throughput requirements. Sometimes we also just post complex image processing workloads to a queue and batch them to GPU processor workers. In all these use cases, super low effort “just toss it in gunicorn running flask” has worked without issue for services supporting up to peak load of thousands to hundreds of thousands of requests per second.
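
For the curious, the container pattern boils down to something like this gunicorn config file. The values are illustrative rather than our production settings, and the gevent worker class assumes gevent is installed in the image.

```python
# gunicorn.conf.py (a gunicorn config file is plain Python)
import multiprocessing

# Address the pod listens on.
bind = "0.0.0.0:8000"

# Number of worker processes; (2 x cores) + 1 is the starting point the gunicorn docs suggest.
workers = multiprocessing.cpu_count() * 2 + 1

# Async workers, so each process can juggle many in-flight requests.
worker_class = "gevent"

# Maximum simultaneous clients per async worker.
worker_connections = 1000
```

You would start it with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for your Flask module and application object, and then scale the number of pods to meet throughput.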


Could you share your company’s website?


No, I can’t speak on their behalf on Hacker News, so it is important to me to stay disconnected from my employer.


I have to question the value of text written by someone who sets white-on-white text in their website...


The background is blue?


The background image is blue, the background itself is white.

Without loading the image, the text is the same color.


I question the value of the comment of someone who loads css but not background images.


It's a bit of a step back in time reading things like this.

This is stateless HTTP requests hitting a relational database. How is this dead horse still being beaten? The patterns for load balancing, horizontal scalability, and caching in this space are well documented.

What are we gaining by still profiling Django, Flask and Ruby on Rails in 2021?


I suppose every app you work on runs the same query repeatedly? Yes, load balancing makes sense, but the author is specifically looking at requests per thread, i.e., wouldn't it be great (and more cost effective) to get as much throughput from a single thread as possible?


Does this mean you work with a stateful websockets setup? What stack?



