Brian from TechEmpower here. I don't want to take away from these results, but I do want to provide an important piece of context. This link is to a rendering of data from a continuous run [1] that hasn't received the type of sanity checking we do for an official TFB round. You can tell from the title ("Test") and the UUID of the run visible in the gray box under the navigation. Based on this link's attention, we'll make a more prominent warning for renderings of continuous runs so that they are more obvious to readers not familiar with the project.
We execute runs like this continuously to allow maintainers of test implementations to observe the results of their contributions. Given the performance seen here, it is very likely that Lithium will be well ranked in the next official round. But we feel the ranking seen in a continuous run such as this should be taken with a grain of salt until that next official round is available.
I investigated what explains the huge gap between the few fastest frameworks and the rest.
The answer is a deceptive one: they didn't achieve revolutionary optimizations.
The thing is, on many of those benchmarks the bottleneck is obviously the DB.
The ability to do DB queries asynchronously and with batching is the differentiating factor.
Only PostgreSQL supports such a feature, but you need support in the PostgreSQL client too.
The official C PostgreSQL client (libpq) used everywhere does not support this feature, except with a patch from 2016.
Yes, the secret of Drogon (and probably of Lithium) is that they use a fork of libpq with that 2016 patch, because upstream can't agree on merging it and nobody is working on upstreaming it.
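To make the idea concrete, here is a rough sketch of what pipelined/batched queries look like through libpq. To be clear about assumptions: the function names below follow the pipeline-mode API that later landed upstream in libpq (PostgreSQL 14), not the 2016 patch's exact interface, and a real framework would drive this from a non-blocking socket inside its event loop rather than blocking like this toy does:

    // Send N queries before reading any reply: one network round trip instead of N.
    #include <libpq-fe.h>
    #include <cstdio>

    int main() {
      PGconn* conn = PQconnectdb("dbname=hello_world");
      if (PQstatus(conn) != CONNECTION_OK) return 1;

      PQenterPipelineMode(conn);            // queue queries without waiting for results
      const int n = 20;
      for (int i = 0; i < n; i++) {
        const char* params[1] = { "42" };
        PQsendQueryParams(conn, "SELECT randomnumber FROM world WHERE id = $1",
                          1, nullptr, params, nullptr, nullptr, 0);
      }
      PQpipelineSync(conn);                 // one sync point flushes the whole batch

      for (int i = 0; i < n; i++) {
        PGresult* res = PQgetResult(conn);  // rows of query i
        if (res && PQresultStatus(res) == PGRES_TUPLES_OK)
          std::printf("%s\n", PQgetvalue(res, 0, 0));
        PQclear(res);
        PQgetResult(conn);                  // NULL marks the end of query i's results
      }
      PQclear(PQgetResult(conn));           // PGRES_PIPELINE_SYNC for the sync point
      PQexitPipelineMode(conn);
      PQfinish(conn);
    }

The win is simply that the 20 queries share one round trip to the database instead of paying the network latency 20 times.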
Actix Web benefited from the feature because its client, tokio-postgres, is a reimplementation and does not use libpq.
The industry-grade server ecosystem that is the JVM uses JDBC, which sadly has a blocking socket and thus does not allow asynchronicity.
But when Loom arrives, every existing JDBC code path will magically, automatically become truly asynchronous, so Spring should land in the top 4 places.
There is also a wrapper of JDBC through Kotlin coroutines, and there are reactive alternatives to JDBC such as R2DBC. It is unclear as of today whether such solutions enable PostgreSQL async queries and batch processing. It seems nobody has tried those on TechEmpower, which is sad.
Finally, one could use libpq over JNI.
Edit: I have read that the next release of pgjdbc (43) will switch from the standard socket to NIO non-blocking sockets.
What should heuristically be the fastest HTTP framework (H2O, in C) has refused to use the old libpq fork because the API is not stable and thus not production grade.
Indeed, SQL query batching is the key (and I personally think the TechEmpower website should make it explicit); async communication with the DB and the HTTP client is also key.
FYI, Lithium is not using a libpq fork from 2016; it has been rebased on master one month ago. But yes, Lithium (the sql-pipeline branch) is using it. (I mailed the Drogon maintainer to update it as well.)
About H2O: the non-batched version of Lithium is as fast (slightly faster on some tests, slightly slower on others), while being much simpler to use (the TFB implementation of H2O is 4400 lines vs 250 lines for Lithium...).
Interesting, thanks for the answer.
What I would really like to see is an obligation to name it "Lithium-with_batch", plus a version named "Lithium" without it. It would show the impact of the other optimizations, which are currently hidden by the paradigm shift.
I added -pipeline, but I guess -with-batch is more explicit; I'll change it then. But anyway, the TechEmpower team will eventually add a special tag for batching. There is actually a version of Lithium not using batching: check for 'lithium-postgres' in the benchmark tabs other than 'composite score'.
Other big optimizations are:
- non-blocking communication between the database and the framework
- non-blocking communication between the HTTP client and the framework
This is for C++; for slower interpreted languages, calling C bindings is a big optimization as well (and this is how some PHP frameworks get good performance, for example).
Thanks. Asynchronicity is one thing, but there is something that in theory can achieve better performance than asynchronicity alone: the reactive programming paradigm, which benefits from the concept of backpressure.
If you've heard of it, do you think it could achieve even better performance?
https://medium.com/@jayphelps/backpressure-explained-the-flo...
I had a look at the video, but I don't think there is backpressure in this benchmark, since there are only 512 connections max and each connection waits for the server response before sending a new request. In other words, the load generator never sends more requests/s than the server is able to handle. (Tell me if I misunderstood backpressure.)
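A rough way to see why (essentially Little's law): with a closed-loop generator the offered load is capped by the connection count divided by the response time, so the server can always keep up:

    in-flight requests <= 512            (one outstanding request per connection)
    throughput ceiling ~= 512 / mean response time
    e.g. 1 ms mean response time -> at most ~512,000 req/s offered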
Mmh, then I wonder if the TechEmpower benchmark suite would benefit from a new benchmark that is backpressure-sensitive, thus increasing its coverage of real-world workloads?
Yes, increasing the number of connections has already been discussed. But still, several thousand simultaneous connections is still OK for the best-performing frameworks...
> But when Loom arrives, every existing JDBC code path will magically, automatically become truly asynchronous, so Spring should land in the top 4 places.
I know less than zero about Java and Loom, but if I understand correctly, Loom enables first-class continuations and seamless M:N threading.
But I do not see how that will help JDBC (unless you have tens of thousands of DB sessions); I understand that the power of the new libpq is the ability to pipeline queries, and that won't magically appear without development effort.
I'm not an expert, but I believe this would not allow batching, only pipelining, and only to a limited extent, as spawning OS threads is "slow" and by default I doubt they spawn more threads than CPU count * 2?
I want to approach it with the same level of polite humility you've offered. Presumably you could use a thread pool proportional to the number of connections in your connection pool?
4 of the 6 tests in there involve database requests. So yes, I would say that TechEmpower is pretty important for gauging the performance and efficiency of web frameworks.
Personally, it helped me realize that my belief that Node.js was hella fast and that old Java-based frameworks are slow was mistaken. It also introduced me to Kotlin (I started on it to try Kooby), and overall I love that I can see at a glance which framework-DB combinations are a good bet for my chosen set of languages.
The problem is they account for _simple_ database requests.
Those are surely common, but long(er)-running database requests or delegation to another service aren't super rare either.
Similarly, some of the frameworks might have major reliability problems which can randomly "crop up" due to thread pinning and the like.
I believe it should have some form of "chaos" test where, instead of interacting with the DB, it interacts with another test server on both ends, and the front- and back-connected servers run through a long random "chaos" simulation (plus another version where a pre-recorded run is used for reproducibility).
The benchmark shows that the framework has a big impact on performance. While all the frameworks are using the same database server, performance varies from 1x to 100x.
While the benchmark focuses on more req/s, we can also see it the other way around: for the same req/s, you can use smaller hardware, i.e. save money and energy.
In a typical web app you probably wouldn't notice any difference, but in the niche where you're handling millions of reqs/s, even slight improvements could translate into a non-trivial amount of money saved. On the other hand, if you've already scaled to such a degree, those savings wouldn't matter that much. So what's left is maybe educational or research purposes: it's good to have an idea where the upper bound is while still maintaining a decent API, so there's value there; it's a great feat regardless.
At millions of reqs per second, the thin contract of HTTP request/response internals will never be the layer with any fruit left to pick for gains.
You’re making a statement here that is only true if you assume the developers of such a service have optimized their framework. If instead they are using one of the slower frameworks, they will pay proportionally — you can’t assume that just because they take Mrps that the service will be super complex past http. Often the exact opposite is true and they get to that scale by keeping the service very, very simple. Http processing becomes -more- of their latency budget in such cases.
Yes, and the top frameworks are already bandwidth limited on the simplest benchmarks. The big gains are now made in DB access, which is a much more complicated topic than "ORM is bad for performance".
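As a back-of-the-envelope check of the bandwidth limit (assuming the 10 Gbit/s link the test environment uses and, as my own rough guess, ~150 bytes per plaintext response including headers):

    10 Gbit/s ~= 1.25 GB/s
    1.25e9 bytes/s / ~150 bytes per response ~= 8 million responses/s

which is roughly the ballpark where the top plaintext results sit, so the wire, not the framework, is the ceiling there.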
It's also a question of what's going on when you dive in. Out of curiosity, I've looked into several parts of the benchmark source code before, and it's questionable, especially where the database is concerned. You'll see things that seem to go against their own rules, like presorting and batching inserts for efficiency, while their rules seem to indicate that's not allowed.
It has led me to treat these benchmarks with a grain of salt.
There are very few circumstances that I can imagine that would send me running for C++ for a web project. I'm much more concerned about balancing efficiency of the developer environment with performance, security and maintainability, than pure performance. At the point that I'm not, my priorities are out of whack (aside from very niche situations). The side effect of these benchmarks, while we all love benchmarks, is the idea that any of these faster frameworks are actually a better choice because of a benchmark result.
My motivation in writing yet another web framework was to get pure performance while providing the maintainability of more dynamic languages. Check the implementation of Lithium; it is one of the most concise TFB implementations: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
Django is already vastly faster than human perception. An entire roundtrip to my Django app takes 70 - 110ms. The entire roundtrip is already faster than human perception even in the worst case, and of that Django itself probably contributes something like 2 or 3ms.
110 ms is not faster than human perception. It's over a tenth of a second, and is very noticeable. The flicker fusion threshold for human vision lies in the 60Hz-90Hz range, corresponding to between 11 and 17ms. If I recall correctly, EEG experiments show differences up to around 100-120Hz, so on the order of 8ms. In any case, network latency itself adds on the order of 10ms per 1,000km of distance; 35ms or so for a London to New York round-trip, so the point about framework speed being a bit of a moot point other than in exceptionally high traffic scenarios still stands.
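For reference, the ~10 ms per 1,000 km figure follows from the signal speed in fibre (roughly two thirds of the speed of light in vacuum):

    speed in fibre ~= (2/3) * 300,000 km/s = 200,000 km/s
    1,000 km one way ~= 1,000 / 200,000 s = 5 ms
    round trip ~= 10 ms, before routing and queuing overhead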
OK yes, in a literal sense you can see things that happen in less than 100ms. What I mean is that you can't tell whether the nominal amount of flicker on page load is from the time it takes to paint the DOM or from the network request. Unless you were asking a professional front-end developer, I don't think most people would even have any intuition about what was happening.
>and of that Django itself probably contributes something like 2 or 3ms.
Unless it is super tuned and running on a specialised VM, I seriously doubt that. It is already pretty damn good if PHP, Python Django, or Ruby on Rails can do a 10ms request, excluding the database, on a moderately large app. Generally they are 20ms+.
And recently Shopify even set their performance metrics to allow 300ms per request.
At this point it's mostly about (cost) efficiency, like how many EC2 instances you will need to cover your workload with a service implemented in a particular framework. It also assumes that developers can pick any of them with the same amount of effort, which is probably a proper assumption for a benchmark that measures performance.
How much cost efficiency is there? I’ve worked at multiple of the top 100 websites in the U.S., and none of them had more than a dozen web servers. For any website, 90% of traffic is logged out and probably doesn’t even hit the web servers.
And web servers are generally the least expensive component of hosting costs.
These costs are more meaningful for those who bootstrap their businesses and projects at their own expense. I have a personal anecdote where switching from a scripting web framework to a compiled one reduced the runtime size: the deployable environment went from 800 MB to 10 MB, which allowed me to fit all my deployments into the free-tier monthly quota of AWS ECR.
Given I already knew both frameworks from full-time jobs, it's a great net benefit.
It also means that whoever wants to self-host open-source alternatives to user-tracking services (notes, calendars, email, etc.) can do it more cheaply or even for free.
The epoll code in use is very small, a few hundred LOC (but it is at the very center of the framework since it runs the event loop). Actually, writing it from scratch was better for understanding the mechanism, optimizing it, and adapting it for non-blocking SQL.
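For readers who haven't written one, here is roughly what a minimal epoll accept/read loop looks like. This is a toy sketch, not Lithium's actual code: error handling, EPOLLOUT, partial writes and real HTTP parsing are all omitted:

    // Single-threaded epoll event loop: one non-blocking listening socket,
    // every client socket registered for reads, nothing ever blocks the loop.
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main() {
      int listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
      sockaddr_in addr{};
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      addr.sin_port = htons(8080);
      bind(listen_fd, (sockaddr*)&addr, sizeof(addr));
      listen(listen_fd, SOMAXCONN);

      int epfd = epoll_create1(0);
      epoll_event ev{};
      ev.events = EPOLLIN;
      ev.data.fd = listen_fd;
      epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

      epoll_event events[1024];
      char buf[4096];
      while (true) {
        int n = epoll_wait(epfd, events, 1024, -1);   // sleep until some socket is ready
        for (int i = 0; i < n; i++) {
          int fd = events[i].data.fd;
          if (fd == listen_fd) {
            // Accept every pending connection and register it for reads.
            int client;
            while ((client = accept4(listen_fd, nullptr, nullptr, SOCK_NONBLOCK)) >= 0) {
              epoll_event cev{};
              cev.events = EPOLLIN;
              cev.data.fd = client;
              epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            }
          } else {
            // Read what is available; a real framework parses HTTP here and
            // issues its (pipelined) SQL without ever blocking this loop.
            ssize_t r = read(fd, buf, sizeof(buf));
            if (r <= 0) { close(fd); continue; }
            const char resp[] = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
            write(fd, resp, sizeof(resp) - 1);
          }
        }
      }
    }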
To see how close C++ can get to the dynamic languages typically used for web programming while making absolutely no compromise on performance.
How long have you been working on it?
8 months on Lithium, but I had already written another framework (Silicon, which I rewrote from scratch because of slow compilation times).
What do you do professionally?
I worked for several companies, doing different things including real-time image processing, neural nets for photography, web programming, and robotics.
What are your plans for it?
No big plans yet, just having fun for now, but I'd like to keep implementing features if users ask for them, and improve the docs. In the longer term, building WebSocket support would be nice too, along with other services on top of it.
Given that the whole framework is roughly 31 kloc of C++ written mostly by you, and this is not your first framework, can you talk more about your design choices, what the architectural patterns are, and how metaprogramming plays a role [1]?
Besides your experience from writing Silicon [2], what other frameworks do you like the design of? What C++ codebases do you like or find interesting from a design perspective?
I think I need to write a blog post to do a retrospective on what I learned writing iod, silicon, lithium (and vpp, a small image processing lib). There is so much to tell.
To keep it short, here is my philosophy:
- Don't think too much when you write it the first time; just do it quickly. Make all the mistakes and write the ugly code first, then rethink/rewrite things when it can actually simplify the codebase. Big refactors are much more productive and successful than big premature design thinking.
- Only expose the complexity that the user needs. Simple things should stay simple even if more complex things are possible. Look for simplicity everywhere.
- Profile before optimizing; the bottleneck is never where you think it is.
For your question about metaprogramming: yes, it is used. The core of the framework is built around a compile-time key/value map that unrolls JSON and GET/POST parameter de/serialization at compile time [1], so no dynamic dictionary involving costly mallocs is needed.
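Not Lithium's actual metamap, but a toy sketch of the flavor of the idea (every name below is made up for illustration): the field list is a template parameter pack, so the encoder body is unrolled at compile time and needs no runtime dictionary or per-field allocation. As I understand it, Lithium's real metamap goes further and makes the keys themselves compile-time symbol types:

    // Toy compile-time-unrolled JSON encoder: the field pack is expanded by the
    // compiler (no std::map, no hashing, no allocation for the key set).
    #include <string>
    #include <type_traits>
    #include <utility>

    template <typename V>
    void append_field(std::string& out, const char* name, const V& value) {
      out += '"'; out += name; out += "\":";
      if constexpr (std::is_arithmetic_v<V>) out += std::to_string(value);
      else { out += '"'; out += value; out += '"'; }   // no escaping: toy code
    }

    // One (name, value) pair per field; the fold expression is unrolled at compile time.
    template <typename... Vs>
    std::string json_object(std::pair<const char*, Vs>... kv) {
      std::string out = "{";
      const char* sep = "";
      ((out += sep, append_field(out, kv.first, kv.second), sep = ","), ...);
      out += '}';
      return out;
    }

    // json_object(std::pair{"id", 42}, std::pair{"message", std::string("Hello")})
    //   -> {"id":42,"message":"Hello"}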
For Rust vs C++:
- I never used Rust, but I like the language and the safety it provides.
- A mainstream package/dependency manager is missing in C++, so Rust is clearly better on this point. This is a big problem for the C++ community, where code is shared much less than in the JavaScript community, for example.
- The other big problem I have with C++ is compile times; I heard Rust is slow too, but I never compared the two.
Sadly, Seastar has no implementation of this benchmark. I had a quick look at the website and it seems Seastar is much more low-level (it does not provide an HTTP parser) and more generic than Lithium, which is focused on simplifying the writing of HTTP APIs.
Yes, "zero cost" is subjective; it has a cost at least at compile time. But you can see metamap as something like loop unrolling for objects (if you think loop unrolling is zero cost, then metamap is too).
Re: obscurity: This has been true for a long time. I remember seeing these a long time ago and it wasn't until about 80% down the list that a "real" framework showed up.
"Lies, Damn Lies, and Benchmarks" is somewhat true in this like any benchmark, but as others have pointed out, important techniques gain exposure using this benchmark over the years that trickle into mainstream use.
It would be nice if a summary of what techniques were used by new "frameworks" to gain leadership. All too often discussions of this benchmark turn into pissing matches by language zealots.
Actix's turn at the top was very interesting for the, uh, discussion, it prompted in the rust community about unsafe, although it was stressful for the original author.
In javaland, Vert.X has been high on the list for quite a while, but as I understand it, it uses "arena" allocation of buffers to avoid GC and other tricks to get the JVM up to par with non-GC languages, but isn't really typical of JVM development.
I would also like to know if HTTP protocol "hacks" are used, or networking tricks like userland network stacks.
But alas, it's just a huge wall of github projects and not a lot of use to a run of the mill developer like me.
We plan to collect additional metadata from maintainers, including home page URL, source URL, description, and so on. Once that happens, we'll be able to provide the kinds of helpful links you want.
The frameworks are downloaded via Dockerfiles provided by the framework maintainers, and since there are hundreds of frameworks, it is a bit tedious to manually get all the URLs...
But anyway, I'm pretty sure all framework maintainers will rush to provide all the info TechEmpower needs to add a link to their website...
From a cursory inspection, the asp.net implementation seemingly runs on Mono [0] in order to run on Linux at all. That could definitely be a performance "gotcha".
I'm more interested in the fact that something called asp.net core is one of the five fastest, while asp.net not 'core' is the single slowest. What's the deal there? I dimly know that asp.net is a Microsoft thing, and remember it being used by a bunch of enterprise sites a decade or so ago...
ASP.NET is the Windows-only, 18-year-old web framework based on .NET Framework: https://en.wikipedia.org/wiki/ASP.NET
It could run on Mono/Linux, but did so very slowly and with bugs, and that is possibly what the slowest benchmark was run on.
ASP.NET Core is the newer (complete rewrite, including the runtime), leaner, cross-platform (Windows, Linux, macOS) web framework built on top of the performance-focused .NET Core runtime released in 2016: https://dotnet.microsoft.com/learn/aspnet/what-is-aspnet-cor...
They're only the same in name, which is unfortunate, because it's completely new. It's the future runtime/platform for .NET and will be renamed to .NET 5, whilst the older .NET Framework will stay at v4.x (currently at v4.8).
> They're only the same in name, which is unfortunate, because it's completely new.
I certainly wouldn't say that. Full Framework .Net and .Net Core are _far_ more similar than they are different - they share a very large common base class library API surface in Net Standard.
A huge percentage of .Net MVC 3-5 (full framework) applications could be migrated to .Net Core MVC in a matter of hours to days.
You don't even have to specifically target .Net Core in a lot of cases - v2 included a compatibility shim so that libraries targeting full framework just work, as long as they don't call any of the APIs that were in full framework but are not in the Core runtime.
The main differences are in startup configuration of your web app and how you go about implementing cross-cutting concerns that might inspect or intercept every request - but even for a lot of common things like authentication there are similar extension points to what existed previously.
There are some particular technologies - like WCF Server or Linq2Sql - that are hard blockers if you were relying on them - but by and large moving from older full framework .Net to .Net Core is not that difficult and certainly doesn't require learning much new, especially fundamentals - it's mostly just some details of the web app framework.
ASP.NET Core was a rewrite. None of the previous .NET Framework or HttpListener HTTP classes or abstractions are available in .NET Core; they were rewritten from scratch around newer HTTP abstractions.
> The main differences are in startup configuration
It's definitely not the main difference; it may be what's immediately visible to a dev using it, but the entire web framework your app depends on is new. It's not a fork of an existing code base; it's a completely different one, with the goal of retaining the same MVC & Web API development model to preserve knowledge reuse and ease porting efforts.
The compatibility layers are just that: retrofitted to preserve compatibility and ease porting, working through implementation-free reference assemblies with the APIs they both share. Of course, none of ASP.NET Framework's System.Web is in .NET Standard, because it was rewritten.
They rewrote most of their stack when they switched from .Net Framework to cross-platform .Net Core. There's a lot of wailing and gnashing of teeth in enterprises that invested heavily into deprecated technologies, but the new Core platform is very good for modern webdev.
ASP.NET runs on IIS or HTTP.SYS and is hardly optimized. ASP.NET Core has its own HTTP server (Kestrel) and has had numerous optimizations, including to the .NET Core framework and C# (for instance Span<T>).
The naming is just confusing. .NET Core is a cross-platform, open-source rewrite of the .NET runtime and is very performant; the old ASP.NET framework runs on the old Windows .NET runtime that you are aware of.
I really do like these TechEmpower Fortunes benchmarks. It has quite an extensive list of frameworks, languages, and configurations with throughput, latency (w/ SD), and errors.
I always check any lesser known framework to see which ballpark it falls into. I'm always surprised to see that so many of the popular frameworks are ~10x worse than the best--although many of those 'best' don't do as much processing. I'm much more likely to pay attention to the error counts, max latency, or SD (σ).
I wonder how realistic/fair the code of some benchmarks is.
I just took a quick peek at actix (because I happen to know it), and while at first everything looked fine, it wasn't quite that realistic.
Mainly:
- It uses a fork of tokio-postgres specific to the test (which differs in that Client is no longer Send, it has an unsafety/soundness hole from wrongly using UnsafeCell, and it has no issue tracking enabled; though replacing that cell with a RefCell would probably yield very similar performance)
- Instead of using the default web::Json responder, it uses simd-json (EDIT: explicitly encoding the data into a buffer instead of returning it wrapped in web::Json)
Just to be clear: besides the first point, all of this is not unrealistic in a context where you want to heavily optimize your server. But it's not how actix is used most of the time.
---
Edit: Also, just to be clear, I didn't intentionally nitpick actix; it just happens to be the framework I'm most familiar with, and using snmalloc for actix in particular seems like quite a reasonable idea.
EDIT2: The unsafety/unsoundness isn't triggered given the way the library is used here, but it's still there and would prevent this change from ever being merged upstream.
I know this is a popular line of thought, but as someone who really measures things, it doesn't match my experience when it comes to web server frameworks.
In web applications, apart from glaring mistakes such as N+1 queries, pretty much every bottleneck I had in production was due to slow serialisation, slow database abstraction, or excessive CPU/memory usage due to the language/framework. I also had issues caused by initialisation time. Saving 200ms or 300ms on each request can give you a better user experience.
For growing startups, having a fast web framework gives you the option to postpone complex horizontal scaling for months, or even years. This saves money in the short term and lets you keep growing the product without having to worry about other things. The database can scale vertically in the meantime.
As for large tech companies: at my previous workplace there were two migrations in the span of two years because of performance issues. Hundreds of developers collectively agreed on changing the tech twice because of framework bottlenecks.
Of course you shouldn't go straight into a C++ framework when Rails becomes too slow for you, but people should know the tradeoffs of the framework they're using.
Developer velocity matters way more than the framework as long as the framework allows for horizontal scaling. You don't have to use it, but it should be there and it should be easy to use. Depending on your goals, one should be writing a system that horizontally scales after the first major prototype. Then push the horizontal scaling tests into your CI/CD pipeline and dev locally as if your cluster scaled down to one node.
Nice of you to mention developer velocity: it is extremely important, and it's also severely impacted by things that depend on the framework having good performance, such as initialisation time, development-time performance (it's important to have a quick feedback loop), time required to run individual tests (important for TDD), time required to run the whole test suite (running it often saves developers time during refactoring), time required to compile assets, time for the CI/CD to run (important to allow quick and safe bug fixes and a fast process), etc.
Once your app stops being trivial, those things start adding up and people start talking about migrating to something else, breaking the app into microservices, or developing in-house tooling. All those things cost money. It might take a few months, but it happens. And I've seen it over and over.
Also, horizontal scaling is definitely not easy for non-trivial apps. It costs the company money, might require extra employees, and might put a hold on new features. And horizontal scaling doesn't help you with response time or developer velocity.
Also, please keep in mind that performance and developer velocity are not at odds. Most of today's "fast" frameworks are quite good when it comes to developer velocity: ASP.NET Core, for instance, is #6 on the TechEmpower ranking and is incredibly productive.
Is this one of those RomComs where we are screaming at each other in agreement?
Horizontal scaling of course isn't binary, and using a remote state store, preferably with CRDTs or strong consistency, will allow _most_ applications to remove a SPOF and keep single-node performance from dominating.
Everything you said is spot on, except for your dismissal that horizontal scaling should be first class. It might have been hard before, but we no longer have an excuse. Store your state in a modern distributed store.
> Developer velocity matters way more than the framework as long as the framework allows for horizontal scaling
I disagree with that. Both because developer velocity is not unrelated to framework performance, and because horizontal scaling is not a silver bullet that will magically solve the performance issues related to framework code.
You can take it into consideration when starting a new project. I wouldn't use a C++ framework for security reasons, nor Rust if I don't have access to enough Rust programmers, but ASP.NET Core seems pretty reasonable.
That depends on exactly what that sentence means. It is fairly unusual to need to parse JSON with unknown structure in any kind of speed-critical section.
My personal criticism is that C++ is usually (not always, though) the wrong language for implementing web-facing applications or APIs. There is a considerable advantage in terms of speed (when using the right framework) but there are so many things that can subtly go wrong in terms of safety and security that you need to be very, very careful when picking C++.
It's fairly unusual that today I don't know the structure of the JSON I'm parsing. It's not at all uncommon to want the server to do the right thing if I add keys to the object which it didn't previously care about, though.
Off-topic: matt42 (or others), where do I learn modern optimization like this? I can optimize for cycles on embedded MCUs and older computers, and I can optimize in .NET/JVM (and others), but are there any good sources for optimization on a modern OS on modern metal? Besides reading the Lithium source code, of course.
I would say the profiler output is the best source of information. Lithium is not using black magic at all. Actually, in a large framework like Lithium, performance problems come more from obvious errors hidden in the quantity of code than from code you didn't aggressively optimize.
Only the profiler will tell you if there is a hidden malloc slowing down everything else, or an inefficient data structure, or ...
There is probably a lot of non-optimized code in Lithium; it just has no impact (or if it has, I still have to find it).
That said, for other kinds of software (usually more CPU-bound), going down to assembly or using SIMD can be the only way to get good performance.
People probably flagged it because people hate benchmarks / don't trust them. On the other hand, I think they at least give us some indication about things.
This is the official continuous TechEmpower benchmark, run by TechEmpower in the exact same environment as the "official rounds". The only difference I'm aware of is that there is more manual review and checking for rounds than for continuous benchmarks. More info about it here:
We have already seen tremendous social adoption of the continuous benchmarking results. For selfish reasons, we want to continue creating and posting official rounds such as today's Round 16 periodically. (Mostly so that we can use the opportunity to write a blog entry and generate hype!) We ask that you humor us and treat official rounds as the super interesting and meaningful events that they are.
Jokes aside, the continuous results are intended for contributors to the project. The official rounds are less-frequent snapshots suitable for everyone else who may find the data interesting.
Has there ever been code created specifically for this benchmark, in assembly language with everything hard-coded, just to see what the upper bound is?
Some frameworks use raw SQL requests to skip the overhead of the ORM, but they are still slower (Lithium's ORM has no runtime cost anyway). Coding everything in ASM would be insane, and it would be incredibly hard to ensure it is the optimal version.
Those who work on these frameworks, and those interested in http framework benchmarking.
Some of the ideas developed in these frameworks percolate down to the PLs that use them and to the libraries they use, which benefits everyone. These frameworks also test new language features.
It’s on the front page because enough of us are interested. If you aren’t interested, just read one of the other articles on the front page, and leave us to discuss this
The benchmark is clearly not perfect. But it is great for the community: it pushes all the framework maintainers to optimize their code, so in a way all users of all frameworks are indirectly positively impacted.
[1] https://tfb-status.techempower.com/results/57b25c85-082a-401...