Web Framework Benchmarks Round 9 (techempower.com)
108 points by osener on May 1, 2014 | 68 comments



Disclaimer: I'm the author of Nawak, so I'm pretty excited about this.

The Nimrod programming language is finally featured for the db tests (at least on i7; not sure what happened with EC2 and Peak, as neither Jester nor Nawak seems to appear in those results).

It fares pretty well when the database is involved.

Look for the nawak micro-framework, which is in the top 10 both for fortunes:

http://www.techempower.com/benchmarks/#section=data-r9&hw=i7...

and updates:

http://www.techempower.com/benchmarks/#section=data-r9&hw=i7...

And there is room to grow. That micro-framework will not be the best for the json or plaintext tests, but once the database needs to be involved, it is trivial to add more concurrency: firing up more workers (1-3 MB each in RAM) acts as an effective database connection pool (1 database connection per worker).

edit: Why should you care? Nawak (https://github.com/idlewan/nawak) is a clean micro-framework:

  import nawak_mongrel, strutils

  get "/":
      return response("Hello World!")

  get "/user/@username/?":
      return response("Hello $1!" % url_params.username)
  
  run()
Benchmark implementation in 100 lines here: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...


(Author of Jester here)

It's nice to see Nimrod making it high up on the list of the fortunes benchmark. I still have not had the time to properly optimise my framework, or to even write code for the other tests, but I am currently working on a new async module which will be a lot more competitive (an experimental version of it already made it into the latest Nimrod release, see http://nimrod-lang.org/news.html). In any case it's nice to see that others are creating their own Nimrod frameworks to compete with mine (thanks idlewan :) ).


Nimrod looks interesting. How does it compare to Go? As far as I can see, it works without a VM, like Go, and has async programming features.


We're very happy to have Round 9 completed. The 10-gigabit 40-HT core hardware has been quite amusing to work with. In most respects, these servers utterly humble our in-house workstations that we've been using for previous rounds.

If anyone has any questions about this round or thoughts for future rounds, please let me know!


I'm curious, would you be interested in microbenchmarks for things like concurrency models? I'm thinking about preforking vs. thread-per-request vs. events for C++, NIO vs InputStreams vs. Fibers for Java, etc. The idea would be to measure how much overhead is coming from the OS vs. the language vs. the framework so that framework authors could make informed choices for their framework architecture. Or would that be out of scope?


We are comfortable accepting and including what we call permutations.

Clearly, our project is about testing web application frameworks and platforms (we just say "frameworks" for conservation of typing). Testing NIO vs. InputStreams, for example, in isolation is a different project.

But as part of a web framework—let's say you had a framework that provided support for both NIO and InputStreams—we'd be happy to include the multiple permutations.

Obviously, this is within reason. If you submitted 512 permutations, I might have a small heart attack. Also, a general principle that we ask all participants to bear in mind is that each test should represent a viable production-grade configuration. This guideline might rule out a great number of wilder permutations.


These web framework tests have been really interesting to look at, and each time I've been saddened to see that Rails/Ruby, the framework/language I program with most days, is consistently near the bottom. With Adequate Record now being merged into master, I'm hoping we start climbing the speed test records.

But a thought that keeps coming up in my mind is that there are other metrics that would be much harder to compare, but might be more useful in my book.

For example, I'd love to see a "framework olympics" where different developers build an agreed upon application/website on an agreed upon server using their favorite framework. The application has to be of some decent complexity, and using tools that an average developer using the framework might use.

In the end, you could compare the complexity of the code, the average page response time, maintainability/flexibility, and the time it took to actually develop the app, and the results could let developers know what they sacrifice or gain by using one framework over another. I know a lot of these metrics could reflect the developer themselves vs. the actual framework, but it might also be a tool to let you know what an average developer, given a weekend, might be able to produce. It would also help me to see an application written a ton of different ways -- so I can make good decisions about what framework to choose based on my needs.

In the end, speed only tells us so much -- and speed is not the only metric that we consider when we write applications -- otherwise it looks like most developers would be coding their web apps in Gemini.


Readers should definitely evaluate the complexity of the code necessary to implement the tests. To that end, I plan to eventually enhance the results web site to allow you to readily view the related source for each test implementation (perhaps using an iframe pointing at GitHub).

For the very first round, we included the relevant code snippets in the blog entry, and I think that added a lot to the context. With nearly 100 frameworks, the volume of code has become too large to simply embed all of that code directly into the blog entry, but we've put too much burden on the reader to sift through the GitHub repo to compare code complexity.

Things we aim to do:

* Pop-up iframe with relevant code from GitHub.

* Use a source lines of code (sloc) counter to render sloc alongside each result in the charts.

* Render the number of GitHub commits each test implementation has seen at our repository. At the very least, this would show whether a test has seen a lot of review.

* Introduce more complex test types [1].

And as the other reply has mentioned, we have also discussed the possibility of a larger test type that might include a multi-step process. I'd love to eventually get to that point.

[1] https://github.com/TechEmpower/FrameworkBenchmarks/issues/13...


A matrix rank calculator might be helpful as well to filter results for users, though I am impressed with the present and new(ish) filtering available.


I actually have been working on a super ridiculous ruby benchmark for every ruby framework and server I could find. It should be out in the next month. It's been quite the undertaking.


Rails' slowness just means one thing: you'll have to spin up more server instances for the same task that would have been faster in Java and would have needed fewer servers (though Java eats a LOT of memory, so raw speed is not everything). So it's a trade-off. Spend more money on devs or on the infrastructure?

But don't worry, it's still way faster than most PHP frameworks, which have ridiculous performance, yet their core developers don't seem to care.

I think it really depends on the kind of app one is building. A video site like YouTube can use caching to the max; most of the hits won't touch any server-side code, only cached pages. On the other hand, a webapp that actually does something and needs realtime capabilities might not be the right use case for Rails (like Twitter, though it helped them build their MVP quite fast; same for Iron.io).

It's a trade-off: do you want to develop fast, at the cost of raw performance, or get good performance from the beginning, without scalability issues in the first place?


This comes up every round. It's also a good idea.

Make it happen. :)


I might just do that! At the very least, I'd be willing to represent the "average" Rails developer.

Getting an agreed upon application might be tough, but I'll try setting some stuff up to make it happen, as long as you agree you'll be a part of it :)


Sure, I'll contribute some Perl. :)


Interesting to see the Go benchmarks fall right out of the top 10 on the 10GbE machines while Java/JVM and even JavaScript do amazingly well. My guess is that at 10GbE you are now testing how much time the framework spends on the CPU.


My guess is that Go, especially the database/sql package, uses synchronization that is too expensive.

Most of the pool management code uses mutual exclusion which can get really expensive when 40 threads compete for it.
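
For what it's worth, here is a minimal Go sketch of the kind of pool tuning involved, assuming a Postgres driver like github.com/lib/pq; the driver, connection string, and limits are placeholders, not what the benchmark actually uses:

  package main

  import (
      "database/sql"
      "log"

      _ "github.com/lib/pq" // placeholder Postgres driver; not necessarily what the benchmark uses
  )

  func main() {
      // Placeholder connection string, for illustration only.
      db, err := sql.Open("postgres", "user=web dbname=hello sslmode=disable")
      if err != nil {
          log.Fatal(err)
      }
      // database/sql guards its connection pool with a single mutex, so every
      // checkout/checkin from 40 hardware threads contends on that one lock.
      // Raising the limits avoids the extra cost of constantly reopening
      // connections; it does not remove the lock contention itself.
      db.SetMaxIdleConns(100)
      db.SetMaxOpenConns(100) // SetMaxOpenConns is available since Go 1.2
  }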


It would be nice if the golang community investigates this further. For a language with high concurrency as a feature it's disappointing that it drops off so much.


> My guess is that at 10GbE you are now testing how much time the framework spends on the CPU.

Not really; it's probably more a benchmark of the runtime and its networking code. It would also be interesting to see a trace / summary of the syscalls for benchmarking runs.


It probably has something to do with NUMA. Java 7 has a NUMA-aware allocator that makes it more likely for threads to access memory from their local node. I'd be interested to see the Go test running with multiple processes, each one running on an individual NUMA node.


There's something funny: for the multiple queries benchmark [1], Go is performing worse than CodeIgniter and Kohana, and cpoll_cppsp-postgres is performing worse than codeigniter-raw.

Anybody know what's going on? Is a full PHP framework performing better than raw C++ under certain conditions?

[1] http://www.techempower.com/benchmarks/#section=data-r9&hw=pe...


PHP, for these tests, is nginx plus 256 to 512 PHP processes under PHP-FPM, not a single process. If you put 40 Go processes behind nginx, each one allowed to use only 2 to 4 processors, you will get notably different results (a rough sketch of that setup is below).

Anyway, the tests show how hard it is to scale to many processors with a single process. They show that the Java VM is really good at it (to be expected; Sun always had systems with insane numbers of cores).
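
To make the "40 Go processes, each limited to a few cores" idea concrete, a minimal sketch using runtime.GOMAXPROCS; the handler, port, and the cap of 4 are made up for illustration:

  package main

  import (
      "fmt"
      "log"
      "net/http"
      "runtime"
  )

  func main() {
      // Cap this process at 4 OS threads executing Go code; nginx (or any
      // other load balancer) would then fan requests out across many such
      // processes, roughly like PHP-FPM does with its worker pool.
      runtime.GOMAXPROCS(4)

      http.HandleFunc("/plaintext", func(w http.ResponseWriter, r *http.Request) {
          fmt.Fprint(w, "Hello, World!")
      })
      log.Fatal(http.ListenAndServe(":8080", nil)) // port picked arbitrarily for the sketch
  }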


Sure: Raw C++ isn't necessarily OPTIMAL C++, and PHP is really, really optimized for the read-multiple-queries-render-to-a-page case (at least for MySQL it is).


Anyone know why this is? This was quite disappointing...


Couple notes:

They have a blog post about the results here: http://www.techempower.com/blog/2014/05/01/framework-benchma...

If you're running on EC2 and not dedicated hardware (probably most people reading this), be sure to toggle to the EC2 results at the top right of the benchmark.


It is great to see my beloved Ninja framework (full-stack, Java) in standalone mode being one of the best performers in the multiple queries benchmark (better than 81% of the 93 known frameworks) and data updates (better than 77% of the 93 known frameworks).

These are the most realistic scenarios for a web app, in my opinion.


And Ruby on Rails is...... at the bottom of the chart again.

It is quite disheartening to see it being 20-50x slower than the best, or 2-5x slower than other similar frameworks.


How is this data useful to someone building a web application? I have used several of these frameworks, at least the JVM-based ones, and I can tell you that it is like comparing apples to oranges. Case in point: Wicket, which I have been using for several years, is a component-oriented framework with a rich set of ready-to-use pre-built components. If on the other hand you are using Netty, you are left to reinvent pretty much everything. Based on your configuration, it may be that Wicket is returning the response from cache. Compojure and Wicket serve different business use cases.


Like any set of benchmarks, using the raw number without taking into account externalities such as those you listed would be a poor indicator of the best framework for your needs.

That said, if you need serving power, the fact that some solutions can literally serve 100x more requests than others, and for likely less than a 100x slowdown in development time and effort, may matter.


It really depends on what you're optimising for. If you're aiming for being able to build features quickly / elegantly using higher level abstractions, these benchmarks may be less interesting.

If on the other hand, you're building an application where performance is key, it's useful to know how much overhead your framework is adding. We have a real-time bidding application which falls into this category, so these benchmarks are quite interesting to me.


I was hoping that Snap & Yesod would be run on GHC 7.8. It'll be nice to see what sort of improvements MIO will make, especially on the 40 core machine.


That's an interesting point. It would add quite a few extra entries to the codebase, but it would be interesting to see all languages run with both the latest and the prior major language version. That quickly gets out of control though, especially when you consider some languages implemented on others (languages that sit on top of the JVM, for example).


Ah, I was wondering why they weren't higher up. I hope MIO makes it into the next round.


Still seems strange to me that they're that low.


Yeah, I am currently learning Haskell and was surprised by the apparent low performance of Yesod and Snap in comparison to the faster frameworks.

I am wondering if I should spend more time on Clojure instead, as it seems to be significantly higher in the ratings.


The next version of GHC should improve those numbers significantly, but I would also note that there are much more significant reasons to consider using Haskell than the difference between 37k and 43k req/s.


Yesod was only slightly slower than Clojure with Compojure. Also, an older version of Haskell was used.

I was very surprised that Compojure did not score higher in the rankings. I have several deployed apps with Compojure and it seems very performant.


I am surprised that NodeJS on MySQL is much faster than on MongoDB. Is this expected?

http://www.techempower.com/benchmarks/#section=data-r9&hw=pe...


There are many frameworks showing as "did not complete". I was interested to see the results for Spray since it did really well in previous rounds, but there are no results for Spray in the latest round.


"Did not complete" indicates that a framework's test implementation does not pass validation. Spray is somewhat exceptional because the Spray community asked for it to be pulled until they have the time necessary to re-implement its test case. See the PR below:

https://github.com/TechEmpower/FrameworkBenchmarks/pull/842


Could you make the error log accessible? Some of the frameworks appear to just bleed errors left and right. It would be interesting to see if they are real errors or just misconfiguration.


We still need to post the final logs, but you can find logs from the preview of Round 9 at the following GitHub repository:

https://github.com/TechEmpower/TFB-Round-9/tree/master/peak/...


I can't edit the above message. The final logs are posted now:

https://github.com/TechEmpower/TFB-Round-9/tree/master/peak/...


thank you!


Does anyone know why Ur/Web has far better performance on EC2 than other platforms?

Also can anyone share their experience using cpoll_cppsp?


I would love to see Varnish in here for some of the tests.

For a typical webpage with multiple queries, there appears to be around a 5-10x performance disadvantage between slow and fast languages. For things like serving a plaintext or JSON response, where the slow languages are much, much slower, Varnish is a good match.


I would be intrigued just in general to compare against nginx or something serving a static file of the same size. Obviously that's "cheating" but in some sense it's a better benchmark for me to judge against than "what happened to be the fastest framework this time".


We are focused on testing without reverse proxy caches such as Varnish and the caches provided by nginx and Apache HTTPD. This is because we want to measure the performance of dynamic requests—requests that for whatever reason are not cached.

That said, we have considered including a measure of, say, nginx delivering the same content statically as a high-water mark. That could be interesting to see how well a framework compares to what might be an ideal case.


I mean your second paragraph, and for that reason.


I may have overlooked it, but I cannot find ASP.NET MVC 5 there. Is that framework version being tested?


I think they were trying to run it under Mono, and it failed to complete most of the tests. Setting up IIS for a "real" test of ASP.NET MVC might be out of their intended scope.


We do have Windows tests (run exclusively on the Peak environment for Round 9), but most of the ASP.NET tests are not starting and passing validation correctly as-is. We'd like to get these fixed up along with all of the other tests that are not passing validation and running correctly. As you might expect, we have a never-ending game of whack-a-mole with so many frameworks in play, so we lean heavily on the community to assist. We'd love to get pull requests to help validate the existing .NET tests.


That's a shame. How hard can it be? Out of the box defaults should suffice and make for a realistic/reasonable test environment.


It's disappointing to see Laravel at the bottom, showing 47,135 errors and a 51% success rate.


Given that the Laravel tests completed fine in all previous benchmarks, I'm guessing that there is something else amiss here.


Also, the tested Laravel version is 3.2, while the current version is 4.1


Nice to see how great PHP is doing. It is still my favorite language.

One thing is strange: the HHVM result on the "plaintext" test. How can HHVM only do 938 requests/sec there if it can do 70,471 in the much more complicated "single query" test?


Yes, something is presumably wrong with the hhvm plaintext test. Some test implementations enter a failure state after the warm-up and never recover. That type of failure is distinct from the "did not complete" failures where the validation checks do not pass.


Still scratching my head at the C#/HttpListener results. Ostensibly it should be pretty close to what a native C++ implementation would look like performance-wise, as a good chunk of the work on the raw text results is done by http.sys.


Just because it's written in native C++ does not mean all native C++ code will perform the same. Just look at the difference between the top Java framework and the bottom Java framework: a huge difference, even though it is the same language. The same likely applies to the difference between http.sys and something like cpoll.


So normally, I agree that there is a difference between projects in the same language.

However, http.sys is kernel mode in Windows, and should have advantages that other programs don't have.


Interesting how the various Scala frameworks have been slipping down the charts from round to round -- an effect of changing methodology?


Spray was one of the leading contenders in previous rounds, but didn't complete in the latest round. I wonder why that is.


The Spray folks requested to have it excluded from this round: https://github.com/TechEmpower/FrameworkBenchmarks/pull/842

We're looking forward to having it back in Round 10, though!


Maybe because Spray was merged into the Play framework; also I think it plays a bit in the Akka HTTP layer as well, not too sure.


Spray has not been merged into Play; it's just a Typesafe project and will eventually be merged into Akka.


Hmmm. No Python-gevent? Still? Good showing of http-kit, though. Love that lib.


It's open source benchmarking. There are instructions on how to submit a language-framework of your choosing.

https://github.com/TechEmpower/FrameworkBenchmarks


Wow HHVM on top for Data Updates! Congrats to Facebook!



