For the base perf numbers, I'm not surprised. We've known that we're roughly on par with 1.9.2 for some time, and many of the benchmarks in question have started to reach a point of irreducible complexity (e.g. you can only slice and dice strings so fast). It's good to see JRuby remains at or near the front of the pack as far as performance, especially considering we've made no major performance-related changes in almost two years.
The Linux versus Windows numbers are a bit surprising to me. If I were to make a guess, I'd guess that the JVMs used were not identical (perhaps like Isaac Gouy mentions, one platform was 64-bit and the other wasn't) or some other detail altered the performance characteristics of the test. But the performance drop does seem to be in line with other implementations, so perhaps Windows really does suck and there's not much we can do about it.
On the memory issue, I have a few recommendations.
JRuby by default allows the JVM to use up to a 512MB heap (the default is usually 32-64MB, which is rarely enough for most nontrivial apps). The JVM likes to use as much memory as you're willing to give it, to keep GC times low (nearly free) and to give it lots of room to breathe. Almost all these benchmarks could run in far less memory (maybe 1/5 as much or lower) if the JVM were choked down to that level. So it's not surprising to me that the memory sizes for these very object and CPU-intensive benchmarks start to approach that 512MB limit; the JVM is just stretching its legs.
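If you want to see how much of that footprint is just headroom, you can choke the heap down yourself; the cap is an ordinary JVM flag passed through JRuby's -J prefix (the 64MB figure and 'my_script.rb' below are arbitrary examples, not recommendations):

```shell
# Cap the JVM heap at 64MB for a single run
jruby -J-Xmx64m my_script.rb

# ...or for every jruby invocation in this shell
export JRUBY_OPTS="-J-Xmx64m"
```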
Expect to see a lot more performance work coming in JRuby 1.6. I've blogged about it here: http://blog.headius.com/2010/05/kicking-jruby-performance-up...

Also expect to see more work on picking a "winner" as far as lightweight servers go. Something that works as seamlessly as Passenger could be the "last mile" we need to get folks to make a move.
And watch our two Ruby Summer of Code projects: Ruboto, bringing JRuby to Android; and C extension support.
We're working very hard to bring JRuby to everyone and everyone to JRuby. The reasons not to use JRuby are rapidly disappearing.
> The Linux versus Windows numbers are a bit surprising to me. If I were to make a guess, I'd guess that the JVMs used were not identical (perhaps like Isaac Gouy mentions, one platform was 64-bit and the other wasn't) or some other detail altered the performance characteristics of the test.
The identical version was used on both platforms: Java HotSpot(TM) 64-Bit Server VM 1.6.0_20.
> But the performance drop does seem to be in line with other implementations, so perhaps Windows really does suck and there's not much we can do about it.
I don't know that this theory totally holds water (Windows OS developer here, announcing obvious bias). Looking at the results, the one place Windows really gets nailed is the I/O test; I suspect there are some significant optimizations that could be done there, as I/O on Windows in general certainly isn't 4x slower than on Linux.
If there's something we or the JVM could do to improve these numbers, I would love to talk to you about it. At this point, if the JVM developers haven't found the magic sauce, we JRuby guys probably won't either...but I'd really love for JRuby performance on Windows to match JRuby performance on Linux.
I don't think that I can actively hack on the JVM, but I can tell you that XPerf is a great way to determine where you're spending your time. http://www.microsoftpdc.com/2009/CL16 is a really good intro video on the topic; it's a very powerful tool (though it doesn't provide as much analysis as, for example, Instruments).
Tooting my own metaphorical horn a bit, but it was relatively easy to put together a Jetty-Rack interface for high-performance webapps. It's called Mizuno, and lives at http://github.com/matadon/mizuno
Internally it uses Jetty's event-driven I/O, so performance is on a par with Thin and Passenger, at least on my MacBook.
I'm a bit pressed for time at the moment, so if anybody wants to add in the cometd servlet (it's in the repos) and write a tutorial, that'd be awesome; if not, I'll get to it in a few weeks.
JRuby would be the future if the startup times weren't so horrendous. It makes it very unpalatable for scripting. Nailgun is not robust enough. I have a feeling this is going to be more of a JVM problem than something Nutter and team are going to tackle. So is the future Rubinius and JRuby?
For scripting, that's definitely a problem, but it's not a big deal for long-running processes (e.g. a web application server). Ruby definitely benefits from having both: a fast Ruby with ~zero startup time, and a fast Ruby on the JVM.
3 years ago, Ruby implementation problems became the hot topic at Ruby conferences. Ruby was a beautiful language with an ugly implementation. It's great to see that the community has executed on this concern.
While I agree that it's not important for long-running applications, there is still a lot of scripting done in Ruby, and it's very frustrating to get a 5-to-10 second wait just to tell someone their arguments were invalid. Waiting that long for tests is also such a drag on iteration.
I think once JRuby has stabilized the native gem support and has some sort of Passenger-like deployment option (Glassfish and Warbler aren't quite there yet), we'll see some big players built on MRI start to move their Web applications to the platform.
A large part of startup time is out of our hands, but we continue to look for workarounds.
Most current startup slowness is due to the JVM itself running slow during the first few seconds of execution. Ruby scripts execute as they boot, which means we have to parse and run them. But until the JVM's been up for a little while, nothing in JRuby itself has even JITted to native code...and interpreted JVM bytecode runs even slower than Ruby on most of my tests.
The other complicating factor is that Ruby applications run so much code on boot. RubyGems, for example, degrades startup time linearly (O(n)) with the number of gems you have installed. Hacks like faster_rubygems (gem install faster_rubygems) help, but changes are needed in RubyGems proper.
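You can get a rough feel for the RubyGems tax on your own machine by timing cold boots; this snippet just shells out to time a bare interpreter versus one that loads RubyGems (the flags are standard MRI options, and absolute numbers depend entirely on how many gems you have installed):

```ruby
require 'benchmark'

# Time a bare interpreter boot versus one that loads RubyGems.
# Under JRuby you'd substitute the jruby executable for ruby.
bare = Benchmark.realtime { system("ruby --disable-gems -e ''") }
gems = Benchmark.realtime { system("ruby -e 'require \"rubygems\"'") }

puts "without RubyGems: #{(bare * 1000).round} ms"
puts "with RubyGems:    #{(gems * 1000).round} ms"
```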
The tradeoff is that you get a much more stable runtime, true multithreading, and the ability to play ball with everything in the Java ecosystem.
Sure, it takes longer to start up, but we're talking a few seconds here, which is nothing for a long-running webapp. Plus, the JVM is rock-solid, has very predictable memory usage, and also has tons of monitoring infrastructure to bring to bear against problems.
I'm much, much more concerned about the memory usage of JRuby than the startup times. For toss-off scripts I can keep using MRI.
I've read that JRuby memory usage scales better than MRI though, so you pay a higher cost upfront but can actually accommodate more workers on the same hardware. I haven't gotten far enough in my exploration of JRuby to see if this is the case in my own apps.
Hats off to all the devs working on bringing us a better Ruby runtime. I realize it's a lot of work.
In my tests, it's still worse than MRI. Memory-wise, the best among 1.8.7, 1.9.1, and JRuby 1.5 was still 1.8.7. Of course, when that headroom is the difference between keeping data in memory and loading it from disk, 1.8.7 ends up faster by orders of magnitude. The ability to use stable, high-performance JVM libraries trumps this concern in many cases, though.
It's also worth pointing out that while a JRuby/Rails instance might take 100-200MB, that's all you need to scale a site across pretty much any number of cores. MRI and REE both need to spin up multiple processes to handle concurrent requests, so very quickly the JRuby memory size becomes a tremendous win (think of 25-50 MRI instances using 20-50MB of memory each...you get the picture).
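The arithmetic is worth spelling out; the figures below are just taken from the ranges in this comment (not measurements), but even the conservative end makes the point:

```ruby
# Back-of-the-envelope memory comparison using the ranges above.
mri_workers    = 25     # low end of the 25-50 process estimate
mri_each_mb    = 35     # midpoint of the 20-50MB per-process range
jruby_total_mb = 200    # high end of the 100-200MB single-process range

mri_total_mb = mri_workers * mri_each_mb
puts "MRI:   #{mri_total_mb} MB across #{mri_workers} processes"
puts "JRuby: #{jruby_total_mb} MB in one process"
```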
Yeah that's what I'm hoping to see as I transition a couple of test apps over. The scala/lift folks see this as a big advantage for their JVM web stack too.
Yeah, I see that. My tests primarily revolved around single-threaded, event-driven code retaining large container objects (millions of objects). MRI was much more efficient. I understand that there's a lot more going on under the hood in JRuby just to make this possible, but for this use case, the result is what mattered. Although we ended up using Tokyo Tyrant (with a hack to pre-disk-cache all the data) anyway.
Object sizes in JRuby are certainly larger, especially if you run on a 64-bit JVM which necessarily has 64-bit reference fields for all object references (basically everything in Ruby). It doesn't surprise me at all to see a single-threaded case with a lot of objects use more memory, but it would be interesting to see how much of that was unused heap space and how much was actually live data. MRI's less-efficient conservative collector can live in a smaller memory space, but you sacrifice performance.
It's definitely a JVM problem, but contrary to Java, there is a sort of REPL available. Instead of writing a file, modifying it, and running 'jruby my-script.rb', just issue 'load "my-script.rb"' in your (j)irb. That makes for much faster iterations.
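The reason load works for this (and require wouldn't) is that require caches by filename while load re-executes the file every time; a minimal illustration, using a throwaway script written to a temp directory:

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  path = File.join(dir, "my-script.rb")
  File.write(path, "$runs = ($runs || 0) + 1")

  load path   # executes the file
  load path   # executes it again; require would be a no-op here
  puts $runs  # => 2
end
```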
I have both MRI and JRuby installed on my systems. For quickie scripts I use 'ruby' and for anything with a noticeable runtime (Or multi-core processing!) I use 'jruby'. I'm sure the JRuby fellows will continue working on startup time, but for me at least, it's not an issue.
I think having a Ruby toolbox instead of a single big hammer is a win for everybody.
We hear this a lot. Hopefully we can keep improving startup times, but when it's possible to use either Ruby or JRuby, it makes a pretty good combination.
It's worth noting that startup times with 1.9 are longer than with 1.8 (in my experience, anyway). Startup times for my 1.8 instances on Heroku are sub-second, whereas instances running 1.9 take 6+ seconds!
Disclaimer: I haven't dug into this issue much, so it's possible other factors are the cause...
InvokeDynamic and other Java 7 features (method handles, NIO2) are definitely going to improve JRuby's situation, but maybe not in the way you expect. Indy and method handles will largely allow us to delete (or not load) code we currently generate to get the same effect. Smaller runtime, possibly a smaller distribution. NIO2 will give us more direct access to streams, process handling, and so on, so we can delete hacks we've written to do all that ourselves.
But perhaps the most interesting aspect of any new major Java release is that they're usually 10-20% faster across the board, due to new and better optimizations. As the saying goes, if your Java app isn't fast enough, upgrade to a newer JVM.
I think invokedynamic combined with more runtime profile-driven optimizations in JRuby could easily double JRuby's Ruby-execution performance (or better), and in many cases reduce memory churn too. Lots of good things coming...it's nice to have an army of VM engineers on your side.
Why no IronRuby test on Windows? Understand, I'm no fan of M$, but it seems very odd to run the IronRuby test on Mono, then omit running IronRuby on the "native" .NET environment.
Means and (especially) medians mean little to nothing when you're doing them over entirely different benchmarks. Why bother? A 3D plot would be a lot clearer.
> Means and (especially) medians mean little to nothing
It's the other way around. Given the skewed nature of the data, the median is a better measure of the central tendency of the data set.
But note that I'm not plotting the median only. I use a box plot which gives you a much better statistical picture than using a simple median (or mean). Nevertheless, summarizing different benchmarks has its limitations and box plots give you a rough, general idea of what's going on.
I disagree. If you write ten benchmarks, and they finish in the same order on all platforms, the median really doesn't give you any information apart from how the middle one did. At least the mean will be changed if the slowest benchmark for some reason is a lot slower on one platform. In this case, there was some variation, but for the most part it looks like the median is just one of 4 or 5 middling benchmarks in all cases, so you're really just removing data, if anything.
The box plot is a good choice here. What I would like to see is a graph plotting time against {VM} x {benchmark size} to see how each platform scales within a benchmark.
With the data you've got, apart from confusing me a little by including the mean and median, you did a good job presenting it. I'd love to see why some platforms do so much better than others in some cases (except JRuby which everyone knows is a memory hog), but this seems way out of scope for this article. :)
Did you normalise the results before plotting them and calculating means and medians? Or is it a summary of the raw data? I could probably find out myself, but since you are posting here, I thought I'd ask.
I'd like to see that too. In JRuby, we may be able to get Ruby-to-Ruby calls to perform as well as Java calls, which would at least get that bottleneck out of the way. The remaining performance issues, however, are usually the rate at which objects can be allocated. In order to reduce that we may need a little JVM help (escape analysis that works well enough to actually eliminate allocations) and we may start to explore optional static typing, to allow really reducing numeric operations to raw primitive math.
At this point, we realize that sometimes you really do need native performance, and we're not taking any options off the table to get there.
Charles, is there any chance virtual machines and JIT compilers can get "smart enough" to introduce native typing without requiring a hard decision by a developer? I'd prefer not to have to worry about types ever, and let the interpreter/compiler/optimizer slap them onto objects as needed to really crank numeric throughput.
Maybe I could tune software with profiling tools after the fact. But it feels like as soon as you start locking specific objects down it's a slippery slope.
I'm not very familiar with the depth you've gone to, but can a dynamic object have numeric features or be capable of substituting numeric handling for a limited time (virtual numerics) and then revert back to a sloppy untyped object?