For the base perf numbers, I'm not surprised. We've known that we're roughly on par with 1.9.2 for some time, and many of the benchmarks in question have started to reach a point of irreducible complexity (e.g. you can only slice and dice strings so fast). It's good to see JRuby remains at or near the front of the pack as far as performance, especially considering we've made no major performance-related changes in almost two years.
The Linux versus Windows numbers are a bit surprising to me. If I were to make a guess, I'd guess that the JVMs used were not identical (perhaps like Isaac Gouy mentions, one platform was 64-bit and the other wasn't) or some other detail altered the performance characteristics of the test. But the performance drop does seem to be in line with other implementations, so perhaps Windows really does suck and there's not much we can do about it.
On the memory issue, I have a few recommendations.
JRuby by default allows the JVM to use up to a 512MB heap (the default is usually 32-64MB, which is rarely enough for most nontrivial apps). The JVM likes to use as much memory as you're willing to give it, to keep GC times low (nearly free) and to give it lots of room to breathe. Almost all these benchmarks could run in far less memory (maybe 1/5 as much or lower) if the JVM were choked down to that level. So it's not surprising to me that the memory sizes for these very object and CPU-intensive benchmarks start to approach that 512MB limit; the JVM is just stretching its legs.
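If you want to see how much of that footprint is just headroom, you can choke the heap down yourself; the cap is an ordinary JVM flag passed through JRuby's -J prefix (the 64MB figure and 'my_script.rb' below are arbitrary examples, not recommendations):

```shell
# Cap the JVM heap at 64MB for a single run
jruby -J-Xmx64m my_script.rb

# ...or for every jruby invocation in this shell
export JRUBY_OPTS="-J-Xmx64m"
```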
Expect to see a lot more performance work coming in JRuby 1.6. I've blogged about it here: http://blog.headius.com/2010/05/kicking-jruby-performance-up...

Also expect to see more work on picking a "winner" as far as lightweight servers go. Something that works as seamlessly as Passenger could be the "last mile" we need to get folks to make a move.
And watch our two Ruby Summer of Code projects: Ruboto, bringing JRuby to Android; and C extension support.
We're working very hard to bring JRuby to everyone and everyone to JRuby. The reasons not to use JRuby are rapidly disappearing.
> The Linux versus Windows numbers are a bit surprising to me. If I were to make a guess, I'd guess that the JVMs used were not identical (perhaps like Isaac Gouy mentions, one platform was 64-bit and the other wasn't) or some other detail altered the performance characteristics of the test.
The identical version was used on both platforms: Java HotSpot(TM) 64-Bit Server VM 1.6.0_20.
> But the performance drop does seem to be in line with other implementations, so perhaps Windows really does suck and there's not much we can do about it.
I don't know that this theory totally holds water (Windows OS developer here, announcing obvious bias). Looking at the results, the one place Windows really gets nailed is the I/O test; I suspect there are some significant optimizations that could be done there, as I/O on Windows in general certainly isn't 4x slower than on Linux.
If there's something we or the JVM could do to improve these numbers, I would love to talk to you about it. At this point, if the JVM developers haven't found the magic sauce, we JRuby guys probably won't either...but I'd really love for JRuby performance on Windows to match JRuby performance on Linux.
I don't think that I can actively hack on the JVM, but I can tell you that XPerf is a great way to determine where you're spending your time. http://www.microsoftpdc.com/2009/CL16 is a really good intro video on the topic; it's a very powerful tool (though it doesn't provide as much analysis as, for example, Instruments).
Tooting my own metaphorical horn a bit, but it was relatively easy to put together a Jetty-Rack interface for high-performance webapps. It's called Mizuno, and lives at http://github.com/matadon/mizuno
Internally it uses Jetty's event-driven I/O, so performance is on a par with Thin and Passenger, at least on my MacBook.
I'm a bit pressed for time at the moment, so if anybody wants to add in the cometd servlet (it's in the repos) and write a tutorial, that'd be awesome; if not, I'll get to it in a few weeks.
JRuby would be the future if the startup times weren't so horrendous. It makes it very unpalatable for scripting. Nailgun is not robust enough. I have a feeling this is going to be more of a JVM problem than something Nutter and team are going to tackle. So is the future Rubinius and JRuby?
For scripting, that's definitely a problem, but it's not a big deal for long-running processes (e.g. a web application server). Ruby definitely benefits from having both: a fast Ruby with ~zero startup time, and a fast Ruby on the JVM.
3 years ago, Ruby implementation problems became the hot topic at Ruby conferences. Ruby was a beautiful language with an ugly implementation. It's great to see that the community has executed on this concern.
While I agree that it's not important for long-running applications, there is still a lot of scripting done in Ruby, and it's very frustrating to get a 5-to-10 second wait just to tell someone their arguments were invalid. Waiting that long for tests is also such a drag on iteration.
I think once JRuby has stabilized the native gem support and has some sort of Passenger-like deployment option (Glassfish and Warbler aren't quite there yet), we'll see some big players built on MRI start to move their Web applications to the platform.
A large part of startup time is out of our hands, but we continue to look for workarounds.
Most current startup slowness is due to the JVM itself running slow during the first few seconds of execution. Ruby scripts execute as they boot, which means we have to parse and run them. But until the JVM's been up for a little while, nothing in JRuby itself has even JITted to native code...and interpreted JVM bytecode runs even slower than Ruby on most of my tests.
The other complicating factor is that Ruby applications run so much code on boot. RubyGems, for example, degrades startup time linearly (O(n)) with the number of gems you have installed. Hacks like faster_rubygems (gem install faster_rubygems) help, but changes are needed in RubyGems proper.
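You can get a rough feel for the RubyGems tax on your own machine by timing cold boots; this snippet just shells out to time a bare interpreter versus one that loads RubyGems (the flags are standard MRI options, and absolute numbers depend entirely on how many gems you have installed):

```ruby
require 'benchmark'

# Time a bare interpreter boot versus one that loads RubyGems.
# Under JRuby you'd substitute the jruby executable for ruby.
bare = Benchmark.realtime { system("ruby --disable-gems -e ''") }
gems = Benchmark.realtime { system("ruby -e 'require \"rubygems\"'") }

puts "without RubyGems: #{(bare * 1000).round} ms"
puts "with RubyGems:    #{(gems * 1000).round} ms"
```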
The tradeoff is that you get a much more stable runtime, true multithreading, and the ability to play ball with everything in the Java ecosystem.
Sure, it takes longer to start up, but we're talking a few seconds here, which is nothing for a long-running webapp. Plus, the JVM is rock-solid, has very predictable memory usage, and also has tons of monitoring infrastructure to bring to bear against problems.
I'm much, much more concerned about the memory usage of JRuby than the startup times. For toss-off scripts I can keep using MRI.
I've read that JRuby memory usage scales better than MRI though, so you pay a higher cost upfront but can actually accommodate more workers on the same hardware. I haven't gotten far enough in my exploration of JRuby to see if this is the case in my own apps.
Hats off to all the devs working on bringing us a better Ruby runtime. I realize it's a lot of work.
In my tests, it's still worse than MRI. Memory-wise, the best among 1.8.7, 1.9.1, and JRuby 1.5 was still 1.8.7. Of course, when that headroom is the difference between keeping data in memory and loading it from disk, 1.8.7 ends up faster by orders of magnitude. The ability to use stable, high-performance JVM libraries trumps this concern in many cases, though.
It's also worth pointing out that while a JRuby/Rails instance might take 100-200MB, that's all you need to scale a site across pretty much any number of cores. MRI and REE both need to spin up multiple processes to handle concurrent requests, so very quickly the JRuby memory size becomes a tremendous win (think of 25-50 MRI instances using 20-50MB of memory each...you get the picture).
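The arithmetic is worth spelling out; the figures below are just taken from the ranges in this comment (not measurements), but even the conservative end makes the point:

```ruby
# Back-of-the-envelope memory comparison using the ranges above.
mri_workers    = 25     # low end of the 25-50 process estimate
mri_each_mb    = 35     # midpoint of the 20-50MB per-process range
jruby_total_mb = 200    # high end of the 100-200MB single-process range

mri_total_mb = mri_workers * mri_each_mb
puts "MRI:   #{mri_total_mb} MB across #{mri_workers} processes"
puts "JRuby: #{jruby_total_mb} MB in one process"
```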
Yeah that's what I'm hoping to see as I transition a couple of test apps over. The scala/lift folks see this as a big advantage for their JVM web stack too.
Yeah, I see that. My tests primarily revolved around single-threaded, event-driven code retaining large container objects (millions of objects). MRI was much more efficient. I understand that there's a lot more going on under the hood in JRuby just to make this possible, but for this use case, the result is what mattered. Although we ended up using Tokyo Tyrant (with a hack to pre-disk-cache all the data) anyway.
Object sizes in JRuby are certainly larger, especially if you run on a 64-bit JVM which necessarily has 64-bit reference fields for all object references (basically everything in Ruby). It doesn't surprise me at all to see a single-threaded case with a lot of objects use more memory, but it would be interesting to see how much of that was unused heap space and how much was actually live data. MRI's less-efficient conservative collector can live in a smaller memory space, but you sacrifice performance.
It's definitely a JVM problem, but contrary to Java, there is a sort of REPL available. Instead of writing a file, modifying it, and running 'jruby my-script.rb', just issue 'load "my-script.rb"' in your (j)irb. That makes for much faster iterations.
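The reason load works for this (and require wouldn't) is that require caches by filename while load re-executes the file every time; a minimal illustration, using a throwaway script written to a temp directory:

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  path = File.join(dir, "my-script.rb")
  File.write(path, "$runs = ($runs || 0) + 1")

  load path   # executes the file
  load path   # executes it again; require would be a no-op here
  puts $runs  # => 2
end
```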
I have both MRI and JRuby installed on my systems. For quickie scripts I use 'ruby' and for anything with a noticeable runtime (Or multi-core processing!) I use 'jruby'. I'm sure the JRuby fellows will continue working on startup time, but for me at least, it's not an issue.
I think having a Ruby toolbox instead of a single big hammer is a win for everybody.
We hear this a lot. Hopefully we can keep improving startup times, but when it's possible to use either Ruby or JRuby, it makes a pretty good combination.
It's worth noting that startup times with 1.9 are longer than with 1.8 (in my experience, anyway). Startup times for my 1.8 instances on Heroku are sub-second, whereas instances running 1.9 take 6+ seconds!
Disclaimer: I haven't dug into this issue much, so it's possible other factors are the cause...
InvokeDynamic and other Java 7 features (method handles, NIO2) are definitely going to improve JRuby's situation, but maybe not in the way you expect. Indy and method handles will largely allow us to delete (or not load) code we currently generate to get the same effect. Smaller runtime, possibly a smaller distribution. NIO2 will give us more direct access to streams, process handling, and so on, so we can delete hacks we've written to do all that ourselves.
But perhaps the most interesting aspect of any new major Java release is that they're usually 10-20% faster across the board, due to new and better optimizations. As the saying goes, if your Java app isn't fast enough, upgrade to a newer JVM.
I think invokedynamic combined with more runtime profile-driven optimizations in JRuby could easily double JRuby's Ruby-execution performance (or better), and in many cases reduce memory churn too. Lots of good things coming...it's nice to have an army of VM engineers on your side.
Why no IronRuby test on Windows? Understand, I'm no fan of M$, but it seems very odd to run the IronRuby test on Mono, then omit running IronRuby on the "native" .NET environment.
Means and (especially) medians mean little to nothing when you're doing them over entirely different benchmarks. Why bother? A 3D plot would be a lot clearer.
> Means and (especially) medians mean little to nothing
It's the other way around. Given the skewed nature of the data, the median is a better measure of the central tendency of the data set.
But note that I'm not plotting the median only. I use a box plot which gives you a much better statistical picture than using a simple median (or mean). Nevertheless, summarizing different benchmarks has its limitations and box plots give you a rough, general idea of what's going on.
I disagree. If you write ten benchmarks, and they finish in the same order on all platforms, the median really doesn't give you any information apart from how the middle one did. At least the mean will be changed if the slowest benchmark for some reason is a lot slower on one platform. In this case, there was some variation, but for the most part it looks like the median is just one of 4 or 5 middling benchmarks in all cases, so you're really just removing data, if anything.
The box plot is a good choice here. What I would like to see is a graph plotting time against {VM} x {benchmark size} to see how each platform scales within a benchmark.
With the data you've got, apart from confusing me a little by including the mean and median, you did a good job presenting it. I'd love to see why some platforms do so much better than others in some cases (except JRuby which everyone knows is a memory hog), but this seems way out of scope for this article. :)
Did you normalise the results before plotting them and calculating means and medians? Or is it a summary of the raw data? I could probably find out myself, but since you are posting here, I thought I'd ask.
I'd like to see that too. In JRuby, we may be able to get Ruby-to-Ruby calls to perform as well as Java calls, which would at least get that bottleneck out of the way. The remaining performance issues, however, are usually the rate at which objects can be allocated. In order to reduce that we may need a little JVM help (escape analysis that works well enough to actually eliminate allocations) and we may start to explore optional static typing, to allow really reducing numeric operations to raw primitive math.
At this point, we realize that sometimes you really do need native performance, and we're not taking any options off the table to get there.
Charles, is there any chance virtual machines and JIT compilers can get "smart enough" to introduce native typing without requiring a hard decision by a developer? I'd prefer not to have to worry about types ever, and let the interpreter/compiler/optimizer slap them onto objects as needed to really crank numeric throughput.
Maybe I could tune software with profiling tools after the fact. But it feels like as soon as you start locking specific objects down it's a slippery slope.
I'm not very familiar with the depth you've gone to, but can a dynamic object have numeric features or be capable of substituting numeric handling for a limited time (virtual numerics) and then revert back to a sloppy untyped object?