Congratulations Ruby team! I'm excited to hear about the performance improvements. It seems like JIT could be a huge win since calling a method requires checking all kinds of possibilities that are almost always not used, but might be. I would love to keep all the power & fun of Ruby without worrying so much about performance.
Speaking of optimizing method calls: now that it's been a few years, I wonder what Ruby folks think about refinements. Are you using them? Are they helpful? Horrible?
I remember reading from the JRuby folks that refinements would make Ruby method calls slower---and not just refined calls, but all method calls [1], although it sounds like that changed some before they were released [2]. It seems like people stopped talking about this after they came out, so I'm wondering if refinements are still a challenge when optimizing Ruby? I guess MRI JIT will face the same challenges as the Java implementation?
As the creator of Ruby Facets, I had refinements high on the todo list. Unfortunately the syntax of refinements led to an ugly problem... I would have had to maintain two identical code bases to support both usages (monkey patching and refinements), the only difference being boilerplate code. I could find no way around it. That seemed ludicrous to me, so I never bothered to add refinement support. Hence, from my perspective, refinements went over like a lead balloon.
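For anyone who hasn't run into this, a minimal sketch of the duplication being described - the method body is identical, only the surrounding boilerplate differs (the method and module names here are made up):

    # Monkey patch: visible to every caller in the process.
    class String
      def shout
        upcase + "!"
      end
    end

    # The identical body again, wrapped in refinement boilerplate, and only
    # visible in scopes that opt in with `using`.
    module Shouting
      refine String do
        def shout
          upcase + "!"
        end
      end
    end

    using Shouting
    "hello".shout  # => "HELLO!"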
That is great to hear! I'd love to read the details if you happen to have a link to more information. I've tried Googling for it a few times over the years but never found anything.
I haven't done any work to implement refinements myself, but I do know that it basically came down to something you could write an inline cache against, guarded on the class version---a simple word-sized guard check that you were doing anyway whenever you called any method.
> Add a new alias then to Kernel#yield_self. [Feature #14594]
It might seem strange that they lead with this new feature, but `yield_self` can greatly improve Ruby method chaining, and `then` makes it more accessible.
The style of writing a Ruby method as series of chained statements has a non-trivial effect on readability and conciseness. `then` lets you stick an arbitrary function anywhere in the chain. They become more flexible and composable, with less need to interrupt them with intermediate variables that break the flow.
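A small example of the kind of chain being described, assuming Ruby 2.6 for `then` (2.5's `yield_self` works the same way):

    require "json"

    # `then` passes the receiver to the block and returns the block's result,
    # so an arbitrary step can sit in the middle of a chain.
    retries = '{"retries": "3"}'
      .then { |raw| JSON.parse(raw) }
      .fetch("retries")
      .then { |n| Integer(n) }
      .clamp(0, 5)

    p retries  # => 3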
I've been using `ergo` from Ruby Facets for years ... largely the same thing ... and the more I used it, the more readable I now find my old code. Funny how adding one very simple method can have more effect than so many other complex, high-effort changes.
> We’re going to implement method inlining in JIT compiler, which is expected to increase Ruby’s performance in order of magnitude
An order of magnitude as in .. 10x? This seems too good to be true. Half the arguments against Rails melt away like butter if that's truly the case.
Anyone with a better understanding of the details care to comment on the likelihood of these performance gains actually being realised, and if not, what we might realistically expect?
Sinatra + Sequel is already very competitive in web performance with Go + Gin[1]. It's the Rails convenience stuff which slows things down massively. MJIT could probably bring Rails in line with Sinatra though.
Between Ruby 1.8 and 2.5, performance has improved around 13x in tight loops[2]. The Rails performance issue has been massively overblown since 1.9 was released.
Ruby 1.8 was a tree walking interpreter, so the move to a bytecode VM in 1.9 was a huge leap in performance. Twitter bailed to the JVM before moving to 1.9. A lot of those 10-100x performance differences to the JVM are gone thanks to the bytecode VM and generational GC.
Bytecode VMs all have the same fundamental problem of instruction dispatch overhead: they're basically executing different C functions depending on the input.
Doing _anything_ to reduce this improves performance dramatically, even just spitting out the instruction source code into a giant C function, compiling it, and calling that in place of the original method. Another 10x improvement on tight loops should not be a problem.
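A toy sketch (nothing like CRuby's actual VM, just the shape of the problem): every instruction pays a table lookup plus an indirect call, while the compiled equivalent is just the work itself.

    # Toy stack-machine interpreter to illustrate dispatch overhead.
    OPS = {
      getlocal: ->(frame, operand) { frame.stack.push(frame.locals[operand]) },
      plus:     ->(frame, _)       { b = frame.stack.pop; a = frame.stack.pop; frame.stack.push(a + b) },
      leave:    ->(frame, _)       { frame.stack.pop }
    }

    Frame = Struct.new(:locals, :stack)

    def interpret(iseq, frame)
      result = nil
      iseq.each do |op, operand|              # per-instruction dispatch: lookup + call
        result = OPS[op].call(frame, operand)
      end
      result
    end

    # "a + b" as a toy instruction sequence
    iseq = [[:getlocal, 0], [:getlocal, 1], [:plus, nil], [:leave, nil]]
    p interpret(iseq, Frame.new([2, 3], []))  # => 5

    # What a JIT effectively replaces all of that with: straight-line code.
    p 2 + 3                                   # => 5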
I would never have thought a Ruby stack would come anywhere close to Go performance. The path to optimization used to mean abstracting the really demanding parts into a Go microservice for things that just needed extreme responsiveness; but it's clear now that a slim Ruby stack could also be very effective - and without needing to learn a new language. Worthwhile to at least explore before going Go.
Nor did I know that Twitter jumped off Rails before Ruby got performant. Which means the argument that Twitter outgrew Rails isn't so correct anymore.
>Nor did I know that Twitter jumped off Rails before Ruby got performant. Which means the argument that Twitter outgrew Rails isn't so correct anymore.
Twitter, even back in those days, would still have outgrown today's Rails. It's Ruby that has gotten a lot faster, not necessarily Rails.
From what I understand, the biggest issue was their product required fast fan out messaging. Tumblr, for example, is still huge but can get away with 1000 lines of PHP for their feed: https://news.ycombinator.com/item?id=17154403
ActiveSupport also monkey-patches to_json with a recursive Ruby function, completely nerfing JSON performance by 50x in a lot of cases. Unfortunately, it's what makes 'render json: @model' just... work.
Any time I want to generate JSON in Ruby, I want recursive tree traversal. It's just too painful to expect primitive types everywhere. You'd have to pay that 50x doing it explicitly with a recursive function before you turn it into JSON anyway.
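Roughly what that traversal has to do (illustrative only, not ActiveSupport's actual implementation): walk the structure, convert non-primitives, then hand the result to a fast encoder.

    require "json"

    def deep_as_json(value)
      case value
      when Hash   then value.each_with_object({}) { |(k, v), h| h[k.to_s] = deep_as_json(v) }
      when Array  then value.map { |v| deep_as_json(v) }
      when String, Numeric, true, false, nil then value
      else value.respond_to?(:to_h) ? deep_as_json(value.to_h) : value.to_s
      end
    end

    record = { "id" => 1, "created_at" => Time.now, "tags" => [:new, :hot] }
    JSON.generate(deep_as_json(record))
    # => '{"id":1,"created_at":"2018-...","tags":["new","hot"]}'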
Ideally, you'd serialize directly from the database, bypassing the application entirely. Easily doable in ActiveRecord, but it's an explicit action, not the default. Not even sure if it's available in other databases besides PostgreSQL.
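A hedged sketch of the PostgreSQL version (json_agg/row_to_json are PostgreSQL functions; the users table and columns are made up), which hands Ruby a single pre-serialized string:

    payload = ActiveRecord::Base.connection.select_value(<<~SQL)
      SELECT COALESCE(json_agg(row_to_json(t)), '[]'::json)
      FROM (SELECT id, name, email FROM users ORDER BY id LIMIT 100) t
    SQL

    # In a controller: render json: payload -- it's already JSON, so nothing
    # gets re-serialized in Ruby.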
Have you ever used Ruby's JSON.dump? The second it runs into anything that it can't convert, you get "#<Object:0x00007fab05133078>" everywhere in your output.
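A quick illustration of that failure mode with the plain json library (output abbreviated):

    require "json"

    JSON.dump(id: 1, created_at: Time.now, widget: Object.new)
    # => "{\"id\":1,\"created_at\":\"2018-12-25 ...\",\"widget\":\"#<Object:0x00007f...>\"}"
    # Anything without a sensible to_json silently falls back to its to_s.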
Thanks for the informative response (and sudhirj too)! Haven't looked at that benchmark for a while - very interesting. Sinatra is absolutely killing it. I knew it was faster than Rails but not that much faster.
With 2.6 and sorbet [1] coming down the line, it's exciting to be a Rubyist again!
>Doing _anything_ to reduce this improves performance dramatically
It does if you ignore the overhead of JIT compilation itself. However, my understanding is that writing a JIT implementation that performs better than a good interpreter is surprisingly difficult. You have to have a lot of complicated logic for tracking hotspots and using JIT judiciously in short-running scripts.
Hmmm, not really. I think that's somewhat true for JS and webpages, but they're not the same as server-side apps.
It's the instruction dispatch overhead that's the real unavoidable problem. LuaJIT, for example, uses a bunch of tricks to minimize it in the bytecode VM, and it's significantly faster than the standard Lua VM but still far, far slower than basic JIT compilation.
Right, but historically there are lots of instances of projects that abandoned JITs because they didn't get a performance improvement. JIT compilation reduces instruction dispatch overhead, but it also, unless accompanied by sophisticated profiling techniques, adds the overhead of JIT compilation time, which can easily swamp the improvements.
Lua JIT is one of the most sophisticated dynamic language JITs out there, so it's hardly evidence that a simple implementation of a JIT will perform better than a good bytecode interpreter.
The problem is less acute for server side apps because the programs run for a long time, so that the initial compilation overhead is insignificant. However, there's a reason that you need a JIT to make Ruby fast rather than an ahead of time compiler. Ruby has so few compile-time guarantees that you need to do a lot of dynamic specialization to get really significant performance improvements. So compilation might still be triggered even after a script has been running for a long time.
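A minimal illustration of why that specialization has to happen at runtime: the same call site sees different receiver types, and core classes can be reopened at any point while the program is running.

    def double(x)
      x + x          # Integer#+, String#+, Array#+ ... resolved per call
    end

    p double(21)     # => 42
    p double("ab")   # => "abab"

    class String
      def +(other)   # reopening a core class invalidates earlier assumptions,
        "patched"    # so compiled code guarding on String#+ must deoptimize
      end
    end

    p double("ab")   # => "patched"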
I'd add that PyPy, which is also very sophisticated, is often not much faster than CPython, and in fact is slower for some types of code. Writing good JIT-based implementations for dynamic languages is really a tough problem. See e.g. the following post for some explanation of why:
> Right, but historically there are lots of instances of projects that abandoned JITs because they didn't get a performance improvement. JIT compilation reduces instruction dispatch overhead, but it also, unless accompanied by sophisticated profiling techniques, adds the overhead of JIT compilation time, which can easily swamp the improvements.
Yes.
> Lua JIT is one of the most sophisticated dynamic language JITs out there, so it's hardly evidence that a simple implementation of a JIT will perform better than a good bytecode interpreter.
I meant that even a basic JIT can offer the same speedup as LuaJIT's interpreter, and a lot more work went into the latter.
> The problem is less acute for server side apps because the programs run for a long time, so that the initial compilation overhead is insignificant. However, there's a reason that you need a JIT to make Ruby fast rather than an ahead of time compiler. Ruby has so few compile-time guarantees that you need to do a lot of dynamic specialization to get really significant performance improvements. So compilation might still be triggered even after a script has been running for a long time.
The initial results of MJIT from simply removing the instruction dispatch overhead and doing some basic optimizations are a 30-230% performance increase on a small but real-world benchmark. No type specialization or speculative optimization required.
> I'd add that PyPy, which is also very sophisticated, is often not much faster than CPython, and in fact is slower for some types of code. Writing good JIT-based implementations for dynamic languages is really a tough problem. See e.g. the following post for some explanation of why:
Most of the discussion about PyPy is completely irrelevant for the discussion about MJIT. PyPy isn't a method JIT. PyPy traces the interpreter itself and tries to produce a specialized interpreter. It works even worse at optimizing Ruby code via Topaz.
True! I just checked again and Topaz is indeed almost twice as fast as CRuby on optcarrot. I think I got it mixed up with the non-JIT Rubinius numbers.
>The initial results of MJIT from simply removing the instruction dispatch overhead and doing some basic optimizations are a 30-230% performance increase on a small but real-world benchmark. No type specialization or speculative optimization required.
So, this amounts to a small improvement for some types of code. Indeed, it is "easy" to get that by "just" using some basic JIT techniques. The trick is to get consistently better performance across the board. Relevant tweet at https://medium.com/@k0kubun/the-method-jit-compiler-for-ruby...:
>I've just committed the initial JIT compiler for Ruby. It's not still so fast yet (especially it's performing badly with Rails for now), but we have much time to improve it until Ruby 2.6 (or 3.0) release.
> The trick is to get consistently better performance across the board.
This will come with the rest of the optimizations Takashi has planned for Ruby 2.6. Ruby-to-Ruby method inlining, which is almost finished, is a huge one for improving Rails performance. IMHO there's no real point talking about Rails until it's working in some form.
> >I've just committed the initial JIT compiler for Ruby. It's not still so fast yet (especially it's performing badly with Rails for now), but we have much time to improve it until Ruby 2.6 (or 3.0) release.
I'm not saying that MJIT (or whatever implementation) will never be fast. I'm just saying that in general, it is not trivial to get performance improvements by writing a JIT. My original comment said nothing about Ruby.
This is simply not true. You're massively underestimating the overhead of a bytecode VM. Even the most heavily optimized bytecode VMs are easily beaten by the simplest of JITs in 100 lines of C: https://arxiv.org/pdf/1604.01290.pdf
Code doesn't even need to be "hot" to make it worth it. WebKit switches from the interpreter to cheap baseline compilation, without any speculative optimizations or type information, after only 6 calls of a function: https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/
This article literally describes how baseline JIT is worth it simply to remove the bytecode VM dispatch overhead.
It's not clear to me why you think that the linked article shows that naive JITs perform better across-the-board than well-written bytecode interpreters. The reported speed improvement from JIT itself is a mere 2.3x, and the listed benchmarks mostly involve numerical code, where JIT tends to be effective.
A simple JIT can get you to the point where you reliably outperform a bytecode interpreter for certain types of code. What takes a lot more engineering effort is reliably performing at least as fast as a bytecode VM for all types of code.
MJIT is completely different to PyPy. Their problems are simply not relevant to MJIT. MJIT is already just as fast as standard CRuby when executing complex Rails code, minus the small overhead for JIT compilation: http://engineering.appfolio.com/appfolio-engineering/2018/3/...
PyPy is a much more ambitious design, completely replacing CPython, and using an unusual JIT scheme of tracing the interpreter itself and trying to produce an interpreter optimized for particular traces of your code.
It was much harder to reach the same level of general performance with that approach than it seems to have been for CRuby & MJIT.
Yes, I agree with this. We use Sinatra + Sequel, but run our code using JRuby mostly because we're sharing some scala libs with other teams. In any case for us, performance has not been a problem (~20k req/min with one node). I'm really looking forward to ruby 3x3 :)
JRuby is really hitting its stride now, becoming ~3x faster than CRuby in my testing. The JVM changes to support more dynamic languages have improved performance so much.
Charles Nutter's early tests using JRuby on the GraalVM sound like there's another big step in performance coming without a huge amount of work.
CRuby's GIL doesn't really matter for serving web requests since it's run with one process per core like NodeJS is. It's less memory efficient but doesn't really affect throughput so much. Also, JRuby has no GIL.
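For context, the process-per-core setup is usually just app server configuration; a typical sketch assuming Puma (the worker/thread counts here are arbitrary):

    # config/puma.rb: one forked worker per core sidesteps the GIL for CPU-bound
    # work; threads inside each worker cover IO-bound time.
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))
    threads 5, 5
    preload_app!   # share memory between workers via copy-on-write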
Is Iris fully featured? It's better to compare two frameworks that have real production use in more complex apps to get an idea. Both Gin and Sinatra fit this description.
I haven't had a chance to use Bootsnap yet but it sounds really promising.
Ruby has made huge strides to be sure. However this is a bit hyperbolic IMHO.
Golang plus gin, sure. However there are other Go frameworks on the charts that blast the Ruby competition out of the water. Ruby isn't really on the podium at all with C, C++, Rust, Golang, C#, and Java about an order of magnitude out in the lead on fortunes.
Martini isn't much of a framework itself either, so let's forget the full-featured nonsense. Almost none of the ecosystem is in play with these benchmarks. You could build a system up around fasthttp just as well as net/http, and ASP.NET certainly can't be accused of being a for-purpose contender.
The most impressive thing IMHO is how well Ruby is doing on maximum latency. I can't quite reconcile that, considering fasthttp is pretty much zero-allocation and Golang's stop-the-world is in the microseconds. Pretty impressive.
> The most impressive thing IMHO is how well Ruby is doing on maximum latency. I can't quite reconcile that, considering fasthttp is pretty much zero-allocation and Golang's stop-the-world is in the microseconds. Pretty impressive.
Fast GC is critical to Ruby performance, so a ton of work went into it. Ruby 2.2+ has a very short STW phase thanks to generational GC + incremental marking.
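You can see the generational split directly in GC.stat (key names are from 2.x and can shift between versions):

    stats = GC.stat
    p stats[:minor_gc_count]   # young-generation collections: cheap and frequent
    p stats[:major_gc_count]   # full collections: much rarer since RGenGC landed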
What I remember reading from the Graal folks a while back was that Rails performance issues revolve around the amount of object creation and destruction.
For hot code, probably; not the entire language in general. If you're looping through a million numbers doing the same calculation, or maybe rendering markdown in a loop on a lot of text, you might hit 10X - the JIT will essentially write the code in C for you, then compile it and run it instead of the Ruby.
Another factor is that a lot of the hot code - JSON serialization, DB clients, HTTP parsing - is already in native C extensions. Performance there is only going to improve a little, although some of it could be better off as pure Ruby + JIT.
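A rough way to see that split for yourself, using only the stdlib (numbers will vary by machine):

    require "benchmark"
    require "json"

    # Pure-Ruby numeric loop: the kind of code a JIT can speed up a lot.
    hot_loop = -> { (1..1_000_000).reduce(0) { |sum, i| sum + i * i } }

    # Already backed by a C extension: little left for the JIT to win here.
    payload = { "items" => (1..1_000).map { |i| { "id" => i, "name" => "n#{i}" } } }
    c_ext = -> { JSON.generate(payload) }

    Benchmark.bm(10) do |x|
      x.report("pure ruby") { 10.times { hot_loop.call } }
      x.report("c ext")     { 10.times { c_ext.call } }
    end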
Rubinius uses LLVM IR for this. The only problem is rbx is still very slow on startup and interactive use.
Ruby (MRI) will have to reinvent the wheel in order to get a panoply of optimizations that some very smart people have already baked into LLVM: like the ability to target almost any platform from the same library. GCC requires cross-compiling per target.
Just spun up my ridiculously heavy rails app and did a quick test with one of the more cpu-intensive actions and see no evidence of any improvement at all. If anything, it was a bit less predictable.
This little test was nowhere near an actual benchmark, but I wouldn't hold my breath.
Test suite still passes on it though, so upgrading shouldn't be a huge deal at least へ‿(ツ)‿ㄏ
@darkdimius[0], the person who tweeted the referenced tweet in the article, was arguably the primary contributor to Dotty[1] behind Martin Odersky (Dotty will be Scala 3, the next major version of the language).
Once he finished his doctorate at EPFL, off to Stripe he went; bye bye Scala. Tough industry: on the one hand Scala benefits from a revolving door of high-level EPFL doctoral students, and on the other the talent pool shifts around as students come and go.
Money talks, companies like Stripe have a leg up in that they can fund full-time engineers to work on projects, whereas institution backed projects typically have a much smaller pool of long-term engineers to rely on (JetBrains, for example, has something like 40 full-time engineers working on Kotlin/KotlinJS/Kotlin Native).
Where you're seeing Oracle tech I see free software licensed under the LGPL. There's no way that if TruffleRuby becomes well and truly popular and Oracle decides to... be Oracle, no other company would pick up the banner. Ruby is just too popular for that not to happen.
Because Oracle is Oracle, the most evil company in tech, the one most blatantly greedy. Look at what they pulled with Google. Oracle would wait until the tech's usage grows and then use patents and API copyrights or whatever else they invent out of thin air to go after the players using its tech. The free license of Graal does not protect you from that, the GPLv2 specifically does not. See https://www.gnu.org/licenses/rms-why-gplv3.en.html.
Google abused Sun, took advantage of the fact that they were in a critical financial situation and unable to sue, and when they crashed, did not lift a finger to rescue the company's assets.
Now Android has Google's own J++, limiting what kind of Java libraries are portable to the platform.
At the same time, some OEMs are adopting Android instead of Embedded Java, thus increasing the fragmentation around which Java libraries are actually portable.
Google just thought they could let Sun close its doors and get away with how they created their own J++.
Also, it doesn't change the fact that even with Android 8.1, I as a Java developer cannot take a random jar from Maven Central and be certain it won't crash and burn on Android, regardless of the version.
But I don't see how you can tolerate that contradiction. Either you agree with Oracle that the Java APIs were copyrighted and Google should not have been allowed to reconstruct them. Or you worry about fragmentation coming from an incompatible Java implementation. Doing both is nonsensical.
How do you arrive at the idea that having more Java devices available, even if not 100% compatible, would have in any way caused harm to Sun? And even so much harm that it killed the company?
On top of that, I don't see what Google would have had an obligation to pay for.
Contradiction: Google broke some imaginary copyright by re-implementing APIs, but Google is bad because the re-implementation was not 100% equal to the original causing fragmentation. Either the fragmentation was harmful, then the API copyright was the problem. Or the API copyright violation was the problem, then fragmentation was the explicit goal and Google's try to minimize it the problem. Both can't be true at the same time outside lawyer lala land.
Because those devices run Android Java, from which Sun saw $0, so Sun was not able to capitalize on them to pay its bills.
1 - Google did not pay for Java licenses when it should have. Even Andy Rubin admits that in his emails.
2 - To this day Android is not Java SE compliant, thus creating a fragmentation between Android Java and Java, just like the one Sun managed to prevent with J++.
3 - Being a Java licensee, as Google avoided being and still isn't (many Java APIs are still not available on Android), would have required Android to be fully Java SE compliant.
So to conclude, Google tricked Sun and fragmented the Java eco-system.
They should pay and provide a 100% Java SE compliant implementation, or be honest about it and fully migrate to Kotlin, Dart, or whatever they feel like.
FYI, what you've linked to is a breakdown of the licenses for the various projects comprising GraalVM. The licenses you listed only apply to TruffleRuby.
Having said that, Graal and its related projects are all open source, with a license listing available in its README:
I do wish, though, that except for eval and exec, the majority of Ruby code could get an incremental/LTO JIT similar to HotSpot, and a GC like C4 rather than an incremental generational GC.
> Unlike ordinary JIT compilers for other languages, Ruby’s JIT compiler does JIT compilation in a unique way, which prints C code to a disk and spawns common C compiler process to generate native code.
What's the advantage over using LLVM's built-in JIT, or PyPy's JIT, or generating machine code directly, or anything else that doesn't have the overhead of spawning processes for compiling and linking? One of the goals listed is minimizing the JIT compilation time.
LLVM clearly wasn't designed for JIT. Don't let those letters "VM" confuse you; it's more like a machine abstraction than a virtual machine. And even that is far from water-tight.
But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.
Can anyone think of any reasonably common stacks using LLVM as a JIT? There's Mono, but that's a non-default mode; not sure if it's typically used. The Python unladen-swallow experiment failed. WebKit had a short-lived FTL JavaScript optimization pass, but that was replaced by B3.
Which is just a long-winded way to suggest that LLVM is not likely to be ideal as a JIT, at least based on what past projects have done.
(Not trying to imply that writing C to disk is better, but it may well be simpler & more flexible - not worthless qualities for an initial implementation).
I use LLVM as a JIT via Terra [1]. It performs about as well as you'd expect any other C compiler to perform. That is, if you do a bad job of code generation and pass it a multi-MB file in a single function, well then of course it's going to choke. But if you're optimizing tight loops and have reasonable code generation, it's very good and you can get performance comparable to a best-in-class C compiler without the overhead and headache associated with calling out to an external program.
The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources, and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has only recently upgraded to 3.8 [2].
But if the alternative is shelling out to a C compiler? I'll take LLVM any day. The issue is not just the overhead of a call to an external program, it's all the extra complexity that comes along with that. It is very, very easy for this approach to break, especially when you consider the breadth of C compilers that exist, and all the possible ways they can be configured. In contrast, LLVM is "just" a library that you link to.
I'm a little skeptical about the cost and complexity of an external program. You may not need to support all those C compilers, but at least you have the choice. And C is extremely mature and stable. If you're generating code, you probably don't need to use the latest not-so-well-supported features; you may well be able to emit C code that compiles on almost any compiler from the last 3 decades without too much trouble. And while there will be more configuration choices, it's not like raw LLVM has none.
If anything, I'd bet plain C is much simpler because it hasn't changed much, and is very unlikely to ever do anything very surprising on any future platform - which cannot be said of raw LLVM.
And of course shelling out is a bit of a hassle, but hey; it's a well-trodden path on Unix. It's not the fastest, greatest interop in the world, but it's good enough for a lot of things.
I'll just say that my views come mainly from experience, specifically ECL (Embeddable Common Lisp, a CL implementation) and (this was further back, so my memory is fuzzy) a tool for generating executables from Perl scripts. I don't think I'm using an especially unusual setup, or unusual compilers, and I would guess that these tools probably target a very narrow subset of C. Despite this, my experience with these sorts of tools has been anything but "works out of the box". On the contrary, there appear to be a great number of degrees of freedom, even with standard-ish setups, that can trip up these tools. Because of the additional layers of abstraction, the error messages you get are very poor. Some header file is missing or in an unexpected place, or worse some generated code fails to compile. As an end-user, it's basically impossible to debug these in a reasonable way.
You can certainly have internal errors using LLVM, but in my experience fewer of them are platform-dependent. Therefore there is a greater chance that something that works for the developer will work for the user. Also, if error handling is done properly, a failure that does occur can often be mapped back to the original source program. This is much better as far as usability goes, since the user almost never wants to debug some compiler's generated code.
> The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources, and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has only recently upgraded to 3.8 [2].
Yea, it's annoying. For PostgreSQL I've decided to focus on the C API wherever possible for exactly that reason. A bit more painful to write, but not even remotely as quickly moving. Obviously there are parts where that's not possible - but even there I've decided to localize it as much as possible.
I wonder if there will ever be a de facto API wrapper for LLVM. As it is, I'm aware of smaller efforts here and there, but other than SPIR-V [1] I'm not sure any are big enough to have long-term survivability potential. And even with SPIR-V I'm not sure if the momentum is really there or not.
> Can anyone think of any reasonably common stacks using LLVM as a JIT?
We just added an LLVM-based JIT to PostgreSQL. I don't think we have quite the same issues as JITing generic interpreted languages though, because the planner gives us much more information about the likely cost of executing a query. So the need for a super-fast baseline JIT isn't as big.
> But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.
I think that's partially due to people using the expensive default pipeline when enabling optimization. A lot of those passes either don't make sense for the source language, or not for the first foreground JIT pass.
The biggest issue I have with LLVM around JITing is that its error handling isn't really good enough. It's fine to just fatal-error if you're in an AOT compiler world, but that's much less acceptable inside a database. There are moves to make at least parts of LLVM exception safe, but ...
That's a great example! It's pretty much exactly what I was looking for (well, except that it's probably going to be niche, at least for a while?) Still - good example.
Having used LLVM for precisely that for both Open Shading Language (OSL) and my startup's runtime, LLVM's JIT was pretty good. It's certainly not optimized for "just throw everything at it a function at a time and pray" like a more custom-built JIT (like in HHVM) or even nanojit. But its backend output is beyond compare, and you instantly get cross-platform compatibility. As a runtime implementer, it (was) phenomenal.
After LLVM 3.4 or so with the forcible move to “MCJIT” (now ORCJIT maybe?) it suddenly got even more painful though. While the Module system in LLVM was always abused by the JIT, it was a sad day for many of us who instead pinned to 3.4 for a while. I haven’t followed up in a while to see how the newer JITs have progressed, but I believe the last-layer JIT for Safari uses LLVM as well.
tl;dr: for the right time versus execution speed trade-off, LLVM is still awesome.
Shelling out (which I’ve also done) is okay, but you never get to really teach the backend what you know. That is, no matter how hard you try, you can’t teach gcc, icc, or clang that you know it’s safe to just fetch this function pointer off a struct and that it’s stable. Writing a simple pass in LLVM though is incredibly straightforward. You can even do a simple inliner, that knows how to inline just the runtime callsites you care about.
As the WebKit folks and the HHVM folks found before them: dynamic languages have enough complexity that you often get most of the win from a "basic compilation" (compared to, say, C/C++), so after you've proven out what you need, you roll your own.
Shelling out though would be strictly worse than the LLVM in-memory approach, since it gets you no additional benefit (in some ways it’s harder, since you can’t just say “jump to this address”), you lose a lot of upside (custom passes, letting you tune optimizations and instruction selection beyond simply -O0, -O1, etc.), and then you get to require users to have a compiler on their box.
I’d personally look at nanojit or the other JIT libraries before shelling out to a regular compiler.
I know very little about Ruby specifically, but IME for this kind of dynamic language you get most of the initial gains by:
- removing (by analysis or speculation) dynamic dispatch
- unboxing / avoiding allocations in the easy cases
Once you've done that, you can generate pretty dumb assembly and still come out way ahead of your interpreter (and avoid very costly optimization / instruction selection / regalloc / scheduling).
Most of what llvm / gcc does only makes sense when you've got your code down close to whatever you would actually write in C.
They want to check the generated code and let people check it on their own platforms:
> The main purpose of this JIT release is to provide a chance to check if it works for your platform and to find out security risks before the 2.6 release
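If you want to poke at it yourself, something like this should work on the 2.6 previews (flag names as documented for the previews - double-check against `ruby --help` on your build):

    # Run with the JIT enabled and keep the generated .c files around to inspect:
    #   ruby --jit --jit-verbose=1 --jit-save-temps your_script.rb
    #
    # From inside Ruby you can check whether the JIT is active:
    p RubyVM::MJIT.enabled?   # => true when started with --jit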
LLVM's JIT is in a sense different from that of, say, PyPy. It's more primitive. When people talk about JIT in the context of LLVM, they mean the set of APIs provided by LLVM's library. That is, given a set of IR functions and the things they depend on, the library dynamically compiles and links them for you. More concretely, given the IR of a function, it gives you a raw pointer to the compiled version that you can call directly. It takes care of the boring (and often platform-dependent) parts efficiently -- code gen, linking, etc. -- so that you can focus on generating efficient IR (which is the hard part for a JIT).
Smalltalk/X can file out packages as C projects that are then compiled by a C compiler. But AFAIK this was never meant to be used as a JIT; it is primarily a deployment mechanism, and non-ancient versions use an in-process code generator implemented in Smalltalk as the JIT backend.
There are Common Lisp implementations that support a similar mechanism of generating C code (ECL, Kyoto CL...), but I don't think any of them compiles C into a .so which then gets dlopened right away as a poor man's JIT.
KCL generates .c files and compiles those to .o object files. I played with this years ago (via the descendant GCL: GNU Common Lisp). The load function handles object files, like COFF or whatever. It's reminiscent of Linux kernel modules.
When KCL compiles a lambda expression, it writes it to a temporary Lisp file called "gazonk.lsp" and compiles that.
(The report above is a little confusing; in some places it claims that an object file has a .o suffix, but then, with regard to this implicit gazonk name, it claims that the fasl file is gazonk.fasl.)
Example with GCL: compile an individual function to C, compile that with the C compiler to a .o file (for example, on my 32-bit ARM it is an elf32-littlearm file), and then load it:
>(defun foo (a) (* a 42))
FOO
>(compile 'foo)
Compiling /tmp/gazonk_24158_0.lsp.
End of Pass 1.
End of Pass 2.
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
Finished compiling /tmp/gazonk_24158_0.lsp.
Loading /tmp/gazonk_24158_0.o
start address -T 0x888488 Finished loading /tmp/gazonk_24158_0.o
#<compiled-function FOO>
NIL
NIL
It's the traditional method of doing a job equivalent to JIT. It has several examples in the history of computing, like SpamAssassin and Matlab (the latter if my memory serves me correctly).
Overhead. It converts its IR to C; dumps that to disk; the C compiler loads the code back from disk; the frontend parses the code (if not done carefully, maybe CPP is also invoked); the compiler dumps the generated code to disk again; and then presumably dlopen loads the code back from disk yet again. There's also the overhead of spawning a separate compiler process. A better way would be to directly generate code into memory and link it there. This is of course trickier, but it is also what libraries such as LLVM's JIT infrastructure and libjit are built for. If you need more performance (i.e. LLVM's JIT is too slow for you), you roll your own infrastructure to do this -- which is what the JVM and V8 do.
They don't "dump to disk", if you mean an actual storage device. By default they store data to a "file system in memory" (a tmpfs), so it never gets written to a long-term storage device (not even an SSD). Even if you do "dump to disk", on a modern OS storing things in a file just puts it in memory and schedules it for eventual long-term storage. Of course, doing things this way has overheads, but it may not be so bad.
The C frontend has to parse things, of course, but it looks like they're heavily optimizing this. "To simplify JIT implementation the environment (C code header needed to C code generated by MJIT) is just an vm.c file. A special Ruby script minimize the environment (Removing about 90% of the declarations). One worker prepares a precompiled code of the minimized header, which starts at the MRI execution start".
Their current results are that "No Ruby program real time execution slow down because of MJIT" and "The compilation of small ISEQ takes about 50-70 ms on modern x86-64 CPUs". You're of course using more CPU (to do the compilations in parallel), and you have to have a compilation suite available at runtime, but in many circumstances that is perfectly reasonable.
IIRC, the gcc C compiler doesn't generate machine code itself either; it generates assembly code, which is then farmed out to a separate assembler process (the GNU assembler, aka GAS). Farming out compilation work to other processes is not new.
It seems to me that this is a really plausible trade. This approach means that they can add a just-in-time compiler "relatively" quickly, and one that should produce pretty decent code once they add some actual optimizations (because it's building on very mature C compilers). The trade-off is that this approach requires more run-time CPU and time to create each compiled component (what you term as overhead). For many systems, this is probably an appropriate trade. As I posted earlier, I'm very interested in seeing how well this works - I think it's promising.
It's faster to hand-generate machine code straight from an interpreter than to invoke a C compiler. But that is not the only issue. As with everything else, this is a trade-off, and I'm eager to see how it works out. I can see some positive reasons to do this:
1. The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.
2. The C language, as an interface, is extremely stable, so once it works it should just keep working. Compare that to the constantly-changing interfaces of many alternatives.
3. Debugging is WAY easier. If there's a problem in generated code, it's way easier to read intermediate C code (especially after going through a pretty-printer) than many other kinds of intermediate formats, and millions of people already know it.
In short, this approach means that they can very rapidly produce a system that can run tight loops very quickly, one that resists interface instability (so the approach should keep working), and one that's easy to debug (so it should be reliable). For many applications, the fact that it takes a little more time to do the compilation may be unimportant, especially since that work is embarrassingly parallelizable.
I'm very interested in seeing how this plays out. If this works well for Ruby, I suspect some other language implementations will start considering using this approach. I'm sure it's not the best approach in all circumstances, but it might work very well for Ruby - and maybe for some other languages like it.
> The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.
Not for machine-generated code. C compilers work well on human-generated code, and not as well on Ruby -> C "translations".
> Not for machine-generated code. C compilers work well on human-generated code, and not as well on Ruby -> C "translations".
That depends on the machine generated code. C compilers are optimized for whatever the C compiler authors perceive as a common construct. If the generated C code uses constructs similar to what humans do, it's often quite good. If not, you can change the code that generates C, or in some cases you can convince the C compiler authors to optimize that situation as well.
In fact, one of the oldest JITted Ruby implementations is Topaz, which was built on top of PyPy (i.e., it's a Python program that uses the PyPy infrastructure to parse/JIT/run Ruby instead of Python).
I wonder if Matz may consider even just adopting Crystal in the future, considering it is almost 99% compatible with Ruby and has 10x-20x performance gains out of the box.
This. Some concepts and syntax are similar but they are quite different. I'm not saying that in negative sense - static typing with type inference rocks.
Everything in Ruby happens at runtime. Even the definition `class X` becomes a runtime `Class.new` invocation. It's imperative from the inside out. Even a JIT compiler won't and can't solve the fundamental flaws permanently baked into the language. If you want performance, Ruby probably shouldn't be the first tool you reach for.
TruffleRuby + Graal brings full Java-equivalent JVM performance to Ruby. It can even AOT-compile a class definition like the one you described as impossible, by using partial evaluation.
Oracle's plan for world domination via JVM is completely changing the performance landscape for dynamic languages.
While my main concerns about Ruby aren't directly related to type systems or static typing, at this point in my career, the speed at which I can hack out a script in a language in 24 hours isn't related to how well designed I think the language is.
[1] http://blog.headius.com/2012/11/refining-ruby.html
[2] https://github.com/jruby/jruby/issues/1062