Congratulations Ruby team! I'm excited to hear about the performance improvements. It seems like JIT could be a huge win since calling a method requires checking all kinds of possibilities that are almost always not used, but might be. I would love to keep all the power & fun of Ruby without worrying so much about performance.
Speaking of optimizing method calls: now that it's been a few years, I wonder what Ruby folks think about refinements. Are you using them? Are they helpful? Horrible?
I remember reading from the JRuby folks that refinements would make Ruby method calls slower---and not just refined calls, but all method calls [1], although it sounds like that changed some before they were released [2]. It seems like people stopped talking about this after they came out, so I'm wondering if refinements are still a challenge when optimizing Ruby? I guess MRI JIT will face the same challenges as the Java implementation?
As the creator of Ruby Facets, I had refinements high on the todo list. Unfortunately the syntax of refinements led to an ugly problem... I would have had to maintain two identical code bases to support both usages (monkey patching and refinements), the only difference being boilerplate code. I could find no way around it. That seemed ludicrous to me, so I never bothered to add refinement support. Hence, from my perspective, refinements went over like a lead balloon.
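For anyone who hasn't run into this, a minimal sketch of the duplication being described - the method body is identical, only the surrounding boilerplate differs (the method and module names here are made up):

    # Monkey patch: visible to every caller in the process.
    class String
      def shout
        upcase + "!"
      end
    end

    # The identical body again, wrapped in refinement boilerplate, and only
    # visible in scopes that opt in with `using`.
    module Shouting
      refine String do
        def shout
          upcase + "!"
        end
      end
    end

    using Shouting
    "hello".shout  # => "HELLO!"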
That is great to hear! I'd love to read the details if you happen to have a link to more information. I've tried Googling for it a few times over the years but never found anything.
I haven't done any work to implement refinements myself, but I do know that it basically came down to something you could write an inline cache against, guarded on the class version---a simple word-sized guard check that you were doing anyway whenever you called any method.
> Add a new alias then to Kernel#yield_self. [Feature #14594]
It might seem strange that they lead with this new feature, but `yield_self` can greatly improve Ruby method chaining, and `then` makes it more accessible.
The style of writing a Ruby method as series of chained statements has a non-trivial effect on readability and conciseness. `then` lets you stick an arbitrary function anywhere in the chain. They become more flexible and composable, with less need to interrupt them with intermediate variables that break the flow.
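A small example of the kind of chain being described, assuming Ruby 2.6 for `then` (2.5's `yield_self` works the same way):

    require "json"

    # `then` passes the receiver to the block and returns the block's result,
    # so an arbitrary step can sit in the middle of a chain.
    retries = '{"retries": "3"}'
      .then { |raw| JSON.parse(raw) }
      .fetch("retries")
      .then { |n| Integer(n) }
      .clamp(0, 5)

    p retries  # => 3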
I've been using `ergo` from Ruby Facets for years ... largely the same thing ... and the more I used it, the more readable I now find my old code. Funny how adding one very simple method can have more effect than so many other complex, high-effort changes.
> We’re going to implement method inlining in JIT compiler, which is expected to increase Ruby’s performance in order of magnitude
An order of magnitude as in .. 10x? This seems too good to be true. Half the arguments against Rails melt away like butter if that's truly the case.
Anyone with a better understanding of the details care to comment on the likelihood of these performance gains actually being realised, and if not, what we might realistically expect?
Sinatra + Sequel is already very competitive in web performance with Go + Gin[1]. It's the Rails convenience stuff which slows things down massively. MJIT could probably bring Rails in line with Sinatra though.
Between Ruby 1.8 and 2.5, performance has improved around 13x in tight loops[2]. The Rails performance issue has been massively overblown since 1.9 was released.
Ruby 1.8 was a tree walking interpreter, so the move to a bytecode VM in 1.9 was a huge leap in performance. Twitter bailed to the JVM before moving to 1.9. A lot of those 10-100x performance differences to the JVM are gone thanks to the bytecode VM and generational GC.
Bytecode VMs all have the same fundamental problem of instruction dispatch overhead: they're basically executing different C functions depending on the input.
Doing _anything_ to reduce this improves performance dramatically, even just spitting out the instruction source code into a giant C function, compiling it, and calling that in place of the original method. Another 10x improvement on tight loops should not be a problem.
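A toy sketch (nothing like CRuby's actual VM, just the shape of the problem): every instruction pays a table lookup plus an indirect call, while the compiled equivalent is just the work itself.

    # Toy stack-machine interpreter to illustrate dispatch overhead.
    OPS = {
      getlocal: ->(frame, operand) { frame.stack.push(frame.locals[operand]) },
      plus:     ->(frame, _)       { b = frame.stack.pop; a = frame.stack.pop; frame.stack.push(a + b) },
      leave:    ->(frame, _)       { frame.stack.pop }
    }

    Frame = Struct.new(:locals, :stack)

    def interpret(iseq, frame)
      result = nil
      iseq.each do |op, operand|              # per-instruction dispatch: lookup + call
        result = OPS[op].call(frame, operand)
      end
      result
    end

    # "a + b" as a toy instruction sequence
    iseq = [[:getlocal, 0], [:getlocal, 1], [:plus, nil], [:leave, nil]]
    p interpret(iseq, Frame.new([2, 3], []))  # => 5

    # What a JIT effectively replaces all of that with: straight-line code.
    p 2 + 3                                   # => 5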
I would never have thought a Ruby stack would come anywhere close to Go performance. The path to optimization used to mean abstracting the really demanding parts into a Go microservice for things that just needed extreme responsiveness; but it's clear now that a slim Ruby stack could also be very effective - and without needing to learn a new language. Worthwhile to at least explore before going Go.
Nor did I know that Twitter jumped off Rails before Ruby got performant. Which means the argument that Twitter outgrew Rails isn't so correct anymore.
>Nor did I know that Twitter jumped off Rails before Ruby got performant. Which means the argument that Twitter outgrew Rails isn't so correct anymore.
Twitter, even back in those days, would still have outgrown today's Rails. It's Ruby that has gotten a lot faster, not necessarily Rails.
From what I understand, the biggest issue was their product required fast fan out messaging. Tumblr, for example, is still huge but can get away with 1000 lines of PHP for their feed: https://news.ycombinator.com/item?id=17154403
ActiveSupport also monkey-patches to_json with a recursive Ruby function, completely nerfing JSON performance by 50x in a lot of cases. Unfortunately, it's what makes 'render json: @model' just... work.
Any time I want to generate JSON in Ruby, I want recursive tree traversal. It's just too painful to expect primitive types everywhere. You'd have to pay that 50x doing it explicitly with a recursive function before you turn it into JSON anyway.
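Roughly what that traversal has to do (illustrative only, not ActiveSupport's actual implementation): walk the structure, convert non-primitives, then hand the result to a fast encoder.

    require "json"

    def deep_as_json(value)
      case value
      when Hash   then value.each_with_object({}) { |(k, v), h| h[k.to_s] = deep_as_json(v) }
      when Array  then value.map { |v| deep_as_json(v) }
      when String, Numeric, true, false, nil then value
      else value.respond_to?(:to_h) ? deep_as_json(value.to_h) : value.to_s
      end
    end

    record = { "id" => 1, "created_at" => Time.now, "tags" => [:new, :hot] }
    JSON.generate(deep_as_json(record))
    # => '{"id":1,"created_at":"2018-...","tags":["new","hot"]}'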
Ideally, you'd serialize directly from the database, bypassing the application entirely. Easily doable in ActiveRecord, but it's an explicit action, not the default. Not even sure if it's available in other databases besides PostgreSQL.
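A hedged sketch of the PostgreSQL version (json_agg/row_to_json are PostgreSQL functions; the users table and columns are made up), which hands Ruby a single pre-serialized string:

    payload = ActiveRecord::Base.connection.select_value(<<~SQL)
      SELECT COALESCE(json_agg(row_to_json(t)), '[]'::json)
      FROM (SELECT id, name, email FROM users ORDER BY id LIMIT 100) t
    SQL

    # In a controller: render json: payload -- it's already JSON, so nothing
    # gets re-serialized in Ruby.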
Have you ever used Ruby's JSON.dump? The second it runs into anything that it can't convert, you get "#<Object:0x00007fab05133078>" everywhere in your output.
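A quick illustration of that failure mode with the plain json library (output abbreviated):

    require "json"

    JSON.dump(id: 1, created_at: Time.now, widget: Object.new)
    # => "{\"id\":1,\"created_at\":\"2018-12-25 ...\",\"widget\":\"#<Object:0x00007f...>\"}"
    # Anything without a sensible to_json silently falls back to its to_s.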
Thanks for the informative response (and sudhirj too)! Haven't looked at that benchmark for a while - very interesting. Sinatra is absolutely killing it. I knew it was faster than Rails but not that much faster.
With 2.6 and sorbet [1] coming down the line, it's exciting to be a Rubyist again!
>Doing _anything_ to reduce this improves performance dramatically
It does if you ignore the overhead of JIT compilation itself. However, my understanding is that writing a JIT implementation that performs better than a good interpreter is surprisingly difficult. You have to have a lot of complicated logic for tracking hotspots and using JIT judiciously in short-running scripts.
Hmmm, not really. I think that's somewhat true for JS and webpages, but they're not the same as server-side apps.
It's the instruction dispatch overhead that's the real unavoidable problem. LuaJIT, for example, uses a bunch of tricks to minimize it in the bytecode VM, and it's significantly faster than the standard Lua VM but still far, far slower than basic JIT compilation.
Right, but historically there are lots of instances of projects that abandoned JITs because they didn't get a performance improvement. JIT compilation reduces instruction dispatch overhead, but it also, unless accompanied by sophisticated profiling techniques, adds the overhead of JIT compilation time, which can easily swamp the improvements.
Lua JIT is one of the most sophisticated dynamic language JITs out there, so it's hardly evidence that a simple implementation of a JIT will perform better than a good bytecode interpreter.
The problem is less acute for server side apps because the programs run for a long time, so that the initial compilation overhead is insignificant. However, there's a reason that you need a JIT to make Ruby fast rather than an ahead of time compiler. Ruby has so few compile-time guarantees that you need to do a lot of dynamic specialization to get really significant performance improvements. So compilation might still be triggered even after a script has been running for a long time.
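A minimal illustration of why that specialization has to happen at runtime: the same call site sees different receiver types, and core classes can be reopened at any point while the program is running.

    def double(x)
      x + x          # Integer#+, String#+, Array#+ ... resolved per call
    end

    p double(21)     # => 42
    p double("ab")   # => "abab"

    class String
      def +(other)   # reopening a core class invalidates earlier assumptions,
        "patched"    # so compiled code guarding on String#+ must deoptimize
      end
    end

    p double("ab")   # => "patched"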
I'd add that PyPy, which is also very sophisticated, is often not much faster than CPython, and in fact is slower for some types of code. Writing good JIT-based implementations for dynamic languages is really a tough problem. See e.g. the following post for some explanation of why:
> Right, but historically there are lots of instances of projects that abandoned JITs because they didn't get a performance improvement. JIT compilation reduces instruction dispatch overhead, but it also, unless accompanied by sophisticated profiling techniques, adds the overhead of JIT compilation time, which can easily swamp the improvements.
Yes.
> Lua JIT is one of the most sophisticated dynamic language JITs out there, so it's hardly evidence that a simple implementation of a JIT will perform better than a good bytecode interpreter.
I meant that even a basic JIT can offer the same speedup as LuaJIT's interpreter, and a lot more work went into the latter.
> The problem is less acute for server side apps because the programs run for a long time, so that the initial compilation overhead is insignificant. However, there's a reason that you need a JIT to make Ruby fast rather than an ahead of time compiler. Ruby has so few compile-time guarantees that you need to do a lot of dynamic specialization to get really significant performance improvements. So compilation might still be triggered even after a script has been running for a long time.
The initial results of MJIT from simply removing the instruction dispatch overhead and doing some basic optimizations are a 30-230% performance increase on a small but real-world benchmark. No type specialization or speculative optimization required.
> I'd add that PyPy, which is also very sophisticated, is often not much faster than CPython, and in fact is slower for some types of code. Writing good JIT-based implementations for dynamic languages is really a tough problem. See e.g. the following post for some explanation of why:
Most of the discussion about PyPy is completely irrelevant for the discussion about MJIT. PyPy isn't a method JIT. PyPy traces the interpreter itself and tries to produce a specialized interpreter. It works even worse at optimizing Ruby code via Topaz.
True! I just checked again and Topaz is indeed almost twice as fast as CRuby on optcarrot. I think I got it mixed up with the non-JIT Rubinius numbers.
>The initial results of MJIT from simply removing the instruction dispatch overhead and doing some basic optimizations are a 30-230% performance increase on a small but real-world benchmark. No type specialization or speculative optimization required.
So, this amounts to a small improvement for some types of code. Indeed, it is "easy" to get that by "just" using some basic JIT techniques. The trick is to get consistently better performance across the board. Relevant tweet at https://medium.com/@k0kubun/the-method-jit-compiler-for-ruby...:
>I've just committed the initial JIT compiler for Ruby. It's not still so fast yet (especially it's performing badly with Rails for now), but we have much time to improve it until Ruby 2.6 (or 3.0) release.
> The trick is to get consistently better performance across the board.
This will come with the rest of the optimizations Takashi has planned for Ruby 2.6. Ruby-to-Ruby method inlining, which is almost finished, is a huge one for improving Rails performance. IMHO there's no real point talking about Rails until it's working in some form.
> >I've just committed the initial JIT compiler for Ruby. It's not still so fast yet (especially it's performing badly with Rails for now), but we have much time to improve it until Ruby 2.6 (or 3.0) release.
I'm not saying that MJIT (or whatever implementation) will never be fast. I'm just saying that in general, it is not trivial to get performance improvements by writing a JIT. My original comment said nothing about Ruby.
This is simply not true. You're massively underestimating the overhead of a bytecode VM. Even the most heavily optimized bytecode VMs are easily beaten by the simplest of JITs in 100 lines of C: https://arxiv.org/pdf/1604.01290.pdf
Code doesn't even need to be "hot" to make it worth it. WebKit switches from the interpreter to cheap baseline compilation, without any speculative optimizations or type information, after only 6 calls of a function: https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/
This article literally describes how baseline JIT is worth it simply to remove the bytecode VM dispatch overhead.
It's not clear to me why you think that the linked article shows that naive JITs perform better across-the-board than well-written bytecode interpreters. The reported speed improvement from JIT itself is a mere 2.3x, and the listed benchmarks mostly involve numerical code, where JIT tends to be effective.
A simple JIT can get you to the point where you reliably outperform a bytecode interpreter for certain types of code. What takes a lot more engineering effort is reliably performing at least as fast as a bytecode VM for all types of code.
MJIT is completely different to PyPy. Their problems are simply not relevant to MJIT. MJIT is already just as fast as standard CRuby when executing complex Rails code, minus the small overhead for JIT compilation: http://engineering.appfolio.com/appfolio-engineering/2018/3/...
PyPy is a much more ambitious design, completely replacing CPython, and using an unusual JIT scheme of tracing the interpreter itself and trying to produce an interpreter optimized for particular traces of your code.
It was much harder to reach the same level of general performance with that approach than it seems to have been for CRuby & MJIT.
Yes, I agree with this. We use Sinatra + Sequel, but run our code using JRuby mostly because we're sharing some scala libs with other teams. In any case for us, performance has not been a problem (~20k req/min with one node). I'm really looking forward to ruby 3x3 :)
JRuby is really hitting its stride now, becoming ~3x faster than CRuby in my testing. The JVM changes to support more dynamic languages have improved performance so much.
Charles Nutter's early tests using JRuby on the GraalVM sound like there's another big step in performance coming without a huge amount of work.
CRuby's GIL doesn't really matter for serving web requests since it's run with one process per core like NodeJS is. It's less memory efficient but doesn't really affect throughput so much. Also, JRuby has no GIL.
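For context, the process-per-core setup is usually just app server configuration; a typical sketch assuming Puma (the worker/thread counts here are arbitrary):

    # config/puma.rb: one forked worker per core sidesteps the GIL for CPU-bound
    # work; threads inside each worker cover IO-bound time.
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))
    threads 5, 5
    preload_app!   # share memory between workers via copy-on-write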
Is Iris fully featured? It's better to compare two frameworks that have real production use in more complex apps to get an idea. Both Gin and Sinatra fit this description.
I haven't had a chance to use Bootsnap yet but it sounds really promising.
Ruby has made huge strides to be sure. However this is a bit hyperbolic IMHO.
Golang plus gin, sure. However there are other Go frameworks on the charts that blast the Ruby competition out of the water. Ruby isn't really on the podium at all with C, C++, Rust, Golang, C#, and Java about an order of magnitude out in the lead on fortunes.
Martini isn't much of a framework itself either, so let's forget the full-featured nonsense. Almost none of the ecosystem is in play with these benchmarks. You could build a system up around fasthttp just as well as net/http, and ASP.NET certainly can't be accused of being a for-purpose contender.
The most impressive thing IMHO is how well Ruby is doing on maximum latency. I can't quite reconcile that, considering fasthttp is pretty much zero-allocation and Golang's stop-the-world is in the microseconds. Pretty impressive.
> The most impressive thing IMHO is how well Ruby is doing on maximum latency. I can't quite reconcile that, considering fasthttp is pretty much zero-allocation and Golang's stop-the-world is in the microseconds. Pretty impressive.
Fast GC is critical to Ruby performance, so a ton of work went into it. Ruby 2.2+ has a very short STW phase thanks to generational GC + incremental marking.
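You can see the generational split directly in GC.stat (key names are from 2.x and can shift between versions):

    stats = GC.stat
    p stats[:minor_gc_count]   # young-generation collections: cheap and frequent
    p stats[:major_gc_count]   # full collections: much rarer since RGenGC landed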
What I remember reading from the Graal folks a while back was that Rails performance issues revolve around the amount of object creation and destruction.
For hot code, probably; not the entire language in general. If you're looping through a million numbers doing the same calculation, or maybe rendering markdown in a loop on a lot of text, you might hit 10X - the JIT will essentially write the code in C for you, then compile it and run it instead of the Ruby.
Another factor is that a lot of the hot code - JSON serialization, DB clients, HTTP parsing - is already in native C extensions. Performance there is only going to improve a little, although some of it could be better off as pure Ruby + JIT.
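A rough way to see that split for yourself, using only the stdlib (numbers will vary by machine):

    require "benchmark"
    require "json"

    # Pure-Ruby numeric loop: the kind of code a JIT can speed up a lot.
    hot_loop = -> { (1..1_000_000).reduce(0) { |sum, i| sum + i * i } }

    # Already backed by a C extension: little left for the JIT to win here.
    payload = { "items" => (1..1_000).map { |i| { "id" => i, "name" => "n#{i}" } } }
    c_ext = -> { JSON.generate(payload) }

    Benchmark.bm(10) do |x|
      x.report("pure ruby") { 10.times { hot_loop.call } }
      x.report("c ext")     { 10.times { c_ext.call } }
    end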
Rubinius uses LLVM IR for this. The only problem is rbx is still very slow on startup and interactive use.
Ruby (MRI) will have to reinvent the wheel in order to get a panoply of optimizations that some very smart people have already baked into LLVM: like the ability to target almost any platform from the same library. GCC requires cross-compiling per target.
Just spun up my ridiculously heavy rails app and did a quick test with one of the more cpu-intensive actions and see no evidence of any improvement at all. If anything, it was a bit less predictable.
This little test was nowhere near an actual benchmark, but I wouldn't hold my breath.
Test suite still passes on it though, so upgrading shouldn't be a huge deal at least へ‿(ツ)‿ㄏ
@darkdimius[0], the person who tweeted the referenced tweet in the article, was arguably the primary contributor to Dotty[1] behind Martin Odersky (Dotty will be Scala 3, the next major version of the language).
Once he finished his doctorate at EPFL, off to Stripe he went; bye bye Scala. Tough industry: on the one hand Scala benefits from a revolving door of high-level EPFL doctoral students, and on the other the talent pool shifts around as students come and go.
Money talks, companies like Stripe have a leg up in that they can fund full-time engineers to work on projects, whereas institution backed projects typically have a much smaller pool of long-term engineers to rely on (JetBrains, for example, has something like 40 full-time engineers working on Kotlin/KotlinJS/Kotlin Native).
Where you're seeing Oracle tech I see free software licensed under the LGPL. There's no way that if TruffleRuby becomes well and truly popular and Oracle decides to... be Oracle, no other company would pick up the banner. Ruby is just too popular for that not to happen.
Because Oracle is Oracle, the most evil company in tech, the one most blatantly greedy. Look at what they pulled with Google. Oracle would wait until the tech's usage grows and then use patents and API copyrights or whatever else they invent out of thin air to go after the players using its tech. The free license of Graal does not protect you from that, the GPLv2 specifically does not. See https://www.gnu.org/licenses/rms-why-gplv3.en.html.
Google abused Sun, took advantage of the fact that they were in a critical financial situation and unable to sue, and when they crashed, did not lift a finger to rescue the company's assets.
Now Android has Google's own J++, limiting what kind of Java libraries are portable to the platform.
At the same time, some OEMs are adopting Android instead of Embedded Java, thus increasing the fragmentation around which Java libraries are actually portable.
Google just thought they could let Sun close its doors and get away with how they created their own J++.
Also, it doesn't change the fact that even with Android 8.1, I as a Java developer cannot take a random jar from Maven Central and be certain it won't crash and burn on Android, regardless of the version.
But I don't see how you can tolerate that contradiction. Either you agree with Oracle that the Java APIs were copyrighted and Google should not have been allowed to reconstruct them. Or you worry about fragmentation coming from an incompatible Java implementation. Doing both is nonsensical.
How do you arrive at the idea that having more Java devices available, even if not 100% compatible, would have in any way caused harm to Sun? And even so much harm that it killed the company?
On top of that, I don't see what Google would have had an obligation to pay for.
Contradiction: Google broke some imaginary copyright by re-implementing APIs, but Google is bad because the re-implementation was not 100% equal to the original causing fragmentation. Either the fragmentation was harmful, then the API copyright was the problem. Or the API copyright violation was the problem, then fragmentation was the explicit goal and Google's try to minimize it the problem. Both can't be true at the same time outside lawyer lala land.
Because those devices run Android Java, from which Sun saw $0, so Sun was not able to capitalize on them to pay its bills.
1 - Google did not pay for Java licenses when it should have. Even Andy Rubin admits that in his emails.
2 - To this day Android is not Java SE compliant, thus creating a fragmentation between Android Java and Java, just like the one Sun managed to prevent with J++.
3 - Being a Java licensee, as Google avoided being and still isn't (many Java APIs are still not available on Android), would have required Android to be fully Java SE compliant.
So to conclude, Google tricked Sun and fragmented the Java eco-system.
They should pay and provide a 100% Java SE compliant implementation, or be honest about it and fully migrate to Kotlin, Dart, or whatever they feel like.
FYI, what you've linked to is a breakdown of the licenses for the various projects comprising GraalVM. The licenses you listed only apply to TruffleRuby.
Having said that, Graal and its related projects are all open source, with a license listing available in its README:
I do wish, though, that except for eval and exec, the majority of Ruby code could get an incremental/LTO JIT similar to HotSpot, and a GC like C4 rather than an incremental generational GC.
> Unlike ordinary JIT compilers for other languages, Ruby’s JIT compiler does JIT compilation in a unique way, which prints C code to a disk and spawns common C compiler process to generate native code.
What's the advantage over using LLVM's built-in JIT, or PyPy's JIT, or generating machine code directly, or anything else that doesn't have the overhead of spawning processes for compiling and linking? One of the goals listed is minimizing the JIT compilation time.
LLVM clearly wasn't designed for JIT. Don't let those letters "VM" confuse you; it's more like a machine abstraction than a virtual machine. And even that is far from water-tight.
But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.
Can anyone think of any reasonably common stacks using LLVM as a JIT? There's Mono, but that's a non-default mode; not sure if it's typically used. The Python unladen-swallow experiment failed. WebKit had a short-lived FTL JavaScript optimization pass, but that was replaced by B3.
Which is just a long-winded way to suggest that LLVM is not likely to be ideal as a JIT, at least based on what past projects have done.
(Not trying to imply that writing C to disk is better, but it may well be simpler & more flexible - not worthless qualities for an initial implementation).
I use LLVM as a JIT via Terra [1]. It performs about as well as you'd expect any other C compiler to perform. That is, if you do a bad job of code generation and pass it a multi-MB file in a single function, well then of course it's going to choke. But if you're optimizing tight loops and have reasonable code generation, it's very good and you can get performance comparable to a best-in-class C compiler without the overhead and headache associated with calling out to an external program.
The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources, and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has only recently upgraded to 3.8 [2].
But if the alternative is shelling out to a C compiler? I'll take LLVM any day. The issue is not just the overhead of a call to an external program, it's all the extra complexity that comes along with that. It is very, very easy for this approach to break, especially when you consider the breadth of C compilers that exist, and all the possible ways they can be configured. In contrast, LLVM is "just" a library that you link to.
I'm a little skeptical about the cost and complexity of an external program. You may not need to support all those C compilers, but at least you have the choice. And C is extremely mature and stable. If you're generating code, you probably don't need to use the latest not-so-well-supported features; you may well be able to emit C code that compiles on almost any compiler from the last 3 decades without too much trouble. And while there will be more configuration choices, it's not like raw LLVM has none.
If anything, I'd bet plain C is much simpler because it hasn't changed much, and is very unlikely to ever do anything very surprising on any future platform - which cannot be said of raw LLVM.
And of course shelling out is a bit of a hassle, but hey; it's a well-trodden path on Unix. It's not the fastest, greatest interop in the world, but it's good enough for a lot of things.
I'll just say that my views come mainly from experience, specifically ECL (Embeddable Common Lisp, a CL implementation) and (this was further back, so my memory is fuzzy) a tool for generating executables from Perl scripts. I don't think I'm using an especially unusual setup, or unusual compilers, and I would guess that these tools probably target a very narrow subset of C. Despite this, my experience with these sorts of tools has been anything but "works out of the box". On the contrary, there appear to be a great number of degrees of freedom, even with standard-ish setups, that can trip up these tools. Because of the additional layers of abstraction, the error messages you get are very poor. Some header file is missing or in an unexpected place, or worse some generated code fails to compile. As an end-user, it's basically impossible to debug these in a reasonable way.
You can certainly have internal errors using LLVM, but in my experience fewer of them are platform-dependent. Therefore there is a greater chance that something that works for the developer will work for the user. Also, if error handling is done properly, a failure that does occur can often be mapped back to the original source program. This is much better as far as usability goes, since the user almost never wants to debug some compiler's generated code.
> The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources, and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has only recently upgraded to 3.8 [2].
Yea, it's annoying. For PostgreSQL I've decided to focus on the C API wherever possible for exactly that reason. A bit more painful to write, but not even remotely as quickly moving. Obviously there are parts where that's not possible - but even there I've decided to localize it as much as possible.
I wonder if there will ever be a de facto API wrapper for LLVM. As it is, I'm aware of smaller efforts here and there, but other than SPIR-V [1] I'm not sure any are big enough to have long-term survivability potential. And even with SPIR-V I'm not sure if the momentum is really there or not.
> Can anyone think of any reasonably common stacks using LLVM as a JIT?
We just added an LLVM-based JIT to PostgreSQL. I don't think we have quite the same issues as JITing generic interpreted languages though, because the planner gives us much more information about the likely cost of executing a query. So the need for a super-fast baseline JIT isn't as big.
> But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.
I think that's partially due to people using the expensive default pipeline when enabling optimization. A lot of those passes either don't make sense for the source language, or not for the first foreground JIT pass.
The biggest issue I have with LLVM around JITing is that its error handling isn't really good enough. It's fine to just fatal-error if you're in an AOT compiler world, but that's much less acceptable inside a database. There are moves to make at least parts of LLVM exception safe, but ...
That's a great example! It's pretty much exactly what I was looking for (well, except that it's probably going to be niche, at least for a while?) Still - good example.
Having used LLVM for precisely that for both Open Shading Language (OSL) and my startup's runtime, LLVM's JIT was pretty good. It's certainly not optimized for "just throw everything at it a function at a time and pray" like a more custom-built JIT (like in HHVM) or even nanojit. But its backend output is beyond compare, and you instantly get cross-platform compatibility. As a runtime implementer, it (was) phenomenal.
After LLVM 3.4 or so with the forcible move to “MCJIT” (now ORCJIT maybe?) it suddenly got even more painful though. While the Module system in LLVM was always abused by the JIT, it was a sad day for many of us who instead pinned to 3.4 for a while. I haven’t followed up in a while to see how the newer JITs have progressed, but I believe the last-layer JIT for Safari uses LLVM as well.
tl;dr: for the right time versus execution speed trade-off, LLVM is still awesome.
Shelling out (which I’ve also done) is okay, but you never get to really teach the backend what you know. That is, no matter how hard you try, you can’t teach gcc, icc, or clang that you know it’s safe to just fetch this function pointer off a struct and that it’s stable. Writing a simple pass in LLVM though is incredibly straightforward. You can even do a simple inliner, that knows how to inline just the runtime callsites you care about.
As the WebKit folks and the HHVM folks found before them: dynamic languages have enough complexity that you often get most of the win from a "basic compilation" (compared to, say, C/C++), so after you've proven out what you need, you roll your own.
Shelling out though would be strictly worse than the LLVM in-memory approach, since it gets you no additional benefit (in some ways it’s harder, since you can’t just say “jump to this address”), you lose a lot of upside (custom passes, letting you tune optimizations and instruction selection beyond simply -O0, -O1, etc.), and then you get to require users to have a compiler on their box.
I’d personally look at nanojit or the other JIT libraries before shelling out to a regular compiler.
I know very little about Ruby specifically, but IME for this kind of dynamic language you get most of the initial gains by:
- removing (by analysis or speculation) dynamic dispatch
- unboxing / avoiding allocations in the easy cases
Once you've done that, you can generate pretty dumb assembly and still come out way ahead of your interpreter (and avoid very costly optimization / instruction selection / regalloc / scheduling).
Most of what llvm / gcc does only makes sense when you've got your code down close to whatever you would actually write in C.
They want to check the generated code and let people check it on their own platforms:
> The main purpose of this JIT release is to provide a chance to check if it works for your platform and to find out security risks before the 2.6 release
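If you want to poke at it yourself, something like this should work on the 2.6 previews (flag names as documented for the previews - double-check against `ruby --help` on your build):

    # Run with the JIT enabled and keep the generated .c files around to inspect:
    #   ruby --jit --jit-verbose=1 --jit-save-temps your_script.rb
    #
    # From inside Ruby you can check whether the JIT is active:
    p RubyVM::MJIT.enabled?   # => true when started with --jit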
LLVM's JIT is in a sense different from that of, say, PyPy. It's more primitive. When people talk about JIT in the context of LLVM, they mean the set of APIs provided by LLVM's library. That is, given a set of IR functions and the things they depend on, the library dynamically compiles and links them for you. More concretely, given the IR of a function, it gives you a raw pointer to the compiled version that you can call directly. It takes care of the boring (and often platform-dependent) parts efficiently -- code gen, linking, etc. -- so that you can focus on generating efficient IR (which is the hard part for a JIT).
Smalltalk/X can file out packages as C projects that are then compiled by a C compiler. But AFAIK this was never meant to be used as a JIT; it is primarily a deployment mechanism, and non-ancient versions use an in-process code generator implemented in Smalltalk as the JIT backend.
There are Common Lisp implementations that support a similar mechanism of generating C code (ECL, Kyoto CL...), but I don't think any of them compiles C into a .so which then gets dlopened right away as a poor man's JIT.
KCL generates .c files and compiles those to .o object files. I played with this years ago (via the descendant GCL: GNU Common Lisp). The load function handles object files, like COFF or whatever. It's reminiscent of Linux kernel modules.
When KCL compiles a lambda expression, it writes it to a temporary Lisp file called "gazonk.lsp" and compiles that.
(The report above is a little confusing; in some places it claims that an object file has a .o suffix, but then, with regard to this implicit gazonk name, it claims that the fasl file is gazonk.fasl.)
Example with GCL: compile an individual function to C, compile that with the C compiler to a .o file (for example, on my 32-bit ARM it is an elf32-littlearm file), and then load it:
>(defun foo (a) (* a 42))
FOO
>(compile 'foo)
Compiling /tmp/gazonk_24158_0.lsp.
End of Pass 1.
End of Pass 2.
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
Finished compiling /tmp/gazonk_24158_0.lsp.
Loading /tmp/gazonk_24158_0.o
start address -T 0x888488 Finished loading /tmp/gazonk_24158_0.o
#<compiled-function FOO>
NIL
NIL
It's the traditional method of doing a job equivalent to JIT. It has several examples in the history of computing, like SpamAssassin and Matlab (the latter if my memory serves me correctly).
Overhead. It converts its IR to C; dumps that to disk; the C compiler loads the code back from disk; the frontend parses the code (if not done carefully, maybe CPP is also invoked); the compiler dumps the generated code to disk again; and then presumably dlopen loads the code back from disk yet again. There's also the overhead of spawning a separate compiler process. A better way would be to directly generate code into memory and link it there. This is of course trickier, but it is also what libraries such as LLVM's JIT infrastructure and libjit are built for. If you need more performance (i.e. LLVM's JIT is too slow for you), you roll your own infrastructure to do this -- which is what the JVM and V8 do.
They don't "dump to disk", if you mean an actual storage device. By default they store data to a "file system in memory" (a tmpfs), so it never gets written to a long-term storage device (not even an SSD). Even if you do "dump to disk", on a modern OS storing things in a file just puts it in memory and schedules it for eventual long-term storage. Of course, doing things this way has overheads, but it may not be so bad.
The C frontend has to parse things, of course, but it looks like they're heavily optimizing this. "To simplify JIT implementation the environment (C code header needed to C code generated by MJIT) is just an vm.c file. A special Ruby script minimize the environment (Removing about 90% of the declarations). One worker prepares a precompiled code of the minimized header, which starts at the MRI execution start".
Their current results are that "No Ruby program real time execution slow down because of MJIT" and "The compilation of small ISEQ takes about 50-70 ms on modern x86-64 CPUs". You're of course using more CPU (to do the compilations in parallel), and you have to have a compilation suite available at runtime, but in many circumstances that is perfectly reasonable.
IIRC, the gcc C compiler doesn't generate machine code itself either; it generates assembly code, which is then farmed out to a separate assembler process (the GNU assembler, aka GAS). Farming out compilation work to other processes is not new.
It seems to me that this is a really plausible trade. This approach means that they can add a just-in-time compiler "relatively" quickly, and one that should produce pretty decent code once they add some actual optimizations (because it's building on very mature C compilers). The trade-off is that this approach requires more run-time CPU and time to create each compiled component (what you term as overhead). For many systems, this is probably an appropriate trade. As I posted earlier, I'm very interested in seeing how well this works - I think it's promising.
It's faster to hand-generate machine code straight from an interpreter than to invoke a C compiler. But that is not the only issue. As with everything else, this is a trade-off, and I'm eager to see how it works out. I can see some positive reasons to do this:
1. The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.
2. The C language, as an interface, is extremely stable, so once it works it should just keep working. Compare that to the constantly-changing interfaces of many alternatives.
3. Debugging is WAY easier. If there's a problem in generated code, it's way easier to read intermediate C code (especially after going through a pretty-printer) than many other kinds of intermediate formats, and millions of people already know it.
In short, this approach means that they can very rapidly produce a system that can run tight loops very quickly, one that resists interface instability (so the approach should keep working), and one that's easy to debug (so it should be reliable). For many applications, the fact that it takes a little more time to do the compilation may be unimportant, especially since that work is embarrassingly parallelizable.
I'm very interested in seeing how this plays out. If this works well for Ruby, I suspect some other language implementations will start considering using this approach. I'm sure it's not the best approach in all circumstances, but it might work very well for Ruby - and maybe for some other languages like it.
> The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.
Not for machine-generated code. C compilers work well on human-generated code, and not as well on Ruby -> C "translations".
> Not for machine-generated code. C compilers work well on human-generated code, and not as well on Ruby -> C "translations".
That depends on the machine generated code. C compilers are optimized for whatever the C compiler authors perceive as a common construct. If the generated C code uses constructs similar to what humans do, it's often quite good. If not, you can change the code that generates C, or in some cases you can convince the C compiler authors to optimize that situation as well.
In fact, one of the oldest JITted Ruby implementations is Topaz, which was built on top of PyPy (i.e., it's a Python program that uses the PyPy infrastructure to parse/JIT/run Ruby instead of Python).
I wonder if Matz may consider even just adopting Crystal in the future, considering it is almost 99% compatible with Ruby and has 10x-20x performance gains out of the box.
This. Some concepts and syntax are similar but they are quite different. I'm not saying that in negative sense - static typing with type inference rocks.
Everything in Ruby happens at runtime. Even the definition `class X` becomes a runtime `Class.new` invocation. It's imperative from the inside out. Even a JIT compiler won't and can't solve the fundamental flaws permanently baked into the language. If you want performance, Ruby probably shouldn't be the first tool you reach for.
TruffleRuby + Graal brings full Java-equivalent JVM performance to Ruby. It can even AOT-compile a class definition like the one you described as impossible, by using partial evaluation.
Oracle's plan for world domination via JVM is completely changing the performance landscape for dynamic languages.
While my main concerns about Ruby aren't directly related to type systems or static typing, at this point in my career, the speed at which I can hack out a script in a language in 24 hours isn't related to how well designed I think the language is.
[1] http://blog.headius.com/2012/11/refining-ruby.html
[2] https://github.com/jruby/jruby/issues/1062