Overhead. It converts its IR to C; dumps that to disk; the C compiler loads the code back from disk; the frontend parses it (and, if not done carefully, the C preprocessor gets invoked too); the compiler dumps the generated code to disk again; and then presumably dlopen loads it back from disk once more. There's also the overhead of spawning a separate compiler process. A better way would be to generate code directly into memory and link it there. That's trickier, of course, but it's also what libraries such as LLVM's JIT infrastructure and libjit are built for. If you need more performance than that (i.e. LLVM's JIT is too slow for you), you roll your own infrastructure -- which is what the JVM and V8 do.
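For concreteness, here's a rough sketch of that round trip in plain C. This isn't MJIT's actual code -- the file names, the `compiled_fn` symbol, and the compiler flags are all made up for illustration:

```c
/* Illustrative sketch of the write-out / compile / dlopen round trip.
 * Not MJIT's actual implementation; names and flags are invented.
 * Build with: cc roundtrip.c -ldl */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void) {
    /* 1. Dump the generated C code to a file. */
    FILE *f = fopen("/tmp/jit_gen.c", "w");
    if (!f) return 1;
    fputs("long compiled_fn(long x) { return x * 2 + 1; }\n", f);
    fclose(f);

    /* 2. Spawn the C compiler as a separate process to build a shared object. */
    if (system("cc -O2 -shared -fPIC -o /tmp/jit_gen.so /tmp/jit_gen.c") != 0)
        return 1;

    /* 3. Load the shared object back and look up the generated function. */
    void *handle = dlopen("/tmp/jit_gen.so", RTLD_NOW);
    if (!handle) return 1;
    long (*fn)(long) = (long (*)(long))dlsym(handle, "compiled_fn");
    if (!fn) return 1;

    printf("compiled_fn(20) = %ld\n", fn(20)); /* prints 41 */
    dlclose(handle);
    return 0;
}
```

Every step touches the file system or forks a process; an in-memory JIT replaces all of that with emitting machine code into an executable buffer and calling it directly.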
They don't "dump to disk", if you mean an actual storage device. By default they store data to a "file system in memory" (a tmpfs), so it never gets written to a long-term storage device (not even an SSD). Even if you do "dump to disk", on a modern OS storing things in a file just puts it in memory and schedules it for eventual long-term storage. Of course, doing things this way has overheads, but it may not be so bad.
The C frontend has to parse things, of course, but it looks like they're heavily optimizing this. "To simplify JIT implementation the environment (C code header needed to C code generated by MJIT) is just an vm.c file. A special Ruby script minimize the environment (Removing about 90% of the declarations). One worker prepares a precompiled code of the minimized header, which starts at the MRI execution start".
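If it's not obvious how the precompiled header helps: GCC (and Clang) can parse a header once, serialize the result, and reuse it for every later compilation. A sketch of that trick with made-up file names (`mjit_min_header.h` standing in for the minimized vm.c environment, `jit_gen.c` for a generated method):

```c
/* Sketch of the precompiled-header trick; file names are hypothetical.
 * GCC looks for mjit_min_header.h.gch whenever a unit does
 * `#include "mjit_min_header.h"`, so the big header is parsed once,
 * not once per compiled method. */
#include <stdlib.h>

int main(void) {
    /* Done once, at startup: compile the minimized header into a .gch. */
    if (system("cc -O2 -x c-header mjit_min_header.h "
               "-o mjit_min_header.h.gch") != 0)
        return 1;

    /* Done per method: the generated unit includes the header, and the
     * compiler reuses the precompiled version instead of re-parsing it. */
    if (system("cc -O2 -shared -fPIC -o jit_gen.so jit_gen.c") != 0)
        return 1;
    return 0;
}
```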
Their current results are that "No Ruby program real time execution slow down because of MJIT" and "The compilation of small ISEQ takes about 50-70 ms on modern x86-64 CPUs". You're of course using more CPU (the compilations run in parallel with execution), and you have to have a compiler toolchain available at runtime, but in many circumstances that's perfectly reasonable.
IIRC, the gcc C compiler doesn't generate machine code itself either; it generates assembly code, which is then farmed out to a separate assembler process (the GNU assembler, aka GAS). Farming out compilation work to other processes is not new.
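Easy to see for yourself, by the way: compile any trivial file with `-v` and the driver prints its separate steps. A sample file to try it on, assuming a Unix-ish system with gcc installed:

```c
/* hello.c -- `gcc -v hello.c` shows the driver running cc1 (C -> assembly)
 * and then `as` (assembly -> object code) as separate programs;
 * `gcc -S hello.c` stops after the first step and leaves hello.s behind. */
#include <stdio.h>

int main(void) {
    puts("hello");
    return 0;
}
```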
It seems to me that this is a really plausible trade. This approach means they can add a just-in-time compiler "relatively" quickly, and one that should produce pretty decent code once they add some actual optimizations (because it's building on very mature C compilers). The trade-off is that this approach requires more run-time CPU and more time to create each compiled component (what you term overhead). For many systems, this is probably an appropriate trade. As I posted earlier, I'm very interested in seeing how well this works - I think it's promising.
It's faster to hand-generate machine code straight from an interpreter than to invoke a C compiler. But that is not the only issue. As with everything else, this is a trade-off, and I'm eager to see how it works out. I can see some positive reasons to do this:
1. The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.
2. The C language, as an interface, is extremely stable, so once it works it should just keep working. Compare that to the constantly-changing interfaces of many alternatives.
3. Debugging is WAY easier. If there's a problem in generated code, it's way easier to read intermediate C code (especially after going through a pretty-printer) than many other kinds of intermediate formats, and millions of people already know it.
In short, this approach means that they can very rapidly produce a system that can run tight loops very quickly, one that resists interface instability (so the approach should keep working), and one that's easy to debug (so it should be reliable). For many applications, the fact that it takes a little more time to do the compilation may be unimportant, especially since that work is embarrassingly parallelizable.
I'm very interested in seeing how this plays out. If this works well for Ruby, I suspect some other language implementations will start considering using this approach. I'm sure it's not the best approach in all circumstances, but it might work very well for Ruby - and maybe for some other languages like it.
> The Ruby developers get highly-optimized machine code, with relatively little effort on their part. Many, many man-years have been spent to make C compilers generate highly optimal code.
Not for machine-generated code. C compilers work well on human-written code, not so well on Ruby -> C "translations".
> Not for machine-generated code. C compilers work well on human-written code, not so well on Ruby -> C "translations".
That depends on the machine-generated code. C compilers are optimized for whatever the compiler authors perceive as common constructs. If the generated C code uses constructs similar to what humans write, the result is often quite good. If not, you can change the code that generates the C, or in some cases you can convince the C compiler authors to optimize that case as well.
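To make that concrete, here's a purely hypothetical pair of translations of the same expression -- not what MJIT actually emits -- showing the difference between C a compiler can see through and C that hides everything behind the runtime:

```c
/* Hypothetical translator output for the expression `x * 2 + 1`.
 * Compare the two with `cc -O2 -S lowering.c`. All names are invented. */

/* "Human-like" lowering: plain locals and native arithmetic.
 * GCC/Clang typically reduce this to an instruction or two
 * (a single lea on x86-64). */
long calc_unboxed(long x) {
    return x * 2 + 1;
}

/* "Opaque" lowering: every value is boxed and every operation is a call
 * into the language runtime. Unless those helpers get inlined, the C
 * compiler can't reason about them; it mostly just schedules the calls. */
typedef struct { long tagged; } VALUE_t;             /* stand-in for a boxed value */
extern VALUE_t rt_box_long(long n);                  /* hypothetical runtime helpers */
extern VALUE_t rt_fixnum_mul(VALUE_t a, VALUE_t b);
extern VALUE_t rt_fixnum_add(VALUE_t a, VALUE_t b);

VALUE_t calc_boxed(VALUE_t x) {
    return rt_fixnum_add(rt_fixnum_mul(x, rt_box_long(2)), rt_box_long(1));
}
```

The closer the generated C looks to the first version, the more of those man-years of compiler work you actually get to use.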
Care to elaborate?