Simple Language Implementation Techniques for the 21st Century

andrewchambers · on Sept 30, 2014

Sometimes I wonder whether RPython is a good meta language for this tracing JIT technique. I don't know whether there would have been any benefit if the meta language had been specifically designed for implementing dynamic interpreters with the meta-JIT magic.

Adding explicit static typing to the meta language (i.e. replacing RPython) could at the very least remove the long type analysis and translation times, while providing better error messages.

smarr · on Oct 1, 2014

RPython is certainly not perfect, and the PyPy/RPython community is also very much aware of RPython being something that developed over time. But, RPython was from the start 'designed' as a language for implementing dynamic languages. The problem is that it took multiple iterations to identify meta-tracing as a foundation that works. And those iterations left some cruft here and there.

Compared to the Java-based experiments I did, sometimes I miss the ability to be explicit about types, and have to use RPython's assertions instead, which is a clumsy way of telling RPython about types. But overall, RPython isn't the worst toolchain I have ever worked with. And, typically, if the error messages are really bad, the community is very helpful and eager to improve them.
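To make the "assertions instead of type declarations" point concrete, here is a minimal sketch in plain Python. The classes `W_Object`, `W_Integer`, `W_String` and the function `add_integers` are hypothetical names made up for illustration. The assumption is that RPython's type analysis uses `assert isinstance(...)` to narrow the type of a variable; in ordinary Python the same line is only a runtime check.

```python
class W_Object(object):
    """Hypothetical base class for all values in a toy interpreter."""


class W_Integer(W_Object):
    def __init__(self, value):
        self.value = value


class W_String(W_Object):
    def __init__(self, text):
        self.text = text


def add_integers(left, right):
    # Both arguments arrive typed only as W_Object. The assertions state
    # that they are really W_Integer, so the .value accesses below are
    # valid. (Assumption: under RPython this also narrows the inferred
    # type; in plain Python it is just a runtime check.)
    assert isinstance(left, W_Integer)
    assert isinstance(right, W_Integer)
    return W_Integer(left.value + right.value)
```

A statically typed meta language would put the same information in the function signature rather than in the body.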

acqq · on Oct 1, 2014

My take on the data presented in the article is entirely different: both of the "acceptably fast" techniques use a JIT to make the executed code fast. And the author avoids considering what the JIT does. JIT means that, in the end, the JIT engine produces native code. The same code your real compiler would produce. All the processing before that phase is just to allow the clumsy "too dynamic" languages to balance between "do at least something" and "do something fast enough" (the second happens after the "tracing" code detects that something is repeated often enough, and the code that actually matters has to run as often as it can).
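The "detects that something is repeated often enough" step can be sketched as a per-loop counter. This is a toy illustration only, not how PyPy or any real JIT is implemented: `LoopProfiler`, `on_backward_jump`, `run_loop` and the threshold of 3 are all hypothetical, and "compiling" is reduced to remembering the loop in a set.

```python
HOT_THRESHOLD = 3  # hypothetical value, chosen only for the example


class LoopProfiler(object):
    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = {}       # loop id -> times seen while interpreting
        self.compiled = set()  # loops that were "compiled" (stand-in only)

    def on_backward_jump(self, loop_id):
        """Report how the current iteration of the loop is executed."""
        if loop_id in self.compiled:
            return "compiled"
        self.counts[loop_id] = self.counts.get(loop_id, 0) + 1
        if self.counts[loop_id] >= self.threshold:
            # The loop is now considered hot; later iterations take the
            # fast path. A real tracing JIT would record a trace here.
            self.compiled.add(loop_id)
        return "interpreted"


def run_loop(profiler, loop_id, iterations):
    return [profiler.on_backward_jump(loop_id) for _ in range(iterations)]
```

With a threshold of 3, the first three iterations of a loop are interpreted and every later one takes the compiled path, which is the warm-up overhead the comment refers to.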

But if you write the code that is from the start less "dynamic," you can get the quality code without all the engines in between and all the overheads that are incurred.

The "asmjs" approach achieves its speed because it is a protocol that allows the JIT to simply compile a whole big chunk of the program to native code at once, without all the tracing etc. And to have such code in the first place, you need some less dynamic language, like C, as the source.

On the other hand, programmers don't like having to be "too specific," but more of the languages that have recently gained popularity demonstrate that types can often be inferred at compile time, and that such code can be convenient enough for a lot of developers and fast too.

So, no, I don't think that the future is in PyPy. It exists only because the Python language "specification" (and the reference implementation) is too clumsy to allow clearer type inference. In my opinion, the future is in type inference and real compilation, not in clumsy, big-overhead virtual machines. The developments that actually move in the direction I like are obviously asmjs, Swift and Rust. Disclaimer: I know I'm biased in preferring, whenever possible, compiled code to interpreted or JIT-ed code. YMMV.

vbit · on Oct 1, 2014

Apart from the time and effort spent in writing the AOT compiler as the other poster mentioned, there is also the question of compiler complexity.

You seem to be implying that JITs are somehow more complex and only help with languages having no static typing. Both assumptions aren't quite true. Also, consider the cases where it's much easier for a JIT to produce code, as it has more information than a static compiler.

For example, which functions should be inlined? Which mechanism should be used for dynamic dispatch? If tens of types are possible at one dispatch site but only two are seen in practice, the tracing JIT will compile only the fast paths, with an exit for the other paths. The AOT compiler will have to produce a vtable or a big switch statement to handle all the types, as it lacks the runtime information. Which would you consider 'quality code'?
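The dispatch example can be sketched in plain Python. Everything here is hypothetical and hand-written: the shape classes, `AREA_TABLE` (standing in for the vtable or switch an AOT compiler would emit), and `specialized_area` (standing in for the code a tracing JIT might generate after seeing only two types at this call site). A real JIT would generate the guarded version automatically from runtime observations.

```python
import math


class Circle(object):
    def __init__(self, radius):
        self.radius = radius


class Square(object):
    def __init__(self, side):
        self.side = side


class Triangle(object):
    def __init__(self, base, height):
        self.base = base
        self.height = height


# Generic path: one table lookup covering every possible type.
AREA_TABLE = {
    Circle: lambda s: math.pi * s.radius * s.radius,
    Square: lambda s: s.side * s.side,
    Triangle: lambda s: 0.5 * s.base * s.height,
}


def generic_area(shape):
    return AREA_TABLE[type(shape)](shape)


def specialized_area(shape, stats):
    # Guards for the two types assumed to have been seen at runtime,
    # each followed by the inlined computation for that type.
    if type(shape) is Square:
        stats["fast"] += 1
        return shape.side * shape.side
    if type(shape) is Circle:
        stats["fast"] += 1
        return math.pi * shape.radius * shape.radius
    # Guard failure: exit to the generic path for any other type.
    stats["exit"] += 1
    return generic_area(shape)
```

Calling `specialized_area` with a `Square` or `Circle` never touches the table; a `Triangle` still works but pays for the failed guards plus the generic lookup.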

acqq · on Oct 1, 2014

Yes, I already wrote that I'm biased. For me, it is essential progress only when it results in: less writing than in C, less memory used, fewer CPU cycles used, more predictability in execution, and easy mixing with C or assembly code. Basically, allowing me to have everything I can get in C, but making "easy cases easier," so that I can have type inference etc. That's why I like what Rust is trying to do, but even more what Swift is doing, as in the latter easy things look more elegant.

As soon as you don't care for any of these, I can see that you can like the languages with VMs, GCs and JITs.

I use dynamic languages too. But it's hard to impress me with one, because when I compare it to everything that already exists, I must really see some major benefit, not only in the "sugarcoating" but in the work it can do. E.g. Python is worthwhile because of SciPy, Lua because it increases the executable size by only kilobytes, etc.

smarr · on Oct 1, 2014

The one issue you seem to consider as not relevant at all is the effort it takes to implement a 'fast enough' language. With RPython as well as with Truffle+Graal, you can implement a language in less than 10k lines of code and get performance within reach of state of the art VMs.

Sure, if you prefer more static languages, you can probably move the point of optimization from runtime to compile time. However, I would like to see that you can achieve the same degree of performance by using LLVM for a small language also in the range of 5k-10k LOC. I am not aware of any similar experiments in that field. Rust, Go, or Julia seem to be all larger and more complex languages, so a direct comparison isn't really fair. Would be interested if there is something out there I have missed so far.

acqq · on Oct 1, 2014

You're right, I'm looking at the presented topic from the point of view of the language user, not from the point of view of a developer of an experimental language. And as a language user, I have really high expectations before I even consider using one.

I also expect that as soon as somebody starts to design a language based on the big infrastructure behind it, the result will be some not too important variation of the existing stuff: if it depends on PyPy, it's probably just a sugared version of already existing dynamic languages (Python, Ruby etc).

It's not that I don't use dynamic languages, it's just that I look at them, too, in terms of "what they can do for me." If it's the shortest thing to do, I'll still write a few lines of awk. Then, if it's text processing, I'll probably still use Perl. I have used Python for some very small, simple GUI apps, or to have SciPy, matplotlib and so on. I use Lua from time to time in SciTE.

The real progress is hard. But don't let that prevent you from experimenting. Good luck!

ihnorton · on Oct 1, 2014

This paper was an interesting read! And thank you for the exceptionally well-selected references: I think I will enjoy reading several of those.

I'm not sure about the others, but as far as Julia goes, the main parts of the language implementation are: parser and lowering in about 5000 lines of Scheme; type inference in 3k lines of Julia; and codegen to LLVM in about 6000 SLOC of C++ (and then there are a few tens of thousands of LOC of runtime and library code in C and Julia). I suspect it would be possible to implement a nice, smallish, LLVM-backed DSL using the OCaml bindings in well under 10k lines, but I am likewise unaware of such an experiment. On the other hand, implementing Julia in Truffle or RPython would be a neat project.

acqq · on Oct 1, 2014

As far as I understand, Julia is the language that will certainly be a worthy replacement for Python once it gets enough library functionality (I admit I haven't actually followed how much it has; I'd appreciate it if somebody could describe the current state). It's really nice that it was designed from the start to be fast.

ihnorton · on Oct 1, 2014

It really depends what libraries you need. You might be interested in: http://pkg.julialang.org/pulse.html (also a searchable package list).

vbit · on Oct 1, 2014

Have you looked at http://terralang.org/ ?

It may be a good toolchain if you're targeting static compilation using LLVM.

jiyinyiyong · on Oct 1, 2014

Inspiring even for a novice like me.