It's crazy that this post seems to have stumbled across an equivalent to the Copy-and-Patch technique[0] used to create a Lua interpreter faster than LuaJIT[1].
The major difference is that LuaJIT Remake's Copy-and-Patch requires "manually" copying blocks of assembly code and patching values, while this post relies on the Go compiler's closures to create copies of the functions with runtime-known values.
I think there's fascinating progress being made in this area. I think in the future this technique (in some form) will be the go-to way to create new interpreted languages, and AST interpreters, switch-based bytecode VMs, and JIT compilation will be things of the past.
It’s not really copy-and-patch. The whole point of copy-and-patch is that you can inline the copied code in your compilation output, and it’s a fast baseline interpreter because individual builtin functions are optimized (via the compiler output you’re copying from) and inlined (which is why you need to patch, to update which registers are being used). In this model you really only JIT the control flow, then inline the implementation of each bytecode operation (in contrast to Sparkplug [https://v8.dev/blog/sparkplug], which just calls a builtin instead of copy/patch). It’s still a JIT, which is vastly different from an interpreter.
> JIT will be things of the past
Sorry, no. JIT is not going anywhere. The article mentions that a JIT would give better performance, just with more effort than they wanted to put in (a good tradeoff!). JITs power Java, Wasm, and JavaScript VMs and are certainly the way to get the fastest code, because you can give the CPU code for which it can do a much better job of predicting the next instruction. With interpreters you’re often limited by the indirection of loads when looking up the next function to call, and generating code for the control flow outside of calling your “builtins” is precisely what Sparkplug is doing.
At the end of the day, like most of engineering, choose the right tool for the job, which in this case is simplicity (which is often the right choice!), but that doesn’t mean it’s always the right choice. For example, if browsers did this, JavaScript performance would tank compared to what we get today.
The JVM has had a template interpreter since the mid-90s; it’s not anything new, and template interpreters are only performant enough to provide acceptable execution speed until you JIT.
Template interpreters are not a substitute for real JIT — JIT compilation isn’t going anywhere.
My understanding of most optimizing compilers is that this is an extremely common "last step" sort of optimization. A lot of the optimizing work is beating the code into a canonical form where these sorts of templates can be readily applied.
It was also my understanding that this is the point of "super optimizers" [1], which look for these common patterns in something like LLVM IR to generate optimization targets for the mainline optimizer.
> It's crazy that this post seems to have stumbled across an equivalent to the Copy-and-Patch technique[0] used to create a Lua interpreter faster than LuaJit[1]
> this post relies on the Go compiler's closures to create copies of the functions with runtime-known values
To be clear, the technique of using closures like this is ancient in the world of Lisp. You can see it in Paul Graham's books on Lisp from the 90s, in Lisp in Small Pieces, and in many interpreters of 80s/90s vintage. I would say that it is quite standard.
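For anyone who hasn't seen the closure technique written out, here is a minimal sketch in Go. It is not the article's code: the `Node` type, its fields, and the `Compile` function are hypothetical names made up for illustration. The idea is that the tree is walked once, and each node becomes a closure that has already captured its constants and its children, so evaluation no longer inspects the AST.

```go
package main

import "fmt"

// Node is a hypothetical AST node for a tiny expression language.
// It is an assumption for this sketch, not a type from the article.
type Node struct {
	Op          string // "const", "var", "add", or "mul"
	Value       int    // used by "const"
	Slot        int    // used by "var": index into the environment
	Left, Right *Node  // used by "add" and "mul"
}

// Compiled evaluates an expression against a slice of variable slots.
type Compiled func(env []int) int

// Compile walks the tree once and returns one closure per node.
// Values known at compile time (constants, slot indexes, child closures)
// are captured by the closure instead of being looked up on every run.
func Compile(n *Node) Compiled {
	switch n.Op {
	case "const":
		v := n.Value
		return func([]int) int { return v }
	case "var":
		slot := n.Slot
		return func(env []int) int { return env[slot] }
	case "add":
		l, r := Compile(n.Left), Compile(n.Right)
		return func(env []int) int { return l(env) + r(env) }
	case "mul":
		l, r := Compile(n.Left), Compile(n.Right)
		return func(env []int) int { return l(env) * r(env) }
	}
	panic("unknown op: " + n.Op)
}

func main() {
	// (x + 3) * y, with x in slot 0 and y in slot 1.
	expr := &Node{
		Op: "mul",
		Left: &Node{
			Op:    "add",
			Left:  &Node{Op: "var", Slot: 0},
			Right: &Node{Op: "const", Value: 3},
		},
		Right: &Node{Op: "var", Slot: 1},
	}
	run := Compile(expr)
	fmt.Println(run([]int{4, 2})) // (4 + 3) * 2 = 14
}
```

The compiled closure can be called repeatedly with different environments, and the dispatch on `Op` happens only once, at compile time. Each call still goes through an indirect function call per node, which is the overhead the JIT discussion above is about.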
I am finding a switch-based bytecode interpreter to be very expedient on my computer. It seems that if the number of cases is kept small enough, your chances of getting a good branch prediction can go up substantially. Something like a brainfuck interpreter runs extremely fast. Even in the worst case of randomly guessing among its eight instructions, you are still going to time travel with a 12.5% success rate.
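As a rough illustration of that kind of interpreter, here is a sketch of a switch-dispatched brainfuck loop in Go. It is my own simplification, not a benchmarked implementation: the `Run` function name is made up, input (`,`) is left out, brackets are assumed to be balanced, and the tape is a fixed 30000 cells with no bounds handling.

```go
package main

import (
	"fmt"
	"strings"
)

// Run interprets a brainfuck program with a single switch-based dispatch loop.
// Assumptions in this sketch: brackets are balanced, the "," instruction is
// not supported, and the pointer stays inside the 30000-cell tape.
func Run(prog string) string {
	// Precompute matching bracket positions so loops jump without scanning.
	jump := make([]int, len(prog))
	var stack []int
	for i := 0; i < len(prog); i++ {
		switch prog[i] {
		case '[':
			stack = append(stack, i)
		case ']':
			open := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			jump[open], jump[i] = i, open
		}
	}

	var out strings.Builder
	tape := make([]byte, 30000)
	ptr := 0
	for pc := 0; pc < len(prog); pc++ {
		// One small switch: every instruction dispatches from this branch.
		switch prog[pc] {
		case '>':
			ptr++
		case '<':
			ptr--
		case '+':
			tape[ptr]++
		case '-':
			tape[ptr]--
		case '.':
			out.WriteByte(tape[ptr])
		case '[':
			if tape[ptr] == 0 {
				pc = jump[pc] // skip the loop body
			}
		case ']':
			if tape[ptr] != 0 {
				pc = jump[pc] // go back to the matching '['
			}
		}
	}
	return out.String()
}

func main() {
	// 8 * 8 + 1 = 65, the ASCII code for "A".
	fmt.Println(Run("++++++++[>++++++++<-]>+."))
}
```

Whether the switch compiles to a jump table or a chain of compares depends on the compiler, so any branch-prediction benefit from the small case count would need to be measured rather than assumed.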
[0]: https://sillycross.github.io/2023/05/12/2023-05-12/
[1]: https://sillycross.github.io/2022/11/22/2022-11-22/