Contrary to the naysayers, I like seeing stuff like this. Why? Because it's a simple, gentle introduction. It's easily digestible for the newcomer. And it might be easy enough to encourage a newcomer to start building their own VM that goes on to be something real.
For those that criticize it and find faults with it -- I'm sure the author would consider pull requests. Or you could provide your own fork with all the improvements that you believe are necessary.
Calling people who criticize that blog post "naysayers" is childish, at best. Once something is out there on the web it's going to get some criticism; that's normal. There is nothing wrong with that.
I'd like to see such an article on a register based VM. Pawn and Lua are nice examples. Most VMs are stack based but this is mainly because they are conceptually easier to understand. Register based machines have some real advantages, like requiring far fewer instructions inside tight loops.
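To make the instruction-count point concrete, here's a rough sketch with made-up encodings (neither matches any particular real VM) of what `a = b + c` might look like in each style:

```c
/* Hypothetical encodings, purely for illustration. */

/* Stack machine: four instructions */
enum { PUSH_LOCAL, ADD, STORE_LOCAL };
static const unsigned char stack_code[] = {
    PUSH_LOCAL, 1,      /* push b                */
    PUSH_LOCAL, 2,      /* push c                */
    ADD,                /* pop two, push the sum */
    STORE_LOCAL, 0,     /* pop into a            */
};

/* Register machine: the same statement is one instruction */
enum { ADD_RRR };
static const unsigned char reg_code[] = {
    ADD_RRR, 0, 1, 2,   /* r0 = r1 + r2 */
};
```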
This paper [1] (though it is a tad old now) shows that the register machine approach still outperforms the stack machine by a fair margin, largely from the reduction in the number of instructions needed.
I wonder which is the better source representation for a JIT, though.
JIT compilers aren't really an argument either way, as you can do tracing or transform to an SSA graph from either representation quite easily. All the real complexity is further down the compilation pipeline.
I (half) wrote a toy trace compiler based on a register bytecode a while back. Writing the front end was quite pleasant - the difficult parts were back end stuff like figuring out how to support trace exit information. (The difficulty there is that you might want to perform stores, materialize objects etc as you go off trace as part of store and allocation sinking.)
That would be very inefficient, because registers cannot be indexed. The dispatch you have to introduce to jump to code that references the right physical regs would murder you with mispredictions.
Also, register VMs typically have three operands. Specialising each instruction for each possible register for three operands would result in a ridiculous volume of code.
Special purpose instructions that access fixed registers would be fine, but general purpose operand references cannot be sanely implemented in this way. Existing register VMs use memory because it's faster.
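To make "use memory" concrete, here's a minimal sketch (my own opcode names, not from any particular VM) where the register file is just a C array indexed by operand bytes, with no per-register code specialisation and no attempt to pin VM registers to physical registers:

```c
#include <stdint.h>

enum { OP_ADD, OP_HALT };

/* The "registers" are a plain C array, so an operand byte can index them directly. */
void run(const uint8_t *code, int64_t *regs)
{
    for (;;) {
        switch (*code++) {
        case OP_ADD: {                            /* ADD dst, a, b */
            uint8_t dst = *code++, a = *code++, b = *code++;
            regs[dst] = regs[a] + regs[b];
            break;
        }
        case OP_HALT:
            return;
        }
    }
}
```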
> Also, register VMs typically have three operands.
Wrong. Only SSA VMs (with an optimizing compiler) need three operands.
Simple register VMs, as well as hardware CPUs, have only two operands, and you can then easily fit an instruction into one word.
You cannot do the better SSA-based optimizations then, but the VM is much faster than with two- or multi-word ops, which blow up your instruction cache.
Lua has to use shorter operand ranges to fit 3 operands into one word.
Writing a JIT with two-address ops is also trivial. It's a straightforward conversion using fixed registers.
> Specialising each instruction for each possible register for three operands would result in a ridiculous volume of code.
Nobody does that, it's not needed.
> Special purpose instructions that access fixed registers would be fine, but general purpose operand references cannot be sanely implemented in this way. Existing register VMs use memory because it's faster.
No. You need general purpose operand insns for only one or two fixed registers (eax, edx typically). Nobody implements all possible combinations.
And you use the stack to shuffle values around, not memory.
Multiple real implementations of register VMs have three-operand instructions. Lua, LuaJIT and Guile are examples that I'm familiar with. For that matter, many ISAs feature three-operand instructions. Even some of the recent extensions to x86-64 are three-operand versions of (some of) the vector instructions.
If you think three-operand instructions aren't common in both hardware and interpreter design, you are simply misinformed.
None of this has anything to do with SSA. SSA IR can be reasonably constructed from any of stack, accumulator, two-operand register or three-operand register bytecode. All of the usual optimizations follow from that.
> the VM speed is much faster than with two or multi word ops
While variable-length encodings are best avoided, that doesn't rule out three-operand instructions. For a good example of an extremely fast interpreter that uses such instructions, have a look at LuaJIT.
> Nobody does that, it's not needed.
That's what the OP was suggesting, so that's what I was discussing.
Registers are stored in memory, but ideally "memory" means L1 cache, and it seems to me like register VMs would have better cache locality. This might be why they're getting more popular relative to stack VMs as the speed advantage of cache increases.
Stack machines are constantly using the same absolute offsets into the stack when evaluating in a loop, or evaluating different branches of a big expression. In fact, one of the simplest register allocators you can use in an expression code generator is a stack; pop to allocate, push to deallocate, and spill if you run out. On a CPU, this will have benefits, but in a VM, it's just a relabeling scheme for the same memory slots a stack VM would use.
What you potentially gain with a register VM is reduced accounting overhead; a stack machine will be constantly adjusting the stack pointer on every operation. It's a tradeoff for less complexity at codegen time. It's swings and roundabouts in the lowlands of performance, though; no loop and switch VM will be super-fast.
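For what it's worth, the stack-as-allocator scheme described above fits in a few lines; a minimal sketch (names are mine):

```c
/* pool[] holds free register numbers; popping allocates, pushing frees. */
#define NREGS 8

static int pool[NREGS];
static int top;

static void init_pool(void)
{
    for (top = 0; top < NREGS; top++)
        pool[top] = top;
}

static int alloc_reg(void)
{
    if (top == 0)
        return -1;          /* out of registers: a real codegen would spill here */
    return pool[--top];
}

static void free_reg(int r)
{
    pool[top++] = r;
}
```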
Register-based VMs are not more complex than stack-based VMs. You can even have a stack; the only difference is that your instructions operate on registers and not directly on the stack. The only part I'd consider more complex with a register-based VM is generating code for it.
I created a register-based VM named RCPU, for educational purposes. You can find it at https://github.com/ddfreyne/rcpu. It has most of what you'd expect from a register-based VM, and even has video output (`rcpu-assemble samples/video.rcs` and then `rcpu-emulate` the resulting file with `--video`).
Not an article, but this is a register-based VM that should be simple and straight-forward to understand. Also comes with an assembler, disassembler and debugger.
I'm a bit disappointed - this VM doesn't have instructions for looping or branching, nor does it really use the registers in any way. I was hoping to read a writeup that introduced some concepts that were used in real (non-toy) systems.
Hi. I updated the article to include a bit more information on registers/branches, and an example on the github with a loop. I wanted to leave the registers (or well the instructions for moving/setting registers) as an exercise for the reader to see if they could implement it themselves :)
You could implement looping by adding an instruction, say 'JMP', that takes an address as a parameter and sets the ip to it. Branching can be implemented similarly with an instruction called 'BNZ', which takes two parameters, a register and an address, and changes the ip to the given address if the register is non-zero. SUB would be useful too, so you can subtract one register from another and branch based on whether the result is zero. Other routes include using the top of the stack for your operations and branching, so you can stick to one parameter in the above instructions.
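In a switch-based dispatch loop like the article's, those opcodes could look something like this (encoding and names are made up for illustration; addresses are single bytes to keep it short):

```c
#include <stddef.h>
#include <stdint.h>

enum { OP_JMP, OP_BNZ, OP_SUB, OP_HALT };

void run(const uint8_t *code, int64_t *regs)
{
    size_t ip = 0;
    for (;;) {
        switch (code[ip++]) {
        case OP_JMP:                         /* JMP addr */
            ip = code[ip];                   /* move ip unconditionally */
            break;
        case OP_BNZ: {                       /* BNZ reg, addr */
            uint8_t r = code[ip++], addr = code[ip++];
            if (regs[r] != 0)
                ip = addr;                   /* branch only if non-zero */
            break;
        }
        case OP_SUB: {                       /* SUB dst, src: dst -= src */
            uint8_t dst = code[ip++], src = code[ip++];
            regs[dst] -= regs[src];
            break;
        }
        case OP_HALT:
            return;
        }
    }
}
```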
Agreed. I wrote a simple virtual machine, and the most interesting thing was writing the compiler that would read my "assembly", and spit out the bytecode. Including handling labels and jumping instructions.
My VM was very simple, and I've not touched it for a while, but the whole thought of how to handle branching was what made it more interesting than other similar toy systems.
I'm pretty sure everyone has written their own toy VM, but I'll go ahead and throw mine out there (well, the one of the three I've written that I like best). It's called LightVM and is intended to be capable of running on tiny microcontrollers.
The thing I like most about it is that the opcodes and registers are extremely general purpose. So, to do a branch, you do `mov IP, label`, or even a `push.mv` instruction which, when used against IP, is basically the same as the usual `call` instruction, but can also be used with data registers to save a register to the stack and then set it to a value.
I've found the hardest thing about making a VM isn't making a VM, but rather making the infrastructure around it (assembler, debugger, compilers, etc)
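Here's a rough sketch of the "IP is just another register" idea (simplified, not the actual LightVM encoding): if the instruction pointer lives in the register file, a plain move into it becomes a jump.

```c
#include <stdint.h>

enum { R0, R1, R2, IP, NREGS };          /* the instruction pointer is just register 3 */
enum { OP_MOVI, OP_HALT };

/* caller provides regs[NREGS] with regs[IP] initialised to 0 */
void run(uint16_t *regs, const uint8_t *code)
{
    for (;;) {
        switch (code[regs[IP]++]) {
        case OP_MOVI: {                  /* MOVI reg, imm */
            uint8_t r   = code[regs[IP]++];
            uint8_t imm = code[regs[IP]++];
            regs[r] = imm;               /* MOVI IP, addr is therefore a jump */
            break;
        }
        case OP_HALT:
            return;
        }
    }
}
```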
> So, to do a branch, you do `mov IP, label`, or even a `push.mv` instruction which, when used against IP, is basically the same as the usual `call` instruction
I wrote something a little like this once too - there was a register stack and call, jump, branch were all implemented by pushing or popping the register stack.
For those who want to implement a VM as an exercise, I recommend implementing a simple JIT compiler afterwards. You'll probably be impressed at the performance improvements, and it's a fun exercise to do. I used GNU lightning to generate machine code.
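If you don't want to pull in GNU lightning just to see the idea, the bare-bones version of a JIT is only a few lines on x86-64/POSIX (this is not lightning's API, just the underlying trick): copy machine-code bytes into executable memory and call them as a function.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42 ; ret */
    unsigned char code[] = { 0xb8, 42, 0, 0, 0, 0xc3 };

    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;
    memcpy(mem, code, sizeof code);

    int (*fn)(void) = (int (*)(void))mem;
    printf("%d\n", fn());                /* prints 42 */
    return 0;
}
```

A real JIT would, of course, generate the bytes from your bytecode instead of hard-coding them, which is exactly the tedium GNU lightning takes care of.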
I am starting to sound like a broken record, but here it goes. If you want a more complete tutorial on writing stack based virtual machines, check "The Elements of Computing Systems" and its accompanying course, http://www.nand2tetris.org/.
The book teaches you to build:
1) A CPU from basic electronics elements
2) An assembler to generate machine code
3) A bytecode VM that can be simulated, and a translator that generates assembly from the bytecode
4) A basic programming language that generates bytecode
5) An operating system using that language.
I'm midway through building the Assembler and VM myself :-).
This project is nice for educational purposes, but I wouldn't call it a VM; I'd call it a "bytecode interpreter".
I think nowadays it is kind of a minimum requirement to have the intermediate code JIT-compiled (or at least compiled).
I'm also missing a garbage collector, although that is not necessarily part of a VM (but often is). See NaCl for a counterexample. By the way, a project that I'd like to see is an efficient garbage collector implemented inside the VM, instead of being part of the VM.
It's interesting that VM doesn't mean the same thing as virtual machine for some people. A bytecode interpreter _is_ a VM, with the bytecodes representing the opcodes of the machine. What this isn't is a 'modern VM' complete with JIT and GC.
While seemingly simple, the non-Turing-complete example is not too far off from the simple Forth-like stack-based programming language found and executed in Bitcoin transactions.
For a project that goes a bit deeper (branching, I/O, etc.) consider writing a CHIP-8 simulator. There are lots of games written in CHIP-8 bytecode to test with!
I find these kinds of very basic intro articles frustrating. They till the same ground over and over: a tiny instruction set implemented with a switch statement. None of the more difficult issues are addressed: exception handling, linking to libraries or other programs written for the same VM, portability of programs across architectures, accessing the OS for services like file I/O, time, etc. -- all the things that make a toy not a toy.
Every CS student in the world has written a toy VM just like this one.
That's a little ungenerous of you. The toy VM has presented an aha! moment to all of us at one time - or more than once, when you learn about Turing equivalence and the rabbit hole that leads from that.
Skip the articles beneath your skill level and move along.
Agreed, it was a bit ungenerous, esp. b/c I didn't realize til someone mentioned it below that the author was a teenager.
My objection, though, is not what's above/below me, it's that given the title, I was expecting a deeper article only to find one that has appeared numerous times. It was frustration I was expressing more than condescension. But you're quite right in using the term "ungenerous." I appreciate the gentle reproof.
The details you're talking about are pretty much just details; for instance, a foreign function interface for C functions (and thus system calls) is a pretty straightforward exercise that involves just a bit about the C ABI for your platform.
It's much easier to teach someone who knows how to build a basic register machine how to set up the stack for a call to a C function than it is to teach someone who doesn't know how a register machine works how to build one.
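As a sketch of what that hook can look like (my own naming, assuming a simple stack-based VM): the bytecode carries an index into a table of C function pointers, and the interpreter shuffles arguments between the VM stack and the native call.

```c
#include <stdint.h>

typedef int64_t (*native_fn)(int64_t, int64_t);

static int64_t native_add(int64_t a, int64_t b) { return a + b; }

/* the VM's table of callable C functions */
static native_fn natives[] = { native_add };

/* CALL_NATIVE idx: pop two arguments off the VM stack,
   call natives[idx], push the result back */
static void call_native(uint8_t idx, int64_t *stack, int *sp)
{
    int64_t b = stack[--*sp];
    int64_t a = stack[--*sp];
    stack[(*sp)++] = natives[idx](a, b);
}
```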
Also: there are reasons you might want to build a simple VM that has no access to system services and no notion of an exception. BPF is a good example of such a virtual machine.
If there's something frustrating about this article to me, it's that it lacks a motivating example. Why would you want to build this VM? What are you compiling down to it from?
Since I also wrote a toy VM where the toolchain supports linking of different code files (albeit the runtime doesn't implement CALL / RET) I decided to make a blog post about this specific topic.
I feel the same way. I went to read the post expecting a lot more than what I found and came away feeling both more knowledgeable than I thought I was and more ignorant for not knowing that I could get away with calling what I saw a "Virtual Machine".
It's an instruction set, with an instruction dispatcher, a stack, and a register file. Why would it surprise you that someone would call it a VM?
Are you maybe getting your signals crossed between the kind of VM this article is talking about (in the p-code sense of a VM) and virtualization systems?
I think the title is very misleading. This is not a virtual machine but an interpreter for a made up assembly language. There is nothing wrong with that and I am sure a beginner would find it very useful. But reading the title I was expecting something quite different.
> A process VM, sometimes called an application virtual machine, or Managed Runtime Environment (MRE), runs as a normal application inside a host OS and supports a single process. ... Process VMs are implemented using an interpreter; performance comparable to compiled programming languages is achieved by the use of just-in-time compilation.
> Parrot is a register-based process virtual machine designed to run dynamic languages efficiently. It is possible to compile Parrot assembly language and PIR (an intermediate language) to Parrot bytecode and execute it.
(I quoted that one over the Java and Python virtual machines because it uses the phrase "assembly language" in the context of the VM.)