If anyone is interested in making compilers / programming languages, there is another recent initiative from Per Vognsen, Bitwise: https://github.com/pervognsen/bitwise.
I occasionally watch the streams; so far it has been very helpful to observe how he makes design decisions, his thought process, and his implementations.
This is one of the prettiest online books about computing that I've ever seen. I really appreciate how much effort you have put into making the layout clean and the styling enjoyable to look at. Great visual presentation can really take a load off the reader's brain, leaving more cycles to think about the actual content. Bravo!
Thank you! I used to be a graphic designer and web designer before I went more left brain. I put a lot of love and time into the design. It's fully responsive, which is nice, but the part I really like is that the whole page fits into an invisible grid system. All of the columns, line heights, spacing between paragraphs, everything (well, except the illustrations...). It took forever, but I think it helps give the page a harmonious visual cadence.
> Great visual presentation can really take a load off the reader's brain, leaving more cycles to think about the actual content. Bravo!
Yes, I totally agree. Design is how we get information through our physical senses into our brain. There's no shortcut to skip past the eyes and ears, so you have to put care into the transmission medium if you want the signal to get through effectively.
What are you using to put the code snippets up? Some parts are greyed out but the most important points stand out.
I've been looking for something like highlight.js that does this; otherwise I intended to fork my own version of it and use | | piping to highlight important snippets.
You talk about a VM allocating memory. In the process of constructing my own simple-as-possible VM, I figured allocation would be done by the routines involved, with the VM just providing CPU-like facilities for accessing raw memory. What are the advantages of a VM allocating memory?
I think this depends on what you mean by "VM" and how high-level you want yours to be. If you're writing a programming language VM (as opposed to a container-like VM thing), and your programming language doesn't explicitly manage memory, then it basically falls to the VM to do it.
The language in the book is a little scripting language, so the VM implements both allocating memory for new objects and reclaiming memory for them when no longer needed.
If you were writing something like a bytecode Pascal VM, then the VM itself wouldn't need to allocate memory outside of, say, the built-in "New()" function.
There is no real way for the VM to avoid dynamically allocating memory. At the very least, you have to allocate call frames (even if you know their size statically).
Maybe the only way around this is if you restrict arbitrary recursion in the language.
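A minimal sketch of what I mean (the struct and names here are just illustrative, not from the book): with unrestricted recursion, the stack of call frames has to be able to grow while the program runs, which means the VM is doing dynamic allocation somewhere.

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative only: each call pushes a frame, and since recursion
       depth isn't known statically, the array of frames grows at runtime.
       (Error handling omitted.) */
    typedef struct {
        uint8_t *return_ip;  /* where to resume in the caller's bytecode */
        int      slot_base;  /* start of this call's slots on the value stack */
    } CallFrame;

    typedef struct {
        CallFrame *frames;
        int        count;
        int        capacity;
    } FrameStack;

    static void push_frame(FrameStack *s, CallFrame frame) {
        if (s->count == s->capacity) {
            s->capacity = s->capacity < 8 ? 8 : s->capacity * 2;
            s->frames = realloc(s->frames, sizeof(CallFrame) * s->capacity);
        }
        s->frames[s->count++] = frame;
    }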
EDIT: Hmm, from the reply to this it seems I was somewhat wrong...
"No real way" is relative, of course. I happen to have constraints pushing for a VM as small as conceivably possible without code-blow-up, like single-instruction VM but implementing most logic/arithmetic primitives.
But it's worth considering which parts are dicier than others.
Yeah, I was a little wrong, and I edited my comment. But I am still confused by your question. There are lots of places where you might need to allocate memory -- it depends on the semantics of your source language, on what you consider part of the VM proper, etc.
It sounds like you are describing something with more primitive types, like a CPU emulator or WebAssembly, rather than a dynamic language VM, which is the subject of this book.
Say you want to model a Turing machine. You can allocate a fixed length tape (making a linear bounded automaton), or a tape that can grow (closer to a "real" Turing machine).
Now, you could argue that writing to a part of the finite tape that's so far untouched is "allocating memory" -- but at the same time that could be a single chunk of statically allocated memory from the perspective of the OS that runs your VM...
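A rough sketch of the growable-tape version (my own toy code, nothing from the book): the VM sees this as allocating memory, even though from the OS's point of view it may just be one heap block getting bigger.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        unsigned char *cells;
        size_t         length;
    } Tape;

    /* Writing past the end just extends the tape (error handling omitted). */
    static void tape_write(Tape *t, size_t pos, unsigned char value) {
        if (pos >= t->length) {
            size_t grown = (pos + 1) * 2;
            t->cells = realloc(t->cells, grown);
            memset(t->cells + t->length, 0, grown - t->length);
            t->length = grown;
        }
        t->cells[pos] = value;
    }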
For some related fun, I still think the Nim "howistart" guide is great. I haven't tried to follow along in a while, so I'm not sure the code works straight away with an up-to-date Nim, but I see there's a notice mentioning a 2018 update, so I hope it's still (or again) working:
OK no problem, it would be interesting to see a comparison between the two styles of VM! I remember that there is at least one talk here which goes into the difference:
I forget what names he used for the two. He distinguished between runtimes like the JVM and runtimes like Xen, which emulate an operating system too. A CPU emulator seems closer to Xen than to the JVM.
-----
EDIT: The names were "system" and "process" VMs, mentioned here:
So I think you are talking about a "system VM", where this book is more about "process" VMs. They are related, but distinct -- it even says so in that book's title!
It's definitely true that in something like Xen you would want to avoid allocating memory more than in something like a JavaScript VM. (And it would be easier to avoid allocating, too.)
I don't like that terminology, but it's a useful distinction.
I don't think it's because it's a dynamic language, as you seem to imply. How else would you implement both a stack and a heap in non-"dynamic" languages without constraining them to a fixed size?
You should drop a note on your blog about the new chapter(s). I'm subscribed to your blog's RSS, but not the one for craftinginterpreters.com—because a feed for the latter doesn't actually exist.
I think I used to do that for my first book, ages ago, but these days it seems like not many people are using RSS. Interestingly enough, I did just start using RSS again myself recently.
Posting on my blog feels kind of heavyweight for this, though. Do you do email? If so, subscribing to the mailing list is a good way to get notified. I only use the list for new chapters and other "major announcement" stuff like that, so it's not spammy.
I didn't mean to suggest you do this for every chapter. Just something in the vein of, "In case you haven't been following along already, since I last blogged, I've completed several chapters of Crafting Interpreters."
That's awesome attention to detail. It looks like a lot of work on top of writing a book. It feels like a great debt is owed for the amount of work that's gone into something I can essentially enjoy for free.
I did make a little video showing how the illustration is done, but in general, no. I barely have enough time to write the book, and video production (at least for my perfectionist self) is very time-consuming.
One thing popped into my mind recently, as I've been going back and forth between implementing my own stack machine interpreter and forking the Python VM (for the Oil shell):
What other books cover bytecode interpreters? (In other words, the part of the book that you are 2 chapters into.)
I can't think of any! Many people learn about the front end in a "compilers" class. The Dragon Book and "Modern Compiler Implementation in X" are two of the most popular books, but they don't cover interpreters at all.
The other branch people learn about is Lisp interpreters, which are often tree interpreters (as in the first section of your book). And when they are bytecode interpreters, I think they have a compilation process that's fairly different from, say, Python/Lua/JavaScript. For example, the "nanopass" style is somewhat popular for teaching, and pretty different from what I understand. The source language is further from the machine model.
It seems like the best sources are the Lua papers you mentioned, but I would call those primary sources.
I can't think of any "secondary sources". Terence Parr's Language Implementation Patterns does part of a bytecode interpreter, but it's in Java, and it leaves out the bytecode compiler as far as I remember.
So somewhat surprisingly I think everybody who implements bytecode VMs these days just looks at Lua and Python?
I guess Lua was influenced heavily by Scheme, and I don't know where Python got its implementation style, maybe from the ABC precursor.
Anyway I'd be interested in hearing about other sources (primary and secondary), but it dawned on me that this is the ONLY full bytecode interpreter I've seen in a textbook. And I have a whole bunch of books and have been doing this for a few years.
So if that's the case, then kudos for filling this gap! :)
EDIT: A primary source that has been useful for me is The ZINC experiment: an economical implementation of the ML language, which is a description of the predecessor to OCaml (both written by Xavier Leroy). Although OCaml is statically typed and functional, the bytecode interpreter itself has some similarities to Python/Lua. They mention keeping the top of the stack in a CPU register, the relationship between runtime values and garbage collection, etc.
The book "Real World OCaml" reviews a little of this material.
Virtual Machines: Versatile Platforms for Systems and Processes covers both virtual machines for implementing programming languages and virtual machines for emulating CPU architectures. One of my favorite optimization techniques explained in this book is pre-decoding:
After loading the program's instructions, every opcode is replaced by the address of the corresponding routine implementing the instruction.
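For example, here's a rough sketch of the idea in plain C (my own toy, not the book's code), using a table of function pointers as the "addresses of the routines":

    #include <stdint.h>
    #include <stdio.h>

    typedef struct VM VM;
    typedef void (*Handler)(VM *vm);

    struct VM {
        Handler *pc;      /* pre-decoded program: an array of handler pointers */
        int      acc;
        int      running;
    };

    static void op_inc(VM *vm)  { vm->acc++; }
    static void op_halt(VM *vm) { vm->running = 0; }

    int main(void) {
        /* The "loaded" program: raw opcodes, 0 = INC, 1 = HALT. */
        uint8_t raw[] = { 0, 0, 0, 1 };
        Handler table[] = { op_inc, op_halt };

        /* Pre-decode: replace every opcode with its routine's address. */
        Handler program[sizeof raw];
        for (size_t i = 0; i < sizeof raw; i++) program[i] = table[raw[i]];

        VM vm = { .pc = program, .acc = 0, .running = 1 };
        while (vm.running) (*vm.pc++)(&vm);   /* dispatch never re-decodes */

        printf("acc = %d\n", vm.acc);         /* prints 3 */
        return 0;
    }

The decode work happens once at load time, so the hot loop is just an indirect call per instruction.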
that is, "system" VMs vs. "process" VMs. I don't really like those terms, but the distinction is a good one. I think the author must have been at the VM summer school I linked there.
-----
BTW this 2015 paper is saying that the labeled gotos technique isn't a big deal on Haswell:
Branch prediction and the performance of interpreters -- Don't trust folklore
In other words, the branch predictors have improved enough on Intel that it's not a big optimization. But I don't like how they didn't mention other CPU architectures. Intel is not all we care about!
Also, it would be nice to have an update on this 2015 paper to account for 2017-2018 Spectre/Meltdown mitigations...
The first part of the paper compares token threading vs. a naive switch in CPython 3.2, which has a compile flag to disable token threading[1]. I was referring to direct threading, but the difference is probably minor or irrelevant.
> Nehalem shows a few outstanding speedups (in the 30 %–40 % range), as well as Sandy Bridge to a lesser extent, but the average speedups (geomean of individual speedups) for Nehalem, Sandy Bridge, and Haswell are respectively 10.1 %, 4.2 %, and 2.8 % with a few outstanding values for each microarchitecture. The benefits of threaded code decreases with each new generation of microarchitecture.
But looking at the graph, it seems a few benchmarks still got a +5% speed boost on Haswell.
It would also have been interesting if the paper had shown the code gcc generates for the switch statement. If the switch statement's cases are linear (0, 1, 2, 3, ...) and there are more than 4 cases, then gcc token-threads the code.[2]
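For reference, the kind of loop being measured looks something like this toy (my own sketch, not from the paper); with dense case values like these, gcc compiles the switch down to a bounds check plus an indirect jump through a table:

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        uint8_t code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        uint8_t *ip = code;
        int stack[64], *sp = stack;

        for (;;) {
            switch (*ip++) {
                case OP_PUSH:  *sp++ = *ip++;           break;
                case OP_ADD:   sp--; sp[-1] += sp[0];   break;
                case OP_PRINT: printf("%d\n", sp[-1]);  break;  /* prints 5 */
                case OP_HALT:  return 0;
            }
        }
    }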
Thanks for mentioning this book, it sounds amazing. If I hadn't just bought Lisp in Small Pieces I'd probably be looking at picking this up, but I need to work through that book first. However this is going on my future buy list.
So this is a kind of decoding of token threaded code to subroutine threaded code? How does the actual argument passing work, i.e., how does the routine starting at 0x4bcf00 know where to look for its operands?
> Alas, the fastest [dispatching] solutions require either non-standard extensions to C, or hand-written assembly code.
I assume gcc's label addresses[0].
I've been planning to try those out the next time I get around to playing with minischeme to see just how much of a speed difference they make -- I'd imagine quite a bit since it could just hop around from opcode to opcode instead of having to go through the loop (which I believe is outside the dispatching function).
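Roughly what I have in mind (a toy sketch using gcc's labels-as-values, not minischeme's actual code): each handler reads its operands through the shared instruction pointer and then jumps straight to the next opcode's label, so there's no trip back through an outer loop or switch.

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        uint8_t code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        uint8_t *ip = code;
        int stack[64], *sp = stack;

        /* Jump table of label addresses (gcc/clang extension). */
        static void *dispatch[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
        #define NEXT() goto *dispatch[*ip++]

        NEXT();
    op_push:  *sp++ = *ip++;          NEXT();  /* operand is the byte after the opcode */
    op_add:   sp--; sp[-1] += sp[0];  NEXT();
    op_print: printf("%d\n", sp[-1]); NEXT();
    op_halt:  return 0;
    }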
It's funny, I searched Google to see if clang supported this, and the top result (from Stack Overflow) has the comment "This is terrible coding style. Why don't you use a switch or a function table. If you really want such code, you should goto 70ies/80ies BASIC."
Yes, that's definitely the main one I had in mind.
> I've been planning to try those out the next time I get around to playing with minischeme to see just how much of a speed difference they make
I added support for computed gotos to my bytecode interpreter for Wren and, if I recall, it gave about a 10% speed boost on my (mostly micro-) benchmarks. YMMV.
The book is written in Markdown. The code samples are stored in separate files so that I can compile and test them. The code has little comment markers identifying where each chunk of code goes in the book, and the build script weaves it all together.
I'm up to the chapter in part 1 on Classes and it has been a great experience for reminding me how compilers work. Personally, I recommend intentionally doing it in a language other than Java just to force the extra step of transliterating in your head; at least for me it has made the learning process much stronger.
Yeah, I totally agree with that suggestion. Even hand-typing the code samples yourself instead of just reading them can help you get out of the "skimming" mindset and into really internalizing each step.
If you're curious, people have ported the code in the book to a lot of languages. Some of them are here:
I am doing the same in Rust (a new language for me) and OCaml (which I know a little bit). It forces you to find different solutions, and you learn new languages at the same time.