If anyone is interested in making compilers / programming languages, there is another recent initiative from Per Vognsen, Bitwise: https://github.com/pervognsen/bitwise.
I occasionally watch the streams; so far it has been very helpful to observe how he makes design decisions, his thought process, and his implementations.
This is one of the prettiest online books about computing that I've ever seen. I really appreciate how much effort you have put into making the layout clean and the styling enjoyable to look at. Great visual presentation can really take a load off the reader's brain, leaving more cycles to think about the actual content. Bravo!
Thank you! I used to be a graphic designer and web designer before I went more left brain. I put a lot of love and time into the design. It's fully responsive, which is nice, but the part I really like is that the whole page fits into an invisible grid system. All of the columns, line heights, spacing between paragraphs, everything (well, except the illustrations...). It took forever, but I think it helps give the page a harmonious visual cadence.
> Great visual presentation can really take a load off the reader's brain, leaving more cycles to think about the actual content. Bravo!
Yes, I totally agree. Design is how we get information through our physical senses into our brain. There's no shortcut to skip past the eyes and ears, so you have to put care into the transmission medium if you want the signal to get through effectively.
What are you using to put the code snippets up? Some parts are greyed out but the most important points stand out.
I've been looking for something like highlight.js that does this; otherwise I intended to fork my own version of it and use | | piping to highlight important snippets.
You talk about a VM allocating memory. In the process of constructing my own simple-as-possible VM, I figured allocation would be done by the routines involved, with the VM just providing CPU-like facilities for accessing raw memory. What are the advantages of a VM allocating memory?
I think this depends on what you mean by "VM" and how high-level you want yours to be. If you're writing a programming language VM (as opposed to a container-like VM thing), and your programming language doesn't explicitly manage memory, then it basically falls to the VM to do it.
The language in the book is a little scripting language, so the VM implements both allocating memory for new objects and reclaiming memory for them when no longer needed.
If you were writing something like a bytecode Pascal VM, then the VM itself wouldn't need to allocate memory outside of, say, the built-in "New()" function.
There is no real way for the VM to avoid dynamically allocating memory. At the very least, you have to allocate call frames (even if you know their size statically).
Maybe the only way around this is if you restrict arbitrary recursion in the language.
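A minimal sketch of what I mean (the struct and names here are just illustrative, not from the book): with unrestricted recursion, the stack of call frames has to be able to grow while the program runs, which means the VM is doing dynamic allocation somewhere.

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative only: each call pushes a frame, and since recursion
       depth isn't known statically, the array of frames grows at runtime.
       (Error handling omitted.) */
    typedef struct {
        uint8_t *return_ip;  /* where to resume in the caller's bytecode */
        int      slot_base;  /* start of this call's slots on the value stack */
    } CallFrame;

    typedef struct {
        CallFrame *frames;
        int        count;
        int        capacity;
    } FrameStack;

    static void push_frame(FrameStack *s, CallFrame frame) {
        if (s->count == s->capacity) {
            s->capacity = s->capacity < 8 ? 8 : s->capacity * 2;
            s->frames = realloc(s->frames, sizeof(CallFrame) * s->capacity);
        }
        s->frames[s->count++] = frame;
    }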
EDIT: Hmm, from the reply to this it seems I was somewhat wrong...
"No real way" is relative, of course. I happen to have constraints pushing for a VM as small as conceivably possible without code-blow-up, like single-instruction VM but implementing most logic/arithmetic primitives.
But it's worth considering which parts are dicier than others.
Yeah, I was a little wrong, and I edited my comment. But I am still confused by your question. There are lots of places where you might need to allocate memory -- it depends on the semantics of your source language, on what you consider part of the VM proper, etc.
It sounds like you are describing something with more primitive types, like a CPU emulator or WebAssembly, rather than a dynamic language VM, which is the subject of this book.
Say you want to model a Turing machine. You can allocate a fixed length tape (making a linear bounded automaton), or a tape that can grow (closer to a "real" Turing machine).
Now, you could argue that writing to a part of the finite tape that's so far untouched is "allocating memory" -- but at the same time that could be a single chunk of statically allocated memory from the perspective of the OS that runs your VM...
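A rough sketch of the growable-tape version (my own toy code, nothing from the book): the VM sees this as allocating memory, even though from the OS's point of view it may just be one heap block getting bigger.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        unsigned char *cells;
        size_t         length;
    } Tape;

    /* Writing past the end just extends the tape (error handling omitted). */
    static void tape_write(Tape *t, size_t pos, unsigned char value) {
        if (pos >= t->length) {
            size_t grown = (pos + 1) * 2;
            t->cells = realloc(t->cells, grown);
            memset(t->cells + t->length, 0, grown - t->length);
            t->length = grown;
        }
        t->cells[pos] = value;
    }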
For some related fun, I still think the Nim "howistart" guide is great. I haven't tried to follow along in a while, so I'm not sure the code works straight away with an up-to-date Nim, but I see there's a notice mentioning a 2018 update, so I hope it's still (or again) working:
OK no problem, it would be interesting to see a comparison between the two styles of VM! I remember that there is at least one talk here which goes into the difference:
I forget what names he used for the two. He distinguished between runtimes like the JVM and runtimes like Xen, which emulate an operating system too. A CPU emulator seems closer to Xen than to the JVM.
-----
EDIT: The names were "system" and "process" VMs, mentioned here:
So I think you are talking about a "system VM", where this book is more about "process" VMs. They are related, but distinct -- it even says so in that book's title!
It's definitely true that in something like Xen you would want to avoid allocating memory more than in something like a JavaScript VM. (And it would be easier to avoid allocating, too.)
I don't like that terminology, but it's a useful distinction.
I don't think it's because it's a dynamic language, as you seem to imply. How else would you implement both a stack and a heap in non-"dynamic" languages without constraining them to a fixed size?
You should drop a note on your blog about the new chapter(s). I'm subscribed to your blog's RSS, but not the one for craftinginterpreters.com—because a feed for the latter doesn't actually exist.
I think I used to do that for my first book, ages ago, but these days it seems like not many people are using RSS. Interestingly enough, I did just start using RSS again myself recently.
Posting on my blog feels kind of heavyweight for this, though. Do you do email? If so, subscribing to the mailing list is a good way to get notified. I only use the list for new chapters and other "major announcement" stuff like that, so it's not spammy.
I didn't mean to suggest you do this for every chapter. Just something in the vein of, "In case you haven't been following along already, since I last blogged, I've completed several chapters of Crafting Interpreters."
That's awesome attention to detail. It looks like a lot of work on top of writing a book. It feels like a great debt is owed for the amount of work that's gone into something I can essentially enjoy for free.
I did make a little video showing how the illustration is done, but in general, no. I barely have enough time to write the book, and video production (at least for my perfectionist self) is very time-consuming.
One thing popped into my mind recently, as I've been going back and forth between implementing my own stack machine interpreter and forking the Python VM (for the Oil shell):
What other books cover bytecode interpreters? (In other words, the part of the book that you are 2 chapters into.)
I can't think of any! Many people learn about the front end in a "compilers" class. The Dragon Book and "Modern Compiler Implementation in X" are two of the most popular books, but they don't cover interpreters at all.
The other branch people learn about is Lisp interpreters, which are often tree interpreters (as in the first section of your book). And when they are bytecode interpreters, I think they have a compilation process that's fairly different from, say, Python/Lua/JavaScript. For example, the "nanopass" style is somewhat popular for teaching, and pretty different from what I understand. The source language is further from the machine model.
It seems like the best sources are the Lua papers you mentioned, but I would call those primary sources.
I can't think of any "secondary sources". Terence Parr's Language Implementation Patterns does part of a bytecode interpreter, but it's in Java, and it leaves out the bytecode compiler as far as I remember.
So somewhat surprisingly I think everybody who implements bytecode VMs these days just looks at Lua and Python?
I guess Lua was influenced heavily by Scheme, and I don't know where Python got its implementation style, maybe from the ABC precursor.
Anyway I'd be interested in hearing about other sources (primary and secondary), but it dawned on me that this is the ONLY full bytecode interpreter I've seen in a textbook. And I have a whole bunch of books and have been doing this for a few years.
So if that's the case, then kudos for filling this gap! :)
EDIT: A primary source that has been useful for me is The ZINC experiment: an economical implementation of the ML language, which is a description of the predecessor to OCaml (both written by Xavier Leroy). Although OCaml is statically typed and functional, the bytecode interpreter itself has some similarities to Python/Lua. They mention keeping the top of the stack in a CPU register, the relationship between runtime values and garbage collection, etc.
The book "Real World OCaml" reviews a little of this material.
Virtual Machines: Versatile Platforms for Systems and Processes covers both virtual machines for implementing programming languages and virtual machines for emulating CPU architectures. One of my favorite optimization techniques explained in this book is pre-decoding:
After loading the program's instructions, every opcode is replaced by the address of the corresponding routine implementing the instruction.
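For example, here's a rough sketch of the idea in plain C (my own toy, not the book's code), using a table of function pointers as the "addresses of the routines":

    #include <stdint.h>
    #include <stdio.h>

    typedef struct VM VM;
    typedef void (*Handler)(VM *vm);

    struct VM {
        Handler *pc;      /* pre-decoded program: an array of handler pointers */
        int      acc;
        int      running;
    };

    static void op_inc(VM *vm)  { vm->acc++; }
    static void op_halt(VM *vm) { vm->running = 0; }

    int main(void) {
        /* The "loaded" program: raw opcodes, 0 = INC, 1 = HALT. */
        uint8_t raw[] = { 0, 0, 0, 1 };
        Handler table[] = { op_inc, op_halt };

        /* Pre-decode: replace every opcode with its routine's address. */
        Handler program[sizeof raw];
        for (size_t i = 0; i < sizeof raw; i++) program[i] = table[raw[i]];

        VM vm = { .pc = program, .acc = 0, .running = 1 };
        while (vm.running) (*vm.pc++)(&vm);   /* dispatch never re-decodes */

        printf("acc = %d\n", vm.acc);         /* prints 3 */
        return 0;
    }

The decode work happens once at load time, so the hot loop is just an indirect call per instruction.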
that is, "system" VMs vs. "process" VMs. I don't really like those terms, but the distinction is a good one. I think the author must have been at the VM summer school I linked there.
-----
BTW this 2015 paper is saying that the labeled gotos technique isn't a big deal on Haswell:
Branch prediction and the performance of interpreters -- Don't trust folklore
In other words, the branch predictors have improved enough on Intel that it's not a big optimization. But I don't like how they didn't mention other CPU architectures. Intel is not all we care about!
Also, it would be nice to have an update on this 2015 paper to account for 2017-2018 Spectre/Meltdown mitigations...
The first part of the paper compares token threading vs. a naive switch in CPython 3.2, which has a compile flag to disable token threading[1]. I was referring to direct threading, but the difference is probably minor or irrelevant.
> Nehalem shows a few outstanding speedups (in the 30 %–40 % range), as well as Sandy Bridge to a lesser extent, but the average speedups (geomean of individual speedups) for Nehalem, Sandy Bridge, and Haswell are respectively 10.1 %, 4.2 %, and 2.8 % with a few outstanding values for each microarchitecture. The benefits of threaded code decreases with each new generation of microarchitecture.
But looking at the graph, it seems a few benchmarks still got a +5% speed boost on Haswell.
It would also have been interesting if the paper had shown the code gcc generates for the switch statement. If the switch statement's cases are linear (0, 1, 2, 3, ...) and there are more than 4 cases, then gcc token-threads the code.[2]
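For reference, the kind of loop being measured looks something like this toy (my own sketch, not from the paper); with dense case values like these, gcc compiles the switch down to a bounds check plus an indirect jump through a table:

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        uint8_t code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        uint8_t *ip = code;
        int stack[64], *sp = stack;

        for (;;) {
            switch (*ip++) {
                case OP_PUSH:  *sp++ = *ip++;           break;
                case OP_ADD:   sp--; sp[-1] += sp[0];   break;
                case OP_PRINT: printf("%d\n", sp[-1]);  break;  /* prints 5 */
                case OP_HALT:  return 0;
            }
        }
    }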
Thanks for mentioning this book, it sounds amazing. If I hadn't just bought Lisp in Small Pieces I'd probably be looking at picking this up, but I need to work through that book first. However this is going on my future buy list.
So this is a kind of decoding of token threaded code to subroutine threaded code? How does the actual argument passing work, i.e., how does the routine starting at 0x4bcf00 know where to look for its operands?
> Alas, the fastest [dispatching] solutions require either non-standard extensions to C, or hand-written assembly code.
I assume gcc's label addresses[0].
I've been planning to try those out the next time I get around to playing with minischeme to see just how much of a speed difference they make -- I'd imagine quite a bit since it could just hop around from opcode to opcode instead of having to go through the loop (which I believe is outside the dispatching function).
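Roughly what I have in mind (a toy sketch using gcc's labels-as-values, not minischeme's actual code): each handler reads its operands through the shared instruction pointer and then jumps straight to the next opcode's label, so there's no trip back through an outer loop or switch.

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        uint8_t code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        uint8_t *ip = code;
        int stack[64], *sp = stack;

        /* Jump table of label addresses (gcc/clang extension). */
        static void *dispatch[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
        #define NEXT() goto *dispatch[*ip++]

        NEXT();
    op_push:  *sp++ = *ip++;          NEXT();  /* operand is the byte after the opcode */
    op_add:   sp--; sp[-1] += sp[0];  NEXT();
    op_print: printf("%d\n", sp[-1]); NEXT();
    op_halt:  return 0;
    }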
It's funny, I searched Google to see if clang supported this, and the top result (from Stack Overflow) has the comment "This is terrible coding style. Why don't you use a switch or a function table. If you really want such code, you should goto 70ies/80ies BASIC."
Yes, that's definitely the main one I had in mind.
> I've been planning to try those out the next time I get around to playing with minischeme to see just how much of a speed difference they make
I added support for computed gotos to my bytecode interpreter for Wren and, if I recall, it gave about a 10% speed boost on my (mostly micro-) benchmarks. YMMV.
The book is written in Markdown. The code samples are stored in separate files so that I can compile and test them. The code has little comment markers identifying where each chunk of code goes in the book, and the build script weaves it all together.
I'm up to the chapter in part 1 on Classes and it has been a great experience for reminding me how compilers work. Personally, I recommend intentionally doing it in a language other than Java just to force the extra step of transliterating in your head; at least for me it has made the learning process much stronger.
Yeah, I totally agree with that suggestion. Even hand-typing the code samples yourself instead of just reading them can help you get out of the "skimming" mindset and into really internalizing each step.
If you're curious, people have ported the code in the book to a lot of languages. Some of them are here:
I am doing the same in Rust (a new language for me) and OCaml (which I know a little bit). It forces you to find different solutions, and you learn new languages at the same time.