Crafting Interpreters (craftinginterpreters.com)
607 points by metadat 4 months ago | 187 comments



Author here. Seeing all of the positive comments about my book is really warming my heart. I appreciate everyone and I'm glad so many people have enjoyed the book. I put a ton of time and love into it and it's gratifying to see it had the effect I'd hoped for.


Self-taught sweng here. (And late self-taught -- didn't really start learning programming until age 21.) Early in my first job, I was asked to design a simple query language and create an interpreter for it. I had absolutely no idea what was involved in interpreters.

I found your book online (it was less complete back then -- this was 2017), and in 3 or 4 days I had a working prototype. In another week or two, I had a working product. Thank you so much for creating this resource -- learning from it and leveraging it gave me a huge confidence boost, and was super interesting and fun. One of the very best resources I've come across. I still reference that project as one of my very favorites I've ever worked on. Best wishes!


Thank you for writing the book! I'm currently almost done with the first part (doing it in Haskell because why not) and in every chapter I encounter situations where my "clever hack" turns out not to be so clever in the light of later requirements. It's been great fun so far. Have you considered writing up a follow-up book doing (say) a compiler or a JIT or something for Lox?

PS: the lexical analygator is the best and has shown up (with attributions of course) in several of the internal presentations at $WORK. At one point we even had fan art of him on the whiteboard, but it had to be sacrificed when we had a particularly large diagram to put on there. :|


Yes, people have definitely asked for a follow-up on compilers and/or static types. There's a little context here:

https://news.ycombinator.com/item?id=40956138

I'm not sure if I'm the best person to write a compiler book, or, at least, not yet. I've written a lot of language front ends and interpreters, but I don't have much experience targeting native code. It's something I've wanted to do for a long time, but haven't made the time to make it happen.

Static types are also hard for the reasons the linked comment talks about.

I still think about it sometimes, but writing a book is a lot of work and I've been enjoying not writing a book for the past couple of years. :)


You could compromise: write a book on crafting interpreters that use Truffle so they can be partially evaluated into compilers.


[flagged]


It's my favorite illustration in the book too. :)


on bro


Game Programming Patterns is also excellent. There's something that feels quite 'honest' and straightforward about the style of both books. It sometimes seems like authors feel like they have to play along with the idea that it's all black magic, or maybe I'm just too stupid to understand GoF design patterns.


Hi Bob, I haven’t completed the book yet, but I wanted to note I appreciate you making it available for free online. I do my best to support authors like you (I have the book in print, PDF, and Kindle editions) and encourage others to do the same. Thank you!


Thank you for buying it!

Putting the book online is one of those things that turned out to be a real non-zero-sum proposition. It draws a lot of people into the book and gives them the ability to check it out before they buy it, which has certainly increased total sales and made me more money.

But, also, for those who can't easily afford it, it ensures they still have access to it, which is good for them.

Kind of an honor system market segmentation.


Your book, as well as 'Writing An Interpreter In Go', are two books I definitely want to read when I get more free time! I'd like to have a go at making a basic typesetting language, I have a bunch of ideas for it.


I LOVE this book. I have been dipping in and out the last few months implementing Lox in Rust.

It’s so well written, full of personality and beautifully illustrated - for me this book is a 10 and the bar I’d use to judge other programming books.


Thank you for putting the time into writing it! Your book was what really made it all click for me, and I'm currently working on my own statically typed, compiled language as a result. I plan on doing part 2 of your book in that language; it's going to be so horrific!


Hi, loved your previous work on languages (Magpie, etc.). Any update on the statically typed language you were working on? Is it still in progress? The syntax for dealing with Sum types was pretty ingenious.

Thanks for both the books, cheers.


Still noodling on it!

I spent a lot of time trying to figure out how I wanted to handle generics and heterogeneous data in a way that felt simple enough for me to design and implement it. I think I have something more or less figured out. It's about a 50/50 blend of Go interfaces and Rust traits.

I have a prototype implementation in progress but haven't had much time to work on it. I broke my ankle a couple of months ago and while I've had plenty of time being stuck on the couch, I just haven't had the brainpower to work on it.

I haven't given up on it, though.


Oh, get well soon! And best of luck on the project.


Would you be interested by translations? I can do it in French and Esperanto.

And by the way, thank you very much for the original version.


There are licensed translations from foreign publishers for several languages (which makes me feel like a Real Author :) ). Those contracts generally give the publisher sole access to translating in a certain language, so I have to be careful about saying yes.

I'm guessing no publisher will show up wanting to do an Esperanto translation. :)

I'd like to say "yes", but I also have to worry about sending a confusing signal to other people who might think I'm giving a blanket yes to translations.

I really do appreciate the offer, though.


Your book is legendary and I have recommended it to a good number of students/programmers-in-learning. Haven't ever made it through (no matter how good the textbook, I'm still not a textbook person...) but immense respect for the time and dedication required to make such brilliant and engaging learning material.


That book is an absolute treasure and you should be proud.

At a personal level, this book contained so much that I had always wanted to know, from how hashmaps work, to how to compile in a single pass. And I connected with your note on getting into writing languages partly due to a feeling of being less-than programmers who could create their own language.

One thing I wanted to share: I found a small design change to Clox that eliminates an entire class of hard-to-debug GC bugs.

Basically, the change is that garbage collection never occurs during the execution of an instruction.

Instead, garbage collection is an instruction which is never emitted by the compiler. When the heap size crosses the garbage collection threshold, we schedule a garbage collection, which will occur after the current instruction is done executing, and before the next instruction executes.

To implement this, in the Self-Adjusting Heap[1] section, we set a global variable (let's call it "resume pointer") to the next instruction, similar to the return pointer in a call frame. Then we set the instruction pointer to point to a global constant which contains only the garbage collection instruction.

The way this works: let's say the current instruction allocates something, and the allocation needs to trigger a GC. To trigger the GC, we set the resume pointer to the next instruction, and the instruction pointer to the global constant garbage collection instruction (let's call this GCGCI). When the current instruction completes, the VM goes to the instruction pointer to get the next instruction, and finds it pointing to the GCGCI. It then executes the garbage collection instruction, which runs the garbage collection. The final step of the garbage collection instruction is to set the instruction pointer to the resume pointer (similar to a return pointer) so that when the VM goes to execute the next instruction it resumes the execution of the user program right where it left off.

This adds a bit of performance overhead to garbage collection, but I would guess we more than get it back from the fact that it's no longer necessary to temporarily push objects onto the stack to prevent them from getting GCed mid-instruction. And this eliminates that entire class of GC bugs; code stability and programmer time are worth something!

An alternative implementation might observe that the resume pointer is basically a return pointer, so you could potentially store this on the call stack. I didn't really think through the pros and cons of this as it seemed like just adding a global resume pointer solved the problem more simply. However, one thing I was working on is making my interpreter multithreaded, and this means that most of the global variables used by the VM get stuffed into a "Thread" struct. Keeping the size of this thread struct low is a high priority because that memory usage is a limiting factor in the number of threads you can reasonably run simultaneously, so in this case moving the resume pointer onto the call stack starts to look like a bigger benefit.

[1] https://craftinginterpreters.com/garbage-collection.html#sel...
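To make that concrete, here's a minimal sketch in C of the scheduling described above. The names (OP_COLLECT_GARBAGE, resumeIp, gcInstruction) are hypothetical, and clox actually keeps the instruction pointer in a call frame, so treat this as an illustration rather than a drop-in patch:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Pared-down, hypothetical VM state for illustration only. */
    typedef struct {
        uint8_t* ip;             /* next instruction to execute */
        uint8_t* resumeIp;       /* where to continue after the GC "opcode" */
        size_t bytesAllocated;
        size_t nextGC;
    } VM;

    enum { OP_COLLECT_GARBAGE = 0xFF };   /* made-up opcode value */
    static uint8_t gcInstruction[] = { OP_COLLECT_GARBAGE };
    static VM vm;

    void collectGarbage(void);            /* the usual mark-sweep */

    void* reallocate(void* ptr, size_t oldSize, size_t newSize) {
        vm.bytesAllocated += newSize - oldSize;
        if (newSize > oldSize && vm.bytesAllocated > vm.nextGC) {
            /* Don't collect mid-instruction: divert the dispatch loop so the
               GC runs between this instruction and the next one. */
            vm.resumeIp = vm.ip;
            vm.ip = gcInstruction;
        }
        if (newSize == 0) { free(ptr); return NULL; }
        return realloc(ptr, newSize);
    }

    /* In the bytecode dispatch loop:
           case OP_COLLECT_GARBAGE:
               collectGarbage();
               vm.ip = vm.resumeIp;  // resume the user program where it left off
               break;                                                           */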


This is a really really cool idea. Thank you for writing this up.

Yes, that sounds like it would work fine to me. The only downside to keep in mind is that you will have periods of time where you need to allocate more memory first, and only free up memory in a GC after that allocation takes place. If you're very memory constrained or you need to allocate a lot, that could potentially be a problem.

But if I'm totally honest, I'm not sure how big of a problem that is in practice. One of the really annoying things about garbage collectors is how little literature there is on real-world collection strategies. You can find plenty of papers on collection algorithms, but it's hard to find material on when a GC should run.


Yeah, "When to schedule?" is a really good question.

Have you ever contacted the authors of The Garbage Collection Handbook[1]? It seems like they intend that book to have the most breadth of knowledge about GC, and if there's anybody out there who has studied when to collect they might know who.

[1] https://gchandbook.org/index.html


I haven't. That's a good idea. I'm pretty good at tracking down resources, but really bad at actually contacting humans.


As a hobbyist freelancer with no access to resources like JSTOR, contacting the authors is often the only way I can obtain access to academic papers. It works great!

Incidentally, that book might contain information on when to GC; I haven't actually finished it yet.


That's a really neat idea! I think I'll try to work that into my own Lox derivative :)


Let me know how it goes!


Thank you for game programming patterns as well.


Read Crafting Interpreters when building Crumb (https://github.com/liam-ilan/crumb). It was indispensable, especially the sections on scope and local variables. The balance between technical implementation and conceptual insights is super helpful, especially when trying to go off of the book’s set path.

It’s inspiring to see technical writing done like this. As an aspiring engineer, this sets a really high standard to aim for - excellent resource.


> when building Crumb (https://github.com/liam-ilan/crumb)

> As an aspiring engineer

I’ve got some good news for you.


> Aspiring engineer

You already made it, no need to be humble. You don't need to finish the CS degree to call yourself an engineer :)

Great documentation and an awesome project. Good job.


This looks cool! How did you decide on what data types to include?


Personal choice, is what I'd say. You can get a lot of mileage out of implementing a dynamic language with NaN boxing[1].

It really depends on the kind of language you're trying to build an interpreter for, and what purpose it could serve. For dynamic languages, I'd say looking at the core types of Erlang is a great place to start (Integers, Symbols, Functions, etc.). For a statically typed language things get more complex, but even with just numeric types, characters, and some kind of aggregating type like a Struct you can build up more complex data structures in your language itself.

[1]: https://leonardschuetz.ch/blog/nan-boxing/
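For anyone curious, here's a rough sketch of the NaN boxing trick in C. The bit layout is simplified and the names are my own; the linked post (and clox's optimization chapter) cover the real details:

    #include <stdint.h>
    #include <string.h>

    typedef uint64_t Value;

    /* Every quiet NaN sets these exponent/mantissa bits; the sign bit plus the
       remaining payload bits are free to carry a tag and a pointer. */
    #define QNAN     ((uint64_t)0x7ffc000000000000)
    #define SIGN_BIT ((uint64_t)0x8000000000000000)

    /* Real doubles are stored as their raw bit pattern. */
    static Value numToValue(double d) {
        Value v;
        memcpy(&v, &d, sizeof d);   /* bit-cast without aliasing trouble */
        return v;
    }

    static double valueToNum(Value v) {
        double d;
        memcpy(&d, &v, sizeof v);
        return d;
    }

    /* Anything that isn't a plain number hides in the NaN payload, e.g. an
       object pointer tagged with the sign bit. */
    #define IS_NUMBER(v) (((v) & QNAN) != QNAN)
    #define OBJ_VAL(p)   ((Value)(SIGN_BIT | QNAN | (uint64_t)(uintptr_t)(p)))
    #define AS_OBJ(v)    ((void*)(uintptr_t)((v) & ~(SIGN_BIT | QNAN)))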


Really neat article, this is my first encounter with the concept of NaN Boxing. Thanks for sharing!


It was my first time making a language, so I built into the language whatever was needed to build something cool with Crumb.

Wanted first-class functions to simplify the parse step (they can be treated like any other value), but I needed a different mechanism to invoke "native code" vs user-defined methods, so there are two different types for that.

Needed some kind of compound data type, but I didn’t want to deal with side effects from pass by reference, so Crumb implements lists, but they are always pass by value :)

P.S. There's some pretty neat stuff built with Crumb at https://github.com/topics/crumb if anyone's interested!


My favorite part of this book is that it guides you through writing two separate interpreters for the same language.

I think it really allows you to grasp some of the more intricate and nuanced parts about building a programming language.

You can encounter all of the big ideas in the first half of the book and gain enough familiarity with them so that when you revisit them again in the second interpreter, you can actually absorb the interesting parts.

Such a phenomenal book!


Since people are talking about other compiler resources, one I have enjoyed so far (though I still need to finish it) is Immo Landwerth's series on writing a compiler in C# that generates IL, along with debug symbols and the like. It is older (about 5 years, so roughly the .NET Core 3 timeframe, I believe), so it doesn't use current C# syntax, but that shouldn't matter much for most of what you'd be doing, other than a lot of warnings about nullability on types.

https://www.youtube.com/playlist?list=PLRAdsfhKI4OWNOSfS7EUu...


I feel like this is a book most programmers should work through at some point or another. Doing so made me really appreciate just what's going on inside a compiler / language toolkit. It's also one of the best-written technical guides I've ever followed; it really helped me internalize the concepts, and they are useful all over the place, not just in compilers.


This comment sold me. Gonna keep this as an inactive tab for the next couple months.


Just to be safe, make sure to also copy the url into a note app, or some other location where you'll never look at it again.


Redirect it to /dev/null, that way you'll never see it again, so won't feel guilty:

$ cat this >/dev/null

Bonus points for redirecting standard error to standard output:

$ cat this >/dev/null 2>&1

Now no one will hear the screams ...


Why not buy a copy and put it on your to-read shelf for the next couple of months?


Months? How about years? And then start it, step away for a year, come back and start it again, then step away for a year, come back and … ?


>really helped me internalize the concepts, and they are useful all over the place, not just in compilers.

Which are some of those places? Parsing data formats could be one, I guess.


IMO compilers make you a lot more mature in recursive algorithms and trees, and then after that much more conscious about what exactly the code you write resolves to in terms of that language's semantics. Learning how closures work (variable capture and having to traverse the scope stack to find bound variables) is also a positive.


Makes sense, thanks.


And that's just for your first recursive descent compiler. One thing to remember is that you will also one day want extended functionality in your language, and you either implement a C FFI in your language (straightforward or even freely done for you, depending on the language) and call some C library for the purpose, or you have to implement the functionality somehow. So you end up writing a lot of stuff you wouldn't otherwise.


The secondary but perhaps equally important benefit of this book is that it teaches clarity.

The text, code, structure and pacing, everything is clear and to the point.

"Crafting" is the right word, because the book feels like it's written by and for craftspeople.


This book should be the second or maybe third step of your journey into PL compilers.

The first step is to write an interpreter yourself, for a simple language you create, without knowing anything about interpreters or language design. The second step is to rewrite it, and make fewer mistakes! :)

If you don't do this, you are never going to appreciate the nuances of this topic. And you are going to skip over concepts that don't seem important.


For me, this book demystified these topics and allowed me to do your second and third step in the first place :)

It's okay to not come up with/reinvent from first principles every technique on your own. It's normal to stand on the shoulders of giants.


I'd think of it as riding a bike vs reading about riding a bike.


When you learn to ride a bike, someone teaches you how to do it.

You could go out and fall down a few times by yourself so it’s more fulfilling when you have instruction, but I wouldn’t say that’s the most efficient path.


I’d much rather have someone, or something, providing guidance as I learned to ride a bike. Of course you’re less likely to end up with a head injury writing an interpreter, but still, it’s not exactly a crime to learn before doing.


For me, I just didn't know where to start, and thought that these sorts of projects were simply magic that were beyond my skills. But reading Crafting Interpreters showed me that all I needed was a little nudge in my thinking.

I don't think I would have ever attempted to write an interpreter without first reading something like CI... I never would have gotten to your step three.


That brings back some memories... Having just started learning Java I wanted to create my own language. It wasn't that bad, I even managed to implement an operator precedence algorithm without looking it up. It worked by scanning the list of tokens and each time finding the highest priority subexpression. There was no recursion involved, just good old manual pushing and popping on a stack.

The problem was that I didn't really understand parsing so most of the language was implemented using copious amounts of string.split and string.replace, predictably leading to some concepts not being nestable. Really wish I had archived the source code for that.


what do you mean by some concepts not being nestable?


From what I remember, variable references started with a dollar sign and you'd need to apply a new level of escaping (another dollar sign) to variable names with each inner loop, otherwise the variables would be substituted once and kept for all loop iterations. You can sometimes see similar issues in shell scripts; the following command prints three empty lines instead of numbers, because the outer shell expands $i before the inner one ever runs: sh -c "for i in 1 2 3; do echo $i; done"

There were countless other restrictions too, some expressions could only take a variable name. Statements were delimited by lines (and optionally also by semicolons) so any loop or conditional, regardless of nesting depth, had to fit inside a single line. Due to usage of string.split, initially all semicolons had to have a space before them, I remember feeling very smart when I added automatic splitting of "abc;" into two tokens "abc" and ";"


That's exactly what I did. My own experience with the first step was writing a TeX-like language with macro definition and macro replacement. A few days into the project I kept running into so many bugs that it instilled a more academic mindset in me, and I started to read books. But it was still a wonderful first step: without it you might not have even realized that this is a book-worthy topic.


Mad respect to those with that kind of dedication, and everyone working to keep all the dev infrastructure going, but I'm super glad my "I want to make a language" phase was just a passing interest!

That's just a crazy amount of work!


It doesn't have to be a crazy amount of work; see Lisp-in-Lisp in the original SICP book or a lambda calculus interpreter in Haskell (fits on a screen).


L-in-L interpreters assume that you already have symbols implemented, memory management implemented, reader and printer implemented, ...

You could spend considerably more time implementing a Lisp dialect's math library than the interpreter. There are a lot of combinations.

Where are the objects, hash tables, exception handling, ...

Of course you can get some of these things from a host language, but someone had to make them.


Yes but my point is that these problems are as hard as we want to make them. For a simple language implemented in C, maybe it's fine to never free any memory until the process quits.


More like, for a simple problem or a sufficiently small instance of a complex problem being solved in C, or a language written in C (simple one or otherwise), maybe it's fine to never free memory until the process quits.


Actually doing anything beyond weekend project complexity with that sort of thing is pretty hard though.

I guess if you have a lot of fairly small ideas with a lot of novelty, I could see the appeal of a new language to express them in.


Does anyone know of a good resource for creating a statically typed language with stuff like parametric polymorphism and basic type inference?


People have asked me to write something like this many times and one of the main reasons I haven't is because it's so open-ended.

With Crafting Interpreters, I felt like there was a reasonably small self-contained language I could come up with that covered almost all of the concepts I wanted to teach: variable scope, functions, closures, classes and dynamic dispatch, control flow, etc.

With type systems, there are a lot of forks in the design space and no clear "best" path:

* Does the language have subtyping or not?

* Are generics erased or reified?

* Is generic code specialized at compile time or not?

* Is type inference local or Hindley-Milner?

Any choice I make here would miss out on a lot of important material from the unchosen branch and disappoint readers who wanted me to take the other path.

Maybe the right answer is a broader survey-style book that tries to cover a bunch of different options (like Types and Programming Languages). But then you lose the fun of building one coherent thing.


Personal opinion, I'd rather you build a specific thing, with perhaps brief periods where you talk about the pros and cons of the most notable options and then why you chose what you did. As you said there's a joy in a complete system, and at least to me there would be value in discussing your set of whys because that can show people less used to system design the types of considerations involved.

I will not fault you if you don't get to any such project (I remember discovering Crafting Interpreters around the time you were like... halfway through part one? and seeing how long it took you to get through the rest of the book. The care you take to both give useful technical information and make your writing clear takes a great deal of work), but I would certainly read it as I've read much of CI if it ever exists (I need to finish going through part 2, I'm up to Closures).


How about the Zig type system?


Modern compiler implementation in ML by A. W. Appel

There are (inferior) versions of the same in C and Java. I'd use them together with the ML version.


The main crux of parametric polymorphism and type inference is just implementing Algorithm W. Just find a toy implementation online and poke at it.

For me, when I learned it myself, I simply took the toy implementation at https://github.com/wh5a/Algorithm-W-Step-By-Step/blob/master... realized that it was written in an extremely outdated style, modernized it, cleaned up, and ended up with https://gist.github.com/kccqzy/fa8a8ae12a198b41c6339e8a5c459... Then I proceeded to change various things to "break" the algorithm and see how they are broken.


It’s a bit more theoretical, but working my way (slowly, with many re-watches of certain videos) through the Coursera compilers course was a major milestone for me: I highly recommend finding an accessible way to do one of the CS things you think only Wizards can do: once you tackle one or two such projects, and realize that it’s hard, but mostly just slogging your way through it, it can be an enormous confidence boost :-)

I’m always happy to help anyone going down this path: zellyn@(most things) if anyone reading this wants to try but is feeling nervous.


One additional note: the provided code for that course is (or was a decade+ ago!) either C++ or Java, but the tests and test scaffolding are fairly straightforward.

I opted to translate into Go as I went, which meant:

- a lot more work/time

- slower progress

- no “if you fill in this missing piece, we’ve provided the rest so your compiler will work end-to-end”

- an annoying amount of trying to print ASTs exactly like the C++/Java code for validation purposes

But I got to use Go instead of those other languages! ;-)


I just finished the second half, it's a great book! I recommend doing one or two of the proposed challenges in each chapter to reinforce your understanding of the content.


Curious question from someone new to the programming field who lacks a formal CS background: How are books like this one meant to be consumed? Do you read it cover to cover as you code along with the author, the way YouTube tutorials work?

The main reason for asking is, I don't know if I'm lacking in natural gifts (really likely) but I can't seem to retain knowledge like that. It feels nice to onboard myself to a language or framework by doing so, but afterwards I will still struggle to connect pieces together.

I'm interested to learn more about how language interpreters work, I just don't know if the format is something that will assist me. I'm trying to complement and assist my brain with note taking after reading the note-taking thread that is currently on the front page, so perhaps that will change.


This book is intended to be a cover-to-cover job. I tackled it about 2 or 3 chapters at a time while building alongside. But after the midway point, where it swaps to a C-based bytecode compiler, I just read it instead.

There are books like the dragon book which cover PL design in a more reference book style. But I don’t recommend them.

If you're looking for a lighter alternative, then "Writing An Interpreter In Go" is worth a look.

Also Bob Nystrom has some other good material on his blog, and a chapter in Game Programming Patterns about PL stuff.


> Also Bob Nystrom has some other good material on his blog, and a chapter in Game Programming Patterns about PL stuff.

Links:

https://journal.stuffwithstuff.com/

https://gameprogrammingpatterns.com/bytecode.html


For this book, that is absolutely what you should do. It starts you off building a simple parser and interpreter, and then has you add features to both as you progress through the book. Skipping sections can mean that you end up missing functionality you might need.


Well, in the instance of this book, you should follow along with the steps that Bob Nystrom writes for you. From reading it, it’s clear you’re supposed to follow along and write code “with him”. He even outlines exactly where each line of code should go, and explains why along the way.


This book, unlike the infamous "Dragon Book" for compilers, can be read cover to cover. The Dragon Book has great detail, but it is a slog, especially in the early chapters. So many students drop their first course on compilers every year because of the Dragon Book and not the actual concepts.


In my experience, following along (as in actually doing the steps) works wonders.


Start at the first chapter and try all the code until you understand the concepts. Then go to the next chapter and repeat the process.


The author is one of the lead developers for Dart, which has evolved over time into a pretty nice language.


I should clarify that I've been an engineer on the Dart team for a long time, but I wasn't one of the original designers of the language. (If I had been, the language would be pretty different.)

I am on the language team now, but I'm just one of the team members and not a lead. The Dart team is really fantastic. Every day I'm grateful I get to work with such a good group of people.


Thanks to you and the rest of the team for all your great work! It really is a very nice language. I've been working almost exclusively in Dart for the last 6 months and it really is a very pragmatic and productive language.

I'm looking forward to macros and more fully featured record types.


How might Dart be different if you were an original designer?


It's hard to separate out what I know now that neither I nor the initial designers knew back then from what I thought we should have done even then.

For example, optional types ended up not working out, but I don't think I knew that then either.

But even before we released, I did tell the language team that I thought it would be good to do:

* Non-nullable types. We got there eventually with a ton of engineering effort and migration cost, but this could have been a really strong selling point at launch.

* Types on the right. Think `var x: int` instead of `int x`. It's a little more verbose in some cases, but it makes the language easier to parse and makes it easier to have complex syntax for type annotations like function types, union types, etc.

* Extension members. We added them years later, which is good, but it was too late to affect the design of the core libraries. I suggested that if we had them in 1.0, then the core libraries and collection types could be designed around them.

* `val` instead of `final` for single-assignment variables. It's a small thing, but it means that the syntax for single-assignment variables isn't longer than for mutable ones, which is good because it doesn't punish users for doing the "right" thing.

* No semicolons. Again, another small thing, but it does help the language feel cleaner and more modern. It's really hard to do this later when the syntax wasn't designed for it.

* Coroutines instead of futures and async/await. Coroutines are definitely hard to compile to JS, so there is a good argument that I'm just wrong. But I really hate what async does to API design (https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...).


That’s very close to my list. It’s only a few characters but “final” takes up too much space. And I agree types on the right are easier to read.

Other than that I’m mainly looking for the reasons code generation is required today to go away.


I think Dart is underappreciated. Almost the entire compiler is written in Dart. It has a VM, an AOT compiler, FFI/native interop, and first-class JavaScript interop with the ability to compile to JavaScript. It has first-class WASM support. Sound null safety. They have experimental support for macros. Pretty impressive.


I completely agree. I was lukewarm on Dart but decided to do my new app in Flutter and it's turned out to be a great experience. Kotlin and Swift may look better on paper but Dart is very productive. The tooling is really solid and the fast compile times and hot reload really help for rapid iteration. I've also had no performance issues.


Alright, I will finally read the book. It has been collecting dust on my bookshelf for a while.


I finished this book, wrote the two implementations in Python and Zig, genuinely one of the best set of projects I've ever built.


Hmmm, I started the book once in PowerShell. I think I finished chapter 2. Maybe I should finally finish the whole book with it. Just for the fun of it.

(Yes, PowerShell. Because I didn't have any other toolchain/compiler on the Windows machine at some old workplace and no admin rights to install anything. So I learned PowerShell.)


How was the Zig experience? I am looking for an intermediate Zig project to practice Zig. Does this require advanced features of Zig?


I’ve also ported the lox vm to Zig and had a great time working through it.

Since the project is designed in C, you can mostly write the exact same code in zig, with minimal modifications. If you want to use zig features, they’re easy to integrate, but Nystrom obviously won’t be giving you any hints.

But the language offers a lot of useful features (slices, optionals, error types) and makes some C paradigms syntactic realities (tagged enums, explicit pointer casts). Even more so, the standard library comes with very useful stuff (an ArrayList, a handful of different allocators, heck I replaced the trie of keywords with a StaticStringMap).

it’s a fun project, I would definitely recommend it!


I’m also implementing it in Zig right now (and haven’t done any project other than small snippets in zig before) and it’s fine.

You can actually appreciate how much nicer it is to write than in C, while still being a fairly 1-1 translation.


What bothered me a little bit while following the book was that copying the code doesn’t result in compilable code all the time because there is code missing that is only introduced later. I get why the author chose to follow this path, but I’m from the club that every commit should compile and was annoyed a bit by that.


It's hard to have the code compile after literally every single snippet. Sometimes, we need to change the signature of a function that already exists and is called, so there's going to be at least two snippets you'll have to apply before you're back to a working state.

It probably would have been possible to make this work by introducing temporary scaffolding code which gets removed shortly after, but it would have made the book much longer and more tedious.

Balancing thoroughness and brevity is always a challenge when writing.

The code is compilable at the end of every chapter and usually compilable at the end of every heading within a chapter. One thing I could have done that I didn't would be to place some visual markers in the book whenever you reach a point where things should be compilable. I'll keep that in mind if I write a third book.


That had better be "when I write a third book" Bob, not "if". It's not just the technical content (Which is excellent), but your ability to convey and entertain that helped create such a classic book.

So if you do get a spare couple of thousand free hours, please ignore the pull of family, the economic safety of work and plunge head first into another oversized computer engineering project. You may now take my money.


I am the opposite. I think pedagogical materials like this should introduce things in the order that makes intuitive sense, and most of the time that means from the easiest to the hardest.


What do you think the author's intention was behind doing it this way?


Doing it the other way is a common pitfall in teaching programming concepts. A very common example is the historically necessary boilerplate in Java’s Hello World, where quite a few concepts are present but handwaved away as you don’t need to worry about this now. The problem with this is twofold:

1. It is challenging to distinguish the pertinent from the impertinent. People who are just starting won’t be able to tell what parts of the code deserve their attention now.

2. It is challenging to retroactively draw attention to previously dismissed details once they do become pertinent.

I think there are other ways to address this, with additional formatting considerations in the presentation of example code. But there’s still a problem once the reader wants to tinker with the code, as any such formatting will be lost between a carefully tailored example rendering and the user’s code editor.


> Doing it the other way is a common pitfall in teaching programming concepts.

100% disagree

If someone is brand new to programming and you're expecting to teach them the following concepts:

A class, methods, static methods, data types, method return types, the void return type, arrays, and namespaces, just to be able to write a simple "Hello World" app [1], then I think your approach is the one that's misguided.

The most common mistake people make in teaching is teaching things in the order they were discovered historically. Not always, but very often that is a distraction. The second most common is teaching from the outside in.

The best way to teach is to explain what you're trying to accomplish and start with a very simple example for people to play with. Then pick one concept and modify it so they can see the results of those changes. Then pick the next concept and modify that so they can see tangibly what it means.

In the hello world example, it would be starting with the program in [1] and focusing just on the constant string "Hello World!". Change it to see what that means, understand what's going on. Then move out to the Console.WriteLine. Make a copy of that line so there are two lines of it, then 3, to understand the method call. Then swap to Console.ReadLine to see what a different method call can do. Continue exploring out from that core concept.

The "static" identifier on a method class is not where to start teaching someone programming.

It's also why Racket removed a lot of the boilerplate from their core teaching languages for students, so they don't even have to see the things that are unnecessary at that point in their learning.[2]

[1] https://www.geeksforgeeks.org/java-hello-world-program/ [2] https://docs.racket-lang.org/drracket/htdp-langs.html


You say 100% disagree, but then everything you say seems to agree with my points. I would even go so far as to endorse your response as a longer form elaboration of exactly what I was trying to get across.


I see you claim that, but unless I'm misunderstanding, that completely contradicts your claim here:

> A very common example is the historically necessary boilerplate in Java’s Hello World, where quite a few concepts are present but handwaved away as you don’t need to worry about this now.

What I'm saying is exactly that. Handwave past all of the boilerplate like the static declaration and only focus on the most important things to learn up front (e.g. what a string is).

Could you explain how your statement of initially teaching the boilerplate is the same as my statement of ignoring it?

In my example, the entire time they are working with strings or readline/writeline method calls, they are entirely ignoring what static/class/args all mean; those are just boilerplate they have to type until they get to that stage in later lessons.


> Could you explain how your statement of initially teaching the boilerplate is the same as my statement of ignoring it?

I specifically said that including the boilerplate is a common pitfall, and requires teaching around a bunch of impertinent concepts.

I also chose the example for a reason very close to your example with Racket: later versions of Java have similarly aimed to reduce the necessary boilerplate. And if I’m not mistaken, the learning experience is a motivating factor for that as well.

If emphatically agreeing with you in so many words isn’t enough to convince you that I agree with you, maybe you’re just looking for something to argue about?


> I specifically said that including the boilerplate is a common pitfall, and requires teaching around a bunch of impertinent concepts.

And again, I'm saying I disagree with this. You are saying don't include the boilerplate at all when teaching. I am saying, do include the boilerplate and don't teach it until students are ready to understand it.

As a separate topic, if the language doesn't need boilerplate to make the program work (like Racket) that's spectacular. But if it does, then I'm explicitly saying "show the boilerplate and explain 'you don't need to focus on it now. You will learn it later'".

It is best to give people a fully working program at all times, even if they don't understand all of it, so they can play around with it, rather than just read along while the author writes about what would have happened if it were fully functioning code they could play with.

And to make it very clear, the example I linked to explicitly includes all of the boilerplate, and I explained exactly the best method to teach a student with that specific concrete source code.

> https://www.geeksforgeeks.org/java-hello-world-program/

Now all of that said - this isn't really important enough to continue arguing on the internet over. I think we both had good intentions but just weren't communicating clearly. I hope you have a great day.


I understand you better now. I was confused by your Racket example, as it represents exactly the same direction the Java example has gone.

As you say, definitely not worth arguing further. But I appreciate your clarification!


Ha, neat! I’m building a dynamic programming language in Rust called atom (https://github.com/dmeijboom/atom). It’s an interpreted language with a bytecode compiler. Will definitely check it out.


I cherish this book, it's fantastic. I love all of the tidbits about PL history in the margins, and it's a great guide through the world of language design.

If you're looking for a follow-up, I might have a recommendation. I recently got Essentials of Compilation by Jeremy G. Siek, and I'm very excited to find some time to read it. There is a version implemented in Python and one in Racket (two separate books), so you can pick what you're more comfortable with. (I chose Python.) It's very elegantly written, and after each chapter you end up with a working compiler.


I just checked it out.

Apart from the book, which you can buy, there is an open-access edition linked from the MIT Press page, thanks to funding by Google, they say:

https://github.com/IUCompilerCourse/Essentials-of-Compilatio... .


codecrafters (YC S22, http://codecrafters.io) has a new module called "Building Your Own Interpreter" that works its way through this book. It's great because you work in small chunks, each of which has unit tests to tell you if you got, e.g. scanning string literals correct.

Currently free while the module is in beta. I spent today working through all of the lexer exercises in Rust (I'm currently learning it), and had a lot of fun.


Planning to read this soon. Anyone got any other compiler book recommendations? Preferably modern (I've heard the dragon book is out of date)


Engineering a Compiler (Cooper and Torczon) seems to be widely recommended. I’ve got the second edition but there’s now a third - not sure how significant the changes are.

AFAIK the problem with the dragon book is the same as with most academic compiler courses. It spends most of its time on the theory-heavy front-end (parsing) and much less on the back-end (code generation). Real-world compilers are very much the opposite.


I think Engineering a Compiler is a great next step after Crafting Interpreters. It's both easier to get into (in terms of writing style and structure) and, as you say, generally more practical than the Dragon Book.


Do you have any thoughts about how to read some of the next step books like Engineering a Compiler/Dragon Book? I don't normally read large technical books end to end so I'm curious how others approach it. I am going through Crafting Interpreters right now and I like how it has well defined checkpoints that help me know that I can move on when I understand the current material. I skimmed Engineering a Compiler and it looks like it has exercises as well, do you recommend all/some of those, or any other methods?

Also I did some searching to see past discussions of Engineering a Compiler and found this interesting comment thread from Bob Nystrom for anyone interested: https://news.ycombinator.com/item?id=21725581


I think everyone has to pick a strategy for consuming material that works best for their brain. For me, when I read textbooks, I tend to read them front to back. Only a fraction of it sticks on the first read through, but I accept that. I've tried to go really slow and do all of the exercises to make it all stick, but I usually just run out of steam and give up. I'd rather just get through the whole thing.

Then, in the future, when I find myself needing some concept from the book, I usually remember at least that it is in the book. Then I go back and read that part more carefully. Now that it's relevant to a real problem I have, I tend to remember it much better after the second read through.


Thanks for the perspective, I sometimes have completionist tendencies so it's hard to skim without feeling I'm missing parts, but also it can certainly lead to motivation running out at times. And thanks for Crafting Interpreters, I am really enjoying it so far!


https://mitpress.mit.edu/9780262047760/essentials-of-compila...

It's recent; that is the Racket version, and the author has a Python version, too. It's based on the nanopass compiler idea and builds out the compilers over the course of the books.


I’ve not read this book, so can’t offer a comparison but Thorsten Ball’s two books [1] are great.

The dragon book was my textbook in college, brings back memories, might be outdated but some concepts should still be useful.

[1] https://thorstenball.com/books/


Thorsten Ball's books are the closest thing to Crafting Interpreters, not just in topic but also in how the code is presented. These are among the few books I know of that take the effort to guide the reader through programming something complex from scratch, while keeping it testable as the program grows. This means that previously presented code changes along the way as new features are added.

A lot of other "code it yourself" books, on the other hand, simply slice up already finished codebases, and there's no way to test a simpler, incomplete version of the program unless the reader makes extra effort and goes beyond what is presented in the book.

While there is a lot of overlap in topics between Nystrom's and Ball's books, there are also enough differences to learn something new from each. Ball's books use the same parser and AST as front ends to both the tree-walking and VM interpreters. CI, on the other hand, skips the AST for the VM interpreter, and also teaches extra topics like implementing garbage collection, dictionaries/hashtables, and class-based OOP.


"Writing An Interpreter In Go"[1] is also pretty good. I read it after finishing crafting interpreters.

[1] https://interpreterbook.com/


Does it provide information or techniques that aren't already covered by Crafting Interpreters, or was it more of a refresher?


For someone who has sporadically worked through large parts of the dragon book over time (I'm not sure the dragon book is appropriate to consume from cover to cover), and written a couple of tiny interpreters and compilers for fun -- will committing the time for Crafting Interpreters book be revelatory or more of the same? I've long been curious.


I love this book, it's an excellent and refreshingly readable complement to more academic texts. I recommend pairing it with Friedman's "Essentials of Programming Languages". While I also love the Friedman, it's nice to read some PLT without feeling like I got punched in the face sometimes... ;-)


Finishing the C interpreter gave me enough confidence to begin my own, from-scratch, interpreter project. Thank you, Bob.



How easy is it to follow along and yet build a statically typed language (instead of dynamically typed one as used in the book)?


Depends on your other goals for your language.

If it's a statically typed language along the lines of TypeScript (where the static typing is just for safety, not performance) and you don't have any opinions about how it runs (compiled vs interpreted) then you can totally follow the book and then strap your type checker on later.

If you have a compilation target in mind (LLVM, WASM, the JVM, LuaJIT), the first half of the book won't get you much closer to it (you'd have a parser) and the second half would have to be heavily adapted to output your target format instead of the highly specialized custom bytecode the book uses.

And in neither case does the book directly help you with using type information to improve efficiency, so without adaptation you'd end up with a language that has static typing for safety but still does type checking at runtime. That's not a bad problem to have, you can always come back and strip away checks later as you get more confident.


Waiting on the shelf for me once I finish CS50.


I wish there were a German version I could buy as a hard copy :D


I really wish this book used something other than Java. Nothing against Java - just that I don't know it and don't feel excited about learning it.


For what it's worth, these days you could use an LLM to turn each code snippet into many other languages.

Or just focus on his words and use the snippets like generic code and figure out how to do it in whatever language of choice.

You don't need to "learn java" to become conversant or read it as pseudocode.


That's great advice. Thank you!


The Java it uses is basically understandable regardless of what language you're used to.

I guess if you're extremely early in your career and have never written anything but Python, it might not be, but if you've ever touched any statically-typed language and a language with a somewhat C-like syntax, you should be fine.


Lots of people have done implementations in other languages: https://github.com/munificent/craftinginterpreters/wiki/Lox-...

I did the first half in Clojure (in order to teach myself Clojure), worked just fine. I had to do a bit of translation but it's really not a lot.


I find it’s the best way to work through these types of books - otherwise I tend to just copy the code mindlessly. Doing the exercises from a book in another language forces you to consume the semantics of it better.

I did the Thorsten Ball books in Kotlin rather than Go and learned a lot.


I’m glad it used Java. I wound up rewriting the code in Python and it really helped me understand the concepts a lot better.


What other languages do you know? If you know something with similar enough syntax (C++, C#, Javascript, etc.), even if the semantics aren't the same, you should be ok getting the basic idea of the Java code in the book, especially considering how he explains things as you go.

Then just write it in whatever language you want. When I went through the first half of the book back in 2017, I wrote it in Scala. (I did know Java, too, though, so certainly I had an easier time understanding the code examples in the book.)


I started reading the book after reading similar suggestions in the thread.

I know JavaScript and a bit of C and I'm not having much trouble understanding the Java code.


Don't use Java then. The Java bits are just an example, you can write it in any language you want.


Java is pretty readable; as long as you're familiar with C-style languages, you shouldn't have too many problems rewriting this in another language.


When Java gets pattern matching, it will become a more reasonable choice to write an interpreter in.


+1. When I was following along in the book I wrote the interpreter in C#, and pattern matching allowed me to basically skip needing the code generator and the visitor pattern. YMMV but it's great. Essentials of Compilation by Siek does a similar trick with Python.


It's had pattern matching since Java 16; it's been expanded since then with some syntactic sugar and, more recently, the introduction of sum types and exhaustiveness checking to go along with pattern matching.


IMO Java's pattern matching is still fairly primitive. Not even up to the level of usefulness of Scala 2.13.


Interesting - what specifically is missing? I'm thinking we've got all of Scala's pattern matching since JEP 406 (2021), but I don't write Scala so maybe there's a feature I don't know about. Would be interested to know.

    switch (thing) {
        case Integer i when i > 0 -> println("Positive…");
        case …
    }


I think there should be a second edition of this book then :)


It does. The second half is written in C. Java is just used for the simplified first part.


Java is incredibly readable. I don't care for writing Java either, but reading the first part and implementing it in C# was incredibly easy (and also ensured no copy/pasting).


You can do the second part, which is in C!


I know this book has been praised before on HN, but I've personally been a bit disappointed. It's suitable for someone with very little knowledge of the topic, but it doesn't really cover any advanced topics.


It's generally a good idea to judge a book by its explicitly expressed audience, where applicable. In this case Bob made his audience very clear in the first few paragraphs of the book [0]:

> It’s the book I wish I’d had when I first started getting into languages, and it’s the book I’ve been writing in my head for nearly a decade.

> In these pages, we will walk step-by-step through two complete interpreters for a full-featured language. I assume this is your first foray into languages, so I’ll cover each concept and line of code you need to build a complete, usable, fast language implementation.

> In order to cram two full implementations inside one book without it turning into a doorstop, this text is lighter on theory than others.

It sounds to me like you picked up the wrong book, skipped the introduction, and then were disappointed that it wasn't directed at you.

The good news is that it is free, so you aren't actually out anything but time.

[0] http://craftinginterpreters.com/introduction.html


I trust that you're correct, but I'm glad that this is the case. There are books designed to be introductory, and there are books that serve as reference for advanced topics and state-of-the-art techniques. The first type can often serve as an enabler for the second type.


I'm personally very excited for this book. When I would get programming books at the library as a youth, it would be a gamble as to "introductory or advanced?" and I needed the former way more. "3D Rendering in X Windows" wasn't great for me at 13, and getting into topics like this takes a solid foundation.


Because it’s meant to be for those people


Is this book still relevant?


Yes, very much so. Why wouldn't it be? Can you recommend a better resource for learning how to create a programming language from first principles?

I just started it today, which is why I felt the inclination to create this post.


It’s an incredible book. Munificent did an amazing job.


This is not a book about bleeding edge compiler techniques. It teaches the foundations that makes it easier to learn the more complex parts of writing a compiler.

Unless compiler design radically changes, it will likely remain a relevant book for a very, very long time, especially since Java-adjacent languages are likely to remain in vogue and the odds of us losing C as a major language in the next few decades are basically zero.


It's a 3 year old book on compilers, so if it was ever relevant once, it surely still is today!


Definitely. I just went through the first 60% or so and it's great. The language is java but it's easy to translate to other languages, which would be a good learning exercise in its own right.


Super. Bob's way of presenting his content is, I feel, the most important thing to get out of it. Great communicator.


CFG (context free grammar), which is explained in the book, is used here together with LLMs: https://tree-diffusion.github.io


The lex and yacc utilities are part of POSIX.2; is there any reason not to reach for them first?

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/l...

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/y...

All the POSIX.2 standards for shell utilities can be found here:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/

The original introduction to lex and yacc was in the book by Kernighan and Pike:

https://scis.uohyd.ac.in/~apcs/itw/UNIXProgrammingEnvironmen...


For a few reasons.

One, because actually building the lexer and parser from scratch is a useful exercise in a learning context, which is what this is.

Two, because the book wants to teach Pratt parsers, rather than LALR parsers. There’s a lot of literature out there on LALR parsers and generated parsers in general, but precious little on handrolled parsers. Covering material that hasn’t been covered to hell and back is a Good Thing.

Three, because lex and yacc generate C, and the book has two implementations of the interpreter, one in C and one in Java. You could’ve used ANTLR to generate both parsers, but lex/yacc would only cater to the C version.

And finally, because not everybody is on a POSIX system. Before WSL, using anything Unix-y on Windows was miserable. These days it’s mostly fine, but using WSL means you’re not actually building Windows-native stuff anymore.
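To make the Pratt point concrete, here's a very stripped-down sketch of the precedence-driven loop at the heart of a Pratt parser, in Java since that's the language of the book's first interpreter. To be clear, this is not the book's code: the names (PrattSketch, Token, and so on) are made up for illustration, and it evaluates the expression directly instead of building an AST, just to keep it short.

    import java.util.List;

    // Toy Pratt-style expression parser over a prebuilt token list.
    class PrattSketch {
        enum TokenType { NUMBER, PLUS, STAR, EOF }
        record Token(TokenType type, double value) {}

        private final List<Token> tokens;
        private int current = 0;

        PrattSketch(List<Token> tokens) { this.tokens = tokens; }

        // Binding power: higher binds tighter, so '*' groups before '+'.
        private static int precedence(TokenType type) {
            switch (type) {
                case PLUS: return 1;
                case STAR: return 2;
                default:   return 0;
            }
        }

        // Parse an expression whose operators bind more tightly than minPrec.
        double expression(int minPrec) {
            double left = prefix();
            while (precedence(peek().type()) > minPrec) {
                Token op = advance();
                double right = expression(precedence(op.type()));
                left = (op.type() == TokenType.PLUS) ? left + right : left * right;
            }
            return left;
        }

        // Only number literals as prefix expressions, to keep the sketch tiny.
        private double prefix() {
            Token token = advance();
            if (token.type() != TokenType.NUMBER) throw new IllegalStateException("expected a number");
            return token.value();
        }

        private Token peek() { return tokens.get(current); }
        private Token advance() { return tokens.get(current++); }

        public static void main(String[] args) {
            // 1 + 2 * 3  ==> prints 7.0, because '*' binds tighter than '+'.
            List<Token> toks = List.of(
                new Token(TokenType.NUMBER, 1), new Token(TokenType.PLUS, 0),
                new Token(TokenType.NUMBER, 2), new Token(TokenType.STAR, 0),
                new Token(TokenType.NUMBER, 3), new Token(TokenType.EOF, 0));
            System.out.println(new PrattSketch(toks).expression(0));
        }
    }

The full technique attaches prefix and infix parsing functions to each token type; the book builds that out properly with a table of parse rules in its bytecode compiler.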


Bob addresses this in the introduction^:

> We will abstain from using [parser generators] here. I want to ensure there are no dark corners where magic and confusion can hide, so we’ll write everything by hand. As you’ll see, it’s not as bad as it sounds, and it means you really will understand each line of code and how both interpreters work.

^ https://craftinginterpreters.com/introduction.html#the-code


As the author of a POSIX standard utility, I would advise you to only reach for such utilities when portability is the most important thing.

POSIX utilities are not great. Lex and Yacc included.


Is your criticism about the relationship between the lexer and the parser, or something more fundamental? Is an LR parser expressed in BNF notation unsatisfactory?

I ask only because it has been many years since I have written one, but I thought this OCaml presentation on a POSIX shell held the structure in high regard.

https://archive.fosdem.org/2018/schedule/event/code_parsing_...


The UX of the BNF notation.

I am partial to hand-coded recursive descent purely because I struggled with Lex and Yacc too much when I tried.
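What I like about hand-rolling it is how directly each grammar rule maps onto a function. As a toy illustration (hypothetical code, single-digit numbers only, nothing from the book), the rule expr -> term ( "+" term )* becomes something like:

    // Toy recursive-descent sketch: one grammar rule, one method.
    class DescentSketch {
        private final String src;
        private int pos = 0;

        DescentSketch(String src) { this.src = src; }

        // expr -> term ( "+" term )*
        int expr() {
            int value = term();
            while (peek() == '+') {
                pos++;              // consume '+'
                value += term();
            }
            return value;
        }

        // term -> a single decimal digit (kept deliberately tiny)
        private int term() {
            char c = src.charAt(pos++);
            if (!Character.isDigit(c)) throw new IllegalStateException("expected a digit at " + (pos - 1));
            return c - '0';
        }

        private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

        public static void main(String[] args) {
            System.out.println(new DescentSketch("1+2+3").expr()); // prints 6
        }
    }

No generator, no separate grammar file, and error handling is just ordinary control flow in the host language.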


I commiserate; it can be unforgiving, and the use of C is admittedly out of vogue.


Oh, I love C. [1] But yes, unforgiving.

And a lot of magic, with special variables, macros, and functions that you must know. And unclear scoping.

[1]: https://gavinhoward.com/2023/02/why-i-use-c-when-i-believe-i...


OT, but:

I remember reading that post of yours a year or so ago! It was so bizarre to me, but also really fun. I feel like you are among what is probably a very small group of people who could write something security-sensitive in C, and I would feel comfortable using it. (I am still less than thrilled that I have to rely on OpenSSL, regardless of the process improvements its developer team has made in more recent years.)

I still (unfortunately, IMO) write a lot of C. Back in 2004 I thought it was great fun. But now that I've worked in safer, batteries-included languages, I find C to be a huge drag. I'm so much more productive in Rust or even Java. I could never ever ever ever approach the level of rigor you bring to your C code; I just don't find doing all that stuff to be fun or interesting... to the contrary, it feels like a huge drag: if I were "required" to do all that, I just wouldn't write any code at all in C in the first place, knowing that having to do all the testing and fuzzing and static analysis stuff would burn me out.

Anyhow, I hope I don't sound like I'm being critical... quite the opposite. I just think we live in a magnificent world where people's brains can work so so so differently when it comes to the same topic.


No offense taken; I know my brain is different. :)

> I feel like you are among what is probably a very small group of people who could write something security-sensitive in C, and I would feel comfortable using it.

This is probably the most complimentary thing anyone has ever said to me. Thank you! I hope I can live up to that.


Are POSIX utilities even portable? They tend to have poor Windows support. They also tend to target C, which has a high ceiling for portability but also a high bar for making things portable (whereas more modern languages are often just portable by default).

My take on POSIX utilities would be to use them only on Linux platforms, where they effectively form a native part of the platform.


The NT kernel was created with a POSIX layer (along with OS/2).

https://en.m.wikipedia.org/wiki/Microsoft_POSIX_subsystem

Microsoft had previously been the largest AT&T UNIX licensee, with Xenix sales on the TRS-80 Model 2, so they were familiar with the need.


I was mostly speaking about POSIX, since that was the focus of GGP.

Yes, you should only use them on POSIX.

My utility does build natively on Windows, though.


> They tend to have poor windows support

Does Windows claim POSIX compatibility these days? I didn't think so? I remember NT's old POSIX subsystem, but is that even a thing anymore?

> My take on POSIX utilities would be only to use them on Linux platforms

I mean, only use them on POSIX-compliant systems. Linux is just one of them.

At the risk of sounding childishly derisive, when someone is talking about "portability", I think they're generally not bothering to include Windows in the mix, as Windows doesn't really share all that much of a common denominator with much else.


I disagree here. The syntax can be infuriating at first, but once you understand it, Lex/Yacc are rock solid and speed up development significantly.


Does a single production compiler use lex and yacc to generate any part of its system, beyond maybe a first pass to test out syntax before it gets rewritten into a hand-written parser/lexer? I'm not going to say none exist, since I don't know literally every production compiler ever written, but I have never heard of one that used them for the final code.


I think Ruby does? IIRC its source code includes a 10,000+ line “parse.y” file which is converted to C code using Yacc.


Excitingly, Ruby 3.3+ shipped with a new alternative parser called Prism[0]! See also: [1][2].

[0]: https://github.com/ruby/prism

[1]: https://railsatscale.com/2023-06-12-rewriting-the-ruby-parse...

[2]: https://railsatscale.com/2024-04-16-prism-in-2024/


Interesting, I'll have to look into this because I'd never heard that before and now I'm curious.

A language with as much going on as Ruby was not one I would have picked as a candidate for yacc.


Just went and looked in the GitHub repo: a 16,000+ line .y file. I can't imagine that's pleasant to maintain.


Yes. OCaml, with its very complex syntax and hundreds of features, uses the OCaml equivalents of Lex/Yacc. It is a myth that one cannot use Lex/Yacc in production.


Python notably switched from hand-written recursive descent to a PEG-based parser generator.

But indeed, last I checked, recursive descent was the most common choice overall.


The Unix Programming Environment tutorial for building hoc doesn’t even come close to what Nystrom gets into in Crafting Interpreters. Hoc is a very fun little language, but as Nystrom points out several times in the book, parsing just isn’t the interesting part of the game.


Parser generators create more problems than they solve.


Is there any reason to reach for them first?



