Let’s Build a Compiler (1995) (iecc.com)
247 points by _virtu on Oct 30, 2013 | 56 comments



If you're looking for a modern compilers class -- including the theory of why this stuff works -- I highly recommend Matt Might's [0]. All of the notes, slides, and code are online.

I audited Might's "Compilers" this spring. He live-coded a parser that parsed with derivatives, returning all possible parse trees whenever there were ambiguities in the grammar. [1] (Try getting that from yacc, or basically any other tool in existence right now.)

All of his coding was done in Racket (a Scheme dialect). At the beginning he told us we could use whatever language we wanted for the projects, but that doing certain things in C++ / Java / Python / some other imperative language was "like bringing a knife to a gun-fight."

The final project was a working Python -> C translator.

Really badass class.

[0] http://matt.might.net/teaching/compilers/spring-2013/

[1] http://matt.might.net/articles/parsing-with-derivatives/


> returning all possible parse trees whenever there were ambiguities in the grammar. [1] (Try getting that from yacc, or basically any other tool in existence right now.)

Any GLR-based tool can do that trivially, including recent versions of Bison (derived from yacc): http://www.gnu.org/software/bison/manual/html_node/GLR-Parse...

Ultimately I don't think this is the best way to develop parsers, particularly when you are designing the language itself (as opposed to writing a grammar for an existing language), because it gives you no hint that ambiguity exists until you actually encounter an ambiguous string (since the question of whether a grammar is ambiguous is undecidable).

I wrote more about this on my blog: http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why...


Again, this kind of parsing is trivial if you use parser combinators of (almost) any kind.

http://en.wikipedia.org/wiki/Parser_combinator

http://www.cs.nott.ac.uk/~gmh/monparsing.pdf

Actually, when using such an approach you have to fight the power of the resulting parser. You have to restrict it, or your parser will retain data for too long.


CMU's got a fairly nice one. It guides students as they build a series of compilers for increasingly large subsets of C, effectively from scratch. Course materials are public, I believe.

http://www.cs.cmu.edu/afs/cs/project/fox/mosaic/people/fp/co...


> a working Python -> C translator

Including generator functions? That would be impressive. Well, one could solve it by putting all locals into a heap-allocated object and using the "switch over the whole function body" trick to continue execution at the correct position.
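
For the curious, here's a minimal sketch of that trick in C (the names counter_t / counter_next and the shape of the code are my own guess at it, not anything from the course project): the generator's locals get hoisted into a heap-allocated struct, and a switch on a saved state number jumps back to just after the previous yield.

    #include <stdio.h>
    #include <stdlib.h>

    /* State for a hypothetical generator:  for i in range(n): yield i  */
    typedef struct {
        int state;   /* resume point: 0 = not started, 1 = after the yield */
        int n, i;    /* the "locals", hoisted onto the heap */
    } counter_t;

    /* Returns 0 and stores the next value in *out, or -1 when exhausted. */
    int counter_next(counter_t *g, int *out) {
        switch (g->state) {            /* jump to where we left off */
        case 0:
            for (g->i = 0; g->i < g->n; g->i++) {
                *out = g->i;
                g->state = 1;          /* remember the resume point... */
                return 0;              /* ...and "yield" */
        case 1:;                       /* execution resumes here on the next call */
            }
        }
        return -1;                     /* generator exhausted */
    }

    int main(void) {
        counter_t *g = calloc(1, sizeof *g);
        g->n = 3;
        for (int v; counter_next(g, &v) == 0; )
            printf("%d\n", v);         /* prints 0, 1, 2 */
        free(g);
        return 0;
    }

Ugly, but it's the same switch-resumption idea that Simon Tatham's "Coroutines in C" and protothreads use, and it's roughly what generated C for a generator could look like.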


Only a small portion of a full compiler involves translation, though.


I went through this on my own back in a high school programming class! Glad to see it here.

While the teacher was walking students through how to do loops, I got permission to hack away in the back of the room on this. I ended up building a BASIC-like interpreter with a decent graphics API. By the end of the class, my project was a multi-level breakout game I'd written in the interpreter I'd written. TL;DR: two years later, I talked to a girl.


You're lucky, I was doing the same things in middle school. Took me until college for that last part... :)

That same game project I started back then is still going, with multiple compilers and VMs written to do parts of the game. It's ended up an incredibly complex monstrosity with procedurally generated worlds and randomly altered scripts (some actions just won't be available on some play-throughs).


Is the game (or its code) online somewhere? :P


Not yet; there have been many incarnations, but none of them ever reached a playable state. I've been on a kick to restart it lately after starting to learn Prolog. I've been playing with some pre-made engines to find out whether any of them can accelerate the whole thing a bit. Checking out Unity right now.


Hah, well done on both counts!


How long did it take you to get through it?


Well, it was about 20 years ago, and I was a self-taught programmer in the pre-Internet days. I actually downloaded it from a BBS and had no outside references to work with.

I spent probably 30 minutes 2 or 3 times a week for a couple of months. Most of that was probably adding features to my interpreter and coding the game itself. I recall it being extremely clear to me, even without any sort of formal CS. Even things like working with the stack and recursion were clear to me at the time.


Congrats. This has been on my list to read and experiment with. I was hoping my 15-year-old could get into this as well.


No pressure though ;)


This series is one of the best introductions to compiler construction. It doesn't cover everything and it's 25 years old now, but it is the only guide I know of that will hold your hand as you build a working compiler from scratch.

If you have never built a compiler before, I cannot think of a better place to start.

Afterward, if you're curious about theory and advanced topics, I recommend heading to Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman (which covers a lot of the theory associated with front-ends), then proceeding to Modern Compiler Implementation in ML by Appel (which covers some more advanced topics and back-end stuff). Then you can continue reading about more specific/advanced topics if you like.


There is a compilers class on Coursera [1], which should be pretty good and is more up to date.

[1] https://www.coursera.org/course/compilers


Seconded. My experience:

Taking this course was a great way to learn more about compilers and fill a hole in my CS curriculum. Professor Alex Aiken is a great instructor and covers a good amount of material. I learned a lot about compiler construction despite having toyed with my own compiler before starting the course. The programming assignments were particularly tough, giving me useful experience in building compilers and a great sense of achievement.

(TL;DR from my full blog post: http://dirkjan.ochtman.nl/writing/2012/07/21/compilers-on-co...)


The class is good, but very time-consuming and spends a lot of time on theory. Expect to spend at least 10 hours a week between lectures, quizzes, and the project.


This one is more specific to C, but it can still be applied to other areas; I always thought it was a great read: A Retargetable C Compiler: Design and Implementation

https://sites.google.com/site/lccretargetablecompiler/


A port of it to Python would probably be a better place to start. Not many people are conversant in Pascal these days.


It's easy enough to follow along. The snippets are simple enough that anyone who knows a procedural language won't have trouble understanding what they do, and from there it's pretty trivial to write the equivalent code in another language. You can get pretty close by writing C with a few helper functions.
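
For example, a few helpers in the spirit of the tutorial's "cradle" routines (GetChar, Match, EmitLn and friends) might look roughly like this in C; it's only a sketch of the idea, not a faithful port, and the 68000-style output just mimics what I remember the early installments emitting:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int look;                      /* lookahead character (the tutorial's Look) */

    static void get_char(void) { look = getchar(); }

    static void expected(const char *what) {
        fprintf(stderr, "Error: %s expected\n", what);
        exit(1);
    }

    /* Consume a specific character or bail out, like Match. */
    static void match(int c) {
        if (look != c) { fprintf(stderr, "Error: '%c' expected\n", c); exit(1); }
        get_char();
    }

    /* Emit one line of assembly, like EmitLn. */
    static void emit_ln(const char *s) { printf("\t%s\n", s); }

    /* Translate a single-digit expression, as in the first installments. */
    static void expression(void) {
        if (!isdigit(look)) expected("Integer");
        char instr[32];
        snprintf(instr, sizeof instr, "MOVE #%c,D0", look);
        emit_ln(instr);
        get_char();
    }

    int main(void) {
        get_char();     /* prime the lookahead, like Init */
        expression();
        return 0;
    }

Fed a single digit like 2, it prints MOVE #2,D0, which (if memory serves) is about all the earliest version of the series' "compiler" does.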


It gives me an excuse to install Free Pascal and give it a whirl though.


That's the spirit!


I've seen Engineering a Compiler recommended as well; have you read that one?


I have that one on my shelf. It's decent, but it doesn't cover anything that the other two don't. Compared to Engineering a Compiler, the Aho, Sethi, and Ullman book goes into more depth on the theory, and the Appel book has a better breadth of topics. It's not a bad book by any means, but I'd recommend that if you only have the cash for one book, go for the Appel book.


I just looked at the chapters, and it's kind of funny that it gets these accolades despite being written as an 80-character-wide, text-only file.

Edit: I don't say that to disparage it; I actually think that's an impressive accomplishment.


He's lucky he had an 80-column card! ;-)

http://en.wikipedia.org/wiki/Apple_80-Column_Text_Card


I followed this in the early 90s and had a lovely time. It helped that Turbo Pascal was my language of choice at the time, though; it might not be quite so helpful now, although Pascal is a pretty good pseudocode.

It's not the same, but Vidar Hokstad has been writing a similar series in Ruby for several years now: http://www.hokstad.com/compiler/ ... and there are other resources aplenty: http://stackoverflow.com/a/1672/3951


Thanks for the mention... I'll be pushing out the next part in a couple of days. And I'm just wrapping up a part on register allocation.

The really great part of the Crenshaw tutorial is that it's so cohesive and concise. It's reminiscent of Wirth's compiler construction texts, but much simpler to follow.


Me too! What a blast from the past. I might look through it again.


> Pascal is a pretty good pseudocode

Pascal is pretty much directly derived from early dialects of Algol, from which most modern programming language syntaxes derive; it's the original block-structured syntax, as opposed to line-structured assembly and early FORTRAN, or the fully bracketed Lisp syntax (which is also block-structured if you indent it sanely).


And the reason Pascal is pretty much directly derived from Algol is that Wirth was on the Algol committee, and kept trying to get it simplified and tightened up. He then eventually went on to do Euler and Pascal (and of course later Modula and Oberon) to implement the ideas he'd had for Algol.

This is quite interesting: Pascal and its Successors - Niklaus Wirth http://www.swissdelphicenter.ch/en/niklauswirth.php


This is definitely the classic online text. However, I'm surprised no one mentioned An Incremental Approach to Compiler Construction (http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf). It is the basis of the Ikarus Scheme compiler.

Lisp in Small Pieces is also a useful book for those interested in Lisp/Scheme. It covers much of the same stuff as the paper I mentioned.


I was about to mention An Incremental Approach to Compiler Construction, but you've just saved me the trouble of digging up the link. It's a mere 11 pages, but it covers a lot of ground, in small, easy-to-digest pieces, and invites you to follow along and get your hands dirty. Highly recommended.


If you like Pascal, then Compiler Construction, by Niklaus Wirth is certainly also worth a look:

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

Easy to read, concise, and good for beginners.


Email me if you'd like an HTML version of this. I have converted most of it for my own personal use (through part 11, IIRC) but don't want to distribute it publicly since I'm unclear on the copyright status.


I suggest you contact the IECC. They should be able to provide guidance on what you can share and how. It's old enough that they might allow separate hosting provided you link back to their site.


Thanks, will do.


Isn't contacting the author the natural thing to do? Some googling found a user profile for him: http://www.embedded.com/user/JackCrens


He's cool with it. Thanks for digging that up.


Let's find out.


Isn't this kind of "publicly"?


Yes, let's turn this into a semantic debate. Much more fun and productive.


It's not a purely semantic question - "in what way is announcing that you're distributing better than visibly distributing?" - and I can think of a couple of answers.


Personally, this is my favorite work on compilers. While the dragon book is definitely more comprehensive, Let's Build a Compiler is far more approachable (so approachable that I worked through it when I was only 14).

I haven't gone down to the bare-metal level since (I've been using parser generators instead), but it's a great piece of work that gives you at least a rough understanding of what yacc and family do under the covers (even though they build different types of parsers). I continually recommend it as a starting point for anyone who wants to learn how to write parsers.



I remember reading this when he first published it (as well as everything he ever wrote for "Embedded Systems Programming"). Jack Crenshaw is my favorite Rocket (Computer) Scientist!


Reading these text files in vim, files that were written in the 80s and talk about Borland and other old stuff, makes me... a bit sentimental.


Speaking of sentimental, this made me giggle: "there will probably never come a time when a good assembler-language programmer can't out-program a compiler."

Reading that reminded me why I'll never make predictions on computing, especially on what can't be done.


A good assembler-language programmer these days starts with the output of a compiler. And then proceeds to beat the pants off said compiler by optimizing specific portions.

Not least because no language includes sufficient semantic information for the compiler to be able to safely optimize all the parts that the programmer can.

It's still mostly true - just not worth the effort for anything but smaller fragments.


Yeah, I was going to say something similar. You'll never get a compiler to be as intelligent as a human being; at best you'll get human intelligence rapidly applied by a computer to a problem. The computer will be able to do that "manual labor" (if you want to call it that) faster than a human can, but a human being will be able to find problem-specific optimizations that a computer will not.

I realize that many people believe that computers will some day be able to truly think on a human level. I just don't happen to be one of those people.


Would there happen to be a PDF or eBook format of this?



Thanks!


I read this in high school. At some point a light bulb went off and I finally got recursive descent parsers. Great series.
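
For anyone still waiting on that light bulb: the trick is that each grammar rule becomes a function, and the functions call each other (and themselves) exactly the way the rules reference each other. Here's a tiny expression evaluator in C, loosely in the spirit of the series (names are mine, error handling omitted); Crenshaw emits assembly instead of computing a value, but the control structure is the same:

    #include <stdio.h>

    static const char *p;     /* cursor into the input string */

    static int expr(void);    /* forward declaration: factor() recurses into expr() */

    /* factor := digit | '(' expr ')' */
    static int factor(void) {
        if (*p == '(') { p++; int v = expr(); p++; /* skip ')' */ return v; }
        return *p++ - '0';
    }

    /* term := factor { ('*' | '/') factor } */
    static int term(void) {
        int v = factor();
        while (*p == '*' || *p == '/')
            v = (*p++ == '*') ? v * factor() : v / factor();
        return v;
    }

    /* expr := term { ('+' | '-') term } */
    static int expr(void) {
        int v = term();
        while (*p == '+' || *p == '-')
            v = (*p++ == '+') ? v + term() : v - term();
        return v;
    }

    int main(void) {
        p = "2*(3+4)-5";
        printf("%d\n", expr());   /* prints 9 */
        return 0;
    }

Note how factor() calls back into expr() for parenthesized sub-expressions; that mutual recursion is the whole "recursive descent" part.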



