If you're looking for a modern compilers class -- including the theory of why this stuff works -- I highly recommend Matt Might's [0]. All of the notes, slides, and code are online.
I audited Might's "Compilers" this spring. He live-coded a parser that parsed with derivatives, returning all possible parse trees whenever there were ambiguities in the grammar. [1] (Try getting that from yacc, or basically any other tool in existence right now.)
All of his coding was done in Racket (a Scheme dialect). At the beginning he told us we could use whatever for the projects, but doing certain things in C++ / Java / Python / some imperative language was "like bringing a knife to a gun-fight."
The final project was a working Python -> C translator. Really badass class.
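For a taste of the underlying idea, here's a minimal sketch (my own illustrative code, not his) of Brzozowski derivatives for regular expressions in C; the parsing-with-derivatives paper [1] extends the same derivative operation to full context-free grammars. It leaks memory for brevity:

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { EMPTY, EPS, CHR, ALT, SEQ, STAR } Kind;

    typedef struct Re {
        Kind kind;
        char c;                       /* for CHR */
        const struct Re *a, *b;       /* children for ALT/SEQ/STAR */
    } Re;

    static const Re *mk(Kind k, char c, const Re *a, const Re *b) {
        Re *r = malloc(sizeof *r);
        r->kind = k; r->c = c; r->a = a; r->b = b;
        return r;
    }

    /* nullable(r): does r accept the empty string? */
    static int nullable(const Re *r) {
        switch (r->kind) {
        case EPS: case STAR: return 1;
        case ALT: return nullable(r->a) || nullable(r->b);
        case SEQ: return nullable(r->a) && nullable(r->b);
        default:  return 0;           /* EMPTY, CHR */
        }
    }

    /* deriv(r, c): the language of r after consuming character c */
    static const Re *deriv(const Re *r, char c) {
        switch (r->kind) {
        case CHR:
            return mk(r->c == c ? EPS : EMPTY, 0, NULL, NULL);
        case ALT:
            return mk(ALT, 0, deriv(r->a, c), deriv(r->b, c));
        case SEQ:
            if (nullable(r->a))       /* a may match empty, so c may hit b */
                return mk(ALT, 0,
                          mk(SEQ, 0, deriv(r->a, c), r->b),
                          deriv(r->b, c));
            return mk(SEQ, 0, deriv(r->a, c), r->b);
        case STAR:
            return mk(SEQ, 0, deriv(r->a, c), r);
        default:                      /* EMPTY, EPS */
            return mk(EMPTY, 0, NULL, NULL);
        }
    }

    /* match by taking one derivative per input character */
    static int match(const Re *r, const char *s) {
        for (; *s; s++) r = deriv(r, *s);
        return nullable(r);
    }

    int main(void) {
        /* (ab)* */
        const Re *ab = mk(SEQ, 0, mk(CHR, 'a', NULL, NULL),
                                  mk(CHR, 'b', NULL, NULL));
        const Re *r = mk(STAR, 0, ab, NULL);
        printf("%d %d\n", match(r, "abab"), match(r, "aba"));  /* 1 0 */
        return 0;
    }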
> returning all possible parse trees whenever there were ambiguities in the grammar. [1] (Try getting that from yacc, or basically any other tool in existence right now.)
Ultimately I don't think this is the best way to develop parsers, particularly when you are designing the language itself (as opposed to writing a grammar for an existing language), because it gives you no hint that ambiguity exists until you actually encounter an ambiguous string (since the question of whether a grammar is ambiguous is undecidable). For example, a grammar like E -> E + E | a is ambiguous, but you only find out when some input such as "a + a + a" comes back with two parse trees.
Actually, when using such an approach you have to fight the power of the resulting parser: you have to restrict it, or it will retain data for too long.
CMU's got a fairly nice one. It guides students as they build a series of compilers for increasingly large subsets of C, effectively from scratch. Course materials are public, I believe.
Including generator functions? That would be impressive. Well, one could solve it by putting all locals into a heap-allocated object and using the "switch over the whole function body" trick to continue execution at the correct position.
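A minimal sketch of what that trick can look like in C (hypothetical names; real generated code would also heap-allocate the state object and handle cleanup):

    #include <stdio.h>

    /* All of the generator's locals live in a state struct, and a
       switch over the whole function body resumes execution at the
       saved position (the same device protothreads-style coroutines
       use in C). */
    typedef struct {
        int state;   /* where to resume on the next call */
        int i;       /* the generator's local variable */
    } counter_gen;

    /* yields 0, 1, ..., n-1; returns 0 when exhausted */
    int counter_next(counter_gen *g, int n, int *out) {
        switch (g->state) {
        case 0:
            for (g->i = 0; g->i < n; g->i++) {
                *out = g->i;
                g->state = 1;
                return 1;        /* "yield g->i" */
        case 1:;                 /* jump back into the loop body */
            }
        }
        return 0;                /* generator exhausted */
    }

    int main(void) {
        counter_gen g = {0};
        int v;
        while (counter_next(&g, 5, &v))
            printf("%d\n", v);   /* prints 0..4 */
        return 0;
    }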
I went through this on my own back in a HS programming class! Glad to see it here.
While the teacher was walking students through how to do loops, I got permission to hack away in the back of the room on this. I ended up building a BASIC-like interpreter with a decent graphics API. By the end of the class, my project was a multi-level breakout game I'd written in the interpreter I'd written. TLDR; two years later, I talked to a girl.
You're lucky, I was doing the same things in middle school. Took me until college for that last part... :)
That same game project has been going on ever since, with multiple compilers and VMs written to handle parts of the game. It's ended up an incredibly complex monstrosity with procedurally generated worlds and randomly altered scripts (some actions just won't be available on some play-throughs).
Not yet; there have been many incarnations, but none of them ever reached a playable state. I've been on a kick to restart it lately after starting to learn Prolog. I've been playing with some pre-made engines to find out if any of them can accelerate the whole thing a bit. Checking out Unity right now.
Well, it was about 20 years ago, and I was a self-taught programmer in pre-Internet days. I actually downloaded it from a BBS and had no outside references to work with.
I spent probably 30 minutes, 2 or 3 times a week, for a couple of months. Most of that was probably adding features to my interpreter and coding the game itself. I recall it being extremely clear to me, even without any sort of formal CS background. Even things like working with the stack and recursion were clear to me at the time.
This series is one of the best introductions to compiler construction. It doesn't cover everything and it's 25 years old now, but it is the only guide I know of that will hold your hand as you build a working compiler from scratch.
If you have never built a compiler before, I cannot think of a better place to start.
Afterward, if you're curious about theory and advanced topics, I recommend heading to Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman (which covers a lot of the theory associated with front-ends), then proceeding to Modern Compiler Implementation in ML by Appel (which covers some more advanced topics and back-end stuff). Then you can continue reading about more specific/advanced topics if you like.
Taking this course was a great way to learn more about compilers and fill a hole in my CS curriculum. Professor Alex Aiken is a great instructor and covers a good amount of material. I learned a lot about compiler construction despite having toyed with my own compiler before starting the course. The programming assignments were particularly tough, giving me useful experience in building compilers and a great sense of achievement.
The class is good, but very time-consuming and spends a lot of time on theory. Expect to spend at least 10 hours a week between lectures, quizzes, and the project.
This is more specific to C, but can still be applied to other areas. I always thought it was a great read: A Retargetable C Compiler: Design and Implementation, by Fraser and Hanson.
It's easy enough to follow along. The snippets are simple enough that anyone who knows a procedural language won't have trouble understanding what they do, and from there it's pretty trivial to write the equivalent code in another language. You can get pretty close by writing C with a few helper functions.
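For instance, a hypothetical C port of the tutorial's Pascal "cradle" might start out like this (names follow Crenshaw's; the 68000-style output mirrors what the tutorial emits, but this is an illustrative sketch, not his code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    static char Look;                        /* lookahead character */

    static void GetChar(void) { Look = (char)getchar(); }

    static void Expected(const char *s) {
        fprintf(stderr, "Error: %s expected.\n", s);
        exit(1);
    }

    static void Match(char c) {              /* consume one expected char */
        if (Look != c) {
            char msg[4] = { '\'', c, '\'', 0 };
            Expected(msg);
        }
        GetChar();
    }

    static char GetNum(void) {               /* read a one-digit number */
        if (!isdigit((unsigned char)Look)) Expected("Integer");
        char c = Look;
        GetChar();
        return c;
    }

    static void EmitLn(const char *s) { printf("\t%s\n", s); }

    int main(void) {                         /* compile "d+d", e.g. "1+2" */
        char buf[32];
        GetChar();                           /* prime the lookahead */
        sprintf(buf, "MOVE #%c,D0", GetNum());
        EmitLn(buf);
        Match('+');
        sprintf(buf, "ADD #%c,D0", GetNum());
        EmitLn(buf);
        return 0;
    }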
I have that one on my shelf. It's decent, but it doesn't cover anything that the other two don't. Compared to Engineering a Compiler, the Aho, Sethi, and Ullman book goes into more depth on the theory, and the Appel book has a better breadth of topics. It's not a bad book by any means, but I'd recommend that if you only have the cash for one book, go for the Appel book.
I followed this in the early 90s and had a lovely time. It helped that Turbo Pascal was my language of choice, though, so it might not be quite so helpful now, although Pascal makes pretty good pseudocode.
Thanks for the mention... I'll be pushing out the next part in a couple of days. And I'm just wrapping up a part on register allocation.
The really great part of the Crenshaw tutorial is that it's so cohesive and concise. It's reminiscent of Wirth's compiler construction texts, but much simpler to follow.
Pascal is pretty much directly derived from early dialects of Algol, from which most modern programming language syntaxes derive; it's the original block-structured syntax, as opposed to line-structured assembly and early FORTRAN and the fully-bracketed Lisp syntax (which is also block-structured if you indent it sanely).
And the reason Pascal is pretty much directly derived from Algol is that Wirth was on the Algol committee, and kept trying to get it simplified and tightened up. He then eventually went on to do Euler and Pascal (and of course later Modula and Oberon) to implement the ideas he'd had for Algol.
This is definitely the classic online text. However, I'm surprised no one mentioned An Incremental Approach to Compiler Construction (http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf). It is the basis of the Ikarus Scheme compiler.
Lisp in Small Pieces is also a useful book, for those interested in Lisp/Scheme. It covers much of the same stuff as in the PDF I mentioned.
I was about to mention An Incremental Approach to Compiler Construction, but you've just saved me the trouble of digging up the link. It's a mere 11 pages, but it covers a lot of ground, in small, easy-to-digest pieces, and invites you to follow along and get your hands dirty. Highly recommended.
Email me if you'd like an HTML version of this. I have converted most of it for my own personal use (through part 11, IIRC) but don't want to distribute it publicly since I'm unclear on the copyright status.
I suggest you contact the IECC. They should be able to provide guidance on what you can share and how. It's old enough that they might allow separate hosting provided you link back to their site.
It's not a purely semantic question - "in what way is announcing that you're distributing better than visibly distributing?" - I can think of a couple of ways.
Personally this is my favorite work on compiler theory. While the Dragon Book is definitely more comprehensive, Let's Build a Compiler is far more approachable (so approachable that I worked through it when I was only 14).
I haven't gone down to the bare-metal level since (I've been using parser generators instead), but it's a great piece of work that gives you at least a slight understanding of what yacc and family do under the covers (even though they are different types of parsers). I continually recommend it as a starting point for anyone who wants to learn how to write parsers.
I remember reading this when he first published it (as well as everything he ever wrote for "Embedded Systems Programming"). Jack Crenshaw is my favorite Rocket (Computer) Scientist!
Talking about sentimental, this made me giggle: "there will probably never come a time when a good assembler-language programmer can't out-program a compiler."
Reading that reminded me why I'll never make predictions on computing, especially on what can't be done.
A good assembler-language programmer these days starts with the output of a compiler. And then proceeds to beat the pants off said compiler by optimizing specific portions.
Not least because no language includes sufficient semantic information for the compiler to be able to safely optimize all the parts that the programmer can.
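A small C illustration of that point (function names made up): without aliasing information the compiler must allow for the possibility that the buffers overlap, whereas a human who knows they never do can optimize freely -- and C's restrict qualifier is one way to hand that knowledge over:

    #include <stdio.h>

    /* dst and src may alias, so the compiler has to be conservative
       (e.g., emit runtime overlap checks or plain scalar code). */
    void scale(float *dst, const float *src, float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

    /* restrict promises the buffers never overlap, so the compiler
       is free to vectorize aggressively. */
    void scale_restrict(float *restrict dst, const float *restrict src,
                        float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4];
        scale_restrict(b, a, 2.0f, 4);
        printf("%g\n", b[3]);   /* 8 */
        return 0;
    }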
It's still mostly true - just not worth the effort for anything but smaller fragments.
Yeah, I was going to say something similar. You'll never get a compiler to be as intelligent as a human being. At best you'll get human intelligence rapidly applied by a computer to a problem. The computer will be able to do this "manual labor" (if you want to call it that) faster than a human can. However, a human being will be able to find ways to optimize for the problem that a computer will not be able to.
I realize that many people believe that computers will some day be able to truly think on a human level. I just don't happen to be one of those people.
[0] http://matt.might.net/teaching/compilers/spring-2013/
[1] http://matt.might.net/articles/parsing-with-derivatives/