Dated, but simple, and more importantly, concrete, using almost no abstractions (like lex and yacc), so there's no magic. Since the compiler described is for a subset of Pascal, and the compiler is itself written in Pascal, even the magic of the compiler itself is explained away.
This series is a large part of how I got started in the compiler business; and today I help maintain probably the most used Pascal compiler, in Delphi.
If anyone is interested in actually following along, TP 5.5 is freely available, though you'll need DosBox or similar if you're not running a 32-bit install of Windows:
If anyone is interested in actually following along, TP 5.5 is freely available, though you'll need DosBox or similar if you're not running a 32-bit install of Windows
Or there's Free Pascal, which is cross platform and backwards compatible with Turbo Pascal: http://freepascal.org/
Another alternative to following along might be to try and reimplement the code as you go along in another language you may know, the code is easy to follow and with very good explanations about what you're intending to do it shouldn't be much of a problem.
"A word about style and efficiency. As you will see, I tend to write programs in _VERY_ small, easily understood pieces. None of the procedures we'll be working with will be more than about 15-20 lines long. I'm a fervent devotee of the KISS (Keep It Simple, Sidney) school of software development. I try to never do something tricky or complex, when something simple will do. Inefficient? Perhaps, but you'll like the results. As Brian Kernighan has said, FIRST make it run, THEN make it run fast. If, later on, you want to go back and tighten up the code in one of our products, you'll be able to do so, since the code will be quite understandable. If you do so, however, I urge you to wait until the program is doing everything you want it to.
I also have a tendency to delay building a module until I discover that I need it. Trying to anticipate every possible future contingency can drive you crazy, and you'll generally guess wrong anyway. In this modern day of screen editors and fast compilers, I don't hesitate to change a module when I feel I need a more powerful one. Until then, I'll write only what I need."
A problem is that the closest language, in terms of abstraction stack, would probably be C, and people who know C wouldn't benefit as much from it, because C users these days tend to be older and more experienced, and because far fewer beginners start out with C. Pascal has always been more beginner friendly. Ironically, the most maligned part of Pascal historically - its type safety - has long since won the day, albeit with polymorphism that Pascal lacked.
In theory, it might be nice to do something along the lines of Ruby, Python or Javascript, but those languages are, I think, too far removed from the CPU opcodes to get the same effect. Meta-circular interpreters, like in Lisp etc., are cute and are good for explaining the fundamental semantics of the language, but there's still an awful lot of magic baked in, in the same way that a C compiler written in C can embed the meaning of '\n' as a '\n' literal constant in the source itself. Of course, that magic may be subverted, most famously in the trusting trust essay:
I wrote a metacircular compiler-compiler that generates parsers in PostScript: http://github.com/kragen/peg-bootstrap and a bootstrapping compiler for a very simple Forth-like language: http://github.com/kragen/stoneknifeforth (although I need to fix it so it will run on current kernels; apparently I cut some corner in the ELF spec...)
I don't know if it's that important as Pascal is pretty straight forward. I was able to follow along[1] and translate Pascal to Ruby on the fly even though I don't really know any Pascal. I certainly understand the resulting Pascal-ish language.
I have a feeling that most people with the will to read a tutorial on compilers, and actually make it through the first chapter, are smart enough and work hard enough to figure it out.
[1] In case anyone's interested: http://github.com/samsonjs/compiler instead of emitting assembly instructions it emits x86 machine code directly.
Is anyone else having trouble viewing that haskell link (ie 8 & firefox 3.6.10 on Windows 7)? I'd really like to have a look at it but it keeps flickering and then going to a blank page. When I hit the back button I see it for a second but then it disappears.
Excellent set of articles for someone trying to figure out bare basics of compilers without taking it in college.
For a long time (before going to college) I was fascinated with the idea of how languages are implemented. I found this tutorial and followed along. About 3 articles in, the light bulb went on over my head and I finally 'got' recursive descent parsing and how simple and elegant it was. I didn't learn much else from the articles but it was worth it for that.
I had made several attempts at learning how recursive descent parser works, but with little success. Finally I stumbled on this amazing text...
All other resources try to scare you by telling how scientifically mind-blowing subject the compiler construction is... this text simply shows how easy it can be.
This series is a large part of how I got started in the compiler business; and today I help maintain probably the most used Pascal compiler, in Delphi.
If anyone is interested in actually following along, TP 5.5 is freely available, though you'll need DosBox or similar if you're not running a 32-bit install of Windows:
http://edn.embarcadero.com/article/20803