There are plenty of compiler writing tutorials for conservative, imperative programming languages with a straightforward static type system (like C).
I did study a little bit of compilers for functional languages from Simon Peyton Jones's old book "The Implementation of Functional Programming Languages" [0]. It predates the Haskell programming language and uses Miranda, a contemporaneous research language, as both the target and the "host" language.
... which brings me to another topic: monads. The SPJ book has a few chapters on implementing a Hindley-Milner-style type inference algorithm. It's written in Miranda without any monads and uses a clever trick for coming up with unique temporary type names: there's an infinite list of integers [0..], which is "split" when traversing the syntax tree so that the left branch of the recursion tree gets the even numbers and the right branch gets the odd numbers.
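Roughly, the trick looks like this (a from-memory sketch of mine, not the book's actual Miranda code):

    -- A toy syntax tree, just enough to show the supply threading.
    data Expr = Var String | App Expr Expr

    -- Split a supply into two disjoint sub-supplies by interleaving.
    split :: [a] -> ([a], [a])
    split (x : y : rest) = let (xs, ys) = split rest
                           in (x : xs, y : ys)
    split xs = (xs, [])

    -- Name every App node, handing each child its own sub-supply.
    names :: [Int] -> Expr -> [String]
    names _        (Var _)   = []
    names (n : ns) (App f x) =
      let (ls, rs) = split ns
      in ("t" ++ show n) : names ls f ++ names rs x
    names []       _         = []

    -- names [0..] e never repeats a name, because the two halves
    -- of any supply share no elements.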
The method works fine and is indeed very clever, but try comparing that code to the same algorithm re-implemented (by me) with monads (State and Error monads plus a transformer) [1]. The original algorithm is rather long and hard to follow, because lists of unique integers have to be passed to all functions.
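The heart of the monadic version is nothing more than a counter in a State monad. Here's a boiled-down sketch in the spirit of [1] (not the actual code from the repo):

    import Control.Monad.State

    -- The inference monad: for this sketch, just a fresh-name counter.
    type TI a = State Int a

    fresh :: TI String
    fresh = do
      n <- get
      put (n + 1)
      pure ("t" ++ show n)

    -- Call sites just write `v <- fresh`; no supply is threaded
    -- through the recursion by hand.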
I had a blast writing this algorithm, and I wanted to share it. It's been a few years since I last worked with it. It was my first non-trivial use of monads and I really like how it turned out in the end.
Note that SPJ's book (1987) predates Wadler's popularization of monads (1992) by a good 5 years, and Moggi's idea to use monads to describe different notions of computation by 4 years.
Don't forget that Simon Peyton Jones's second book on compiling functional languages, written with David Lester, is also out of print but available directly from him on the web, and it includes the source code from the book. Way fun, if slightly dated, stuff.
Thanks for sharing, and for anyone else reading, you might also like the following book: "Modern Compiler Implementation in ML" by Appel. (http://www.cs.princeton.edu/~appel/modern/ml/)
That seems like an easy mistake to make, from what I remember of reading about the origins of Haskell. They had issues with the licensing and cost of Miranda and struck out to build upon it. Here's the original article, "A History of Haskell: Being Lazy With Class" [0]
If this guy isn't getting headhunted, I have no idea what headhunting is. It's pretty rare for people to be able to write correct and easy-to-read guides to such low-level concepts.
Very true. I'm running into that right now: humdrum legacy CRUD apps, where the only challenge lies in being *allowed* to refactor the code into decent shape.
I recommend Lisp in Small Pieces (LiSP) by Christian Queinnec and Compiling with Continuations by Andrew Appel if you're interested in writing a functional compiler.
Also:
As a sidenote, Stephen is also the author/maintainer of "What I Wish I Knew When I Was Learning Haskell"[0] which is a great resource both for those who are learning and for slightly advanced programmers.
No, it doesn't imply that, at all. My least favorite thing on the Internet nowadays is shallow fault-finding comments like yours from people who just want to interject themselves into discussions where they have nothing substantive to say. I used to be that guy and still have to actively resist the temptation. Don't be that guy.
Oh, man — not being that guy, at least in the fault-finding sense; just apparently failing to make an obvious joke. Very strictly, grammatically speaking, it does say that, and it was a humorous implication to me, since I have known so many programmers who apparently did stop learning once they were very slightly advanced. But I'm completely aware that the implication was accidental.
So there is a basic introduction to generating LLVM in Haskell there, but what's so great about it is that writing a lexer/parser/(interpreter|compiler) is almost magic in Haskell. The total code of that project is remarkably small. For that reason, it's a pretty standard Haskell workflow to do everything in Haskell and then have it emit LLVM/CUDA/whatever to do the heavy lifting: you get to reason in Haskell and your back-end takes care of the rest. As much as that sounds complicated, it really, really isn't.
Think of it as a template for this new essay. Of all the hard optimization work occurring, most of it is embedded in LLVM. But the process of going from textual syntax to an intermediate language to back-end compilation is an important journey, and doing it in Haskell as Stephen does is an elegant way to scope out those steps.
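For a concrete (toy) taste of that emit-LLVM workflow, here's a sketch of my own, not code from the tutorial, that compiles a tiny expression type straight to textual LLVM IR:

    -- A tiny expression language and a direct-to-text IR compiler.
    data Expr = Lit Int | Add Expr Expr

    -- Returns the next free register number, the operand holding the
    -- result, and the instructions emitted along the way.
    emit :: Int -> Expr -> (Int, String, [String])
    emit n (Lit k)   = (n, show k, [])
    emit n (Add a b) =
      let (n1, ra, ia) = emit n  a
          (n2, rb, ib) = emit n1 b
          r            = "%t" ++ show n2
      in (n2 + 1, r, ia ++ ib ++ [r ++ " = add i32 " ++ ra ++ ", " ++ rb])

    -- emit 0 (Add (Lit 1) (Add (Lit 2) (Lit 3))) produces
    --   %t0 = add i32 2, 3
    --   %t1 = add i32 1, %t0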
I'd like to recommend The Programming Language Zoo by Andrej Bauer. It is a collection of programming language implementations, small and clear, written in OCaml.
Oh, wow. I've been learning OCaml lately, specifically to dive into Hack, Facebook's new statically-typed PHP derivative. This will really help with that, thanks so much for the link!
This looks really interesting! But I can't see any way to subscribe to book updates. A mailing list (my preferred means of subscribing to this kind of stuff these days) or an RSS feed to get notified when new chapters are ready would be really handy.
Very interested in seeing how he'll deal with lexing Haskell. It's a major pain: tools like Parsec are very good out of the box with whitespace-insensitive grammars, but not with whitespace-sensitive ones.
It's not that hard to implement some form of whitespace-sensitive parsing: you need to maintain explicit state, essentially an indentation stack and a number of combinators. See, for example, the sketch below.
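Something along these lines (the names are mine, not from any particular library): keep an indentation stack in the Parsec user state and build a few combinators on top of it.

    import Text.Parsec

    -- User state is a stack of indentation columns.
    type IParser a = Parsec String [Int] a

    -- Column of the token about to be consumed.
    curColumn :: IParser Int
    curColumn = sourceColumn <$> getPosition

    -- Open a layout block at the current column.
    openBlock :: IParser ()
    openBlock = curColumn >>= \c -> modifyState (c :)

    -- Fail unless we are still indented within the innermost block.
    inBlock :: IParser ()
    inBlock = do
      stack <- getState
      col   <- curColumn
      case stack of
        (top : _) | col >= top -> pure ()
        _                      -> parserFail "dedent: left the block"

    -- Close the innermost layout block.
    closeBlock :: IParser ()
    closeBlock = modifyState (drop 1)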
I tried to change the whitespace rules in the PureScript parser to match Haskell recently (to support explicit braces), and it proved to be very difficult. The "L" function which is defined in the Haskell 98 Report depends heavily on an interaction between the lexer and parser, in which a failed parse causes the indenter to behave differently in the lexer. For this reason, I stayed with the simpler rules that we have now, and I plan to support explicit braces in the parser instead.
IIRC, Haskell has a well-defined mapping of whitespace to explicit braces and semicolons -- I suppose one could perform that transform first to desugar, then parse the resulting token stream normally...
If you think of well-defined as combining the most subtle aspects of JavaScript semicolons and Python braces, I agree. As hobby projects, I have coded a significant fraction of a Haskell parser and the majority of a Python parser. It was good that I did the Python one first. The Haskell rules include a clause which says: if you don't yet have a syntax error from your tokens, and you would get a syntax error by adding the next token, and you would not get a syntax error if you first added a closing brace and then the next token, then insert the corrective closing brace. So you can't just tokenize, infer what's missing, then parse the tokens.
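To make that concrete, here's what layout does on two tiny standard-Haskell examples, including the parse-error clause described above:

    -- Layout inserts the braces and semicolons for you:
    f x = case x of
            0 -> "zero"
            _ -> "other"
    -- ... desugars to:
    f' x = case x of { 0 -> "zero" ; _ -> "other" }

    -- The parse-error clause: the '}' closing the let block is
    -- inserted only because the parse would otherwise fail at 'in'.
    y = let a = 1 in a
    -- ... desugars to:  y = let { a = 1 } in a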
"I will build haskell by taking the source code and running it"
I jest, but this is an issue I've thought about idly for a while. Maybe it's possible to write a two-step parser: one pass that turns chars into tokens (cheating a little by stripping spaces in some contexts to make it context-free), and then you have a nice CFG that you can parse easily.
I once took a compiler design course where the instructor said we could use whatever language we wanted (it was a tiny class - I think there were 8 of us?).
I asked whether I could use bash, and consider gcc a feature of the language.
[0] http://research.microsoft.com/en-us/um/people/simonpj/papers...
[1] https://github.com/rikusalminen/funfun/blob/master/FunFun/Ty...