After grinding through a bunch of compiler books I have to mention "Modern Compiler Implementation in C" by Andrew A. Appel (http://www.amazon.co.uk/dp/0521607655) as one that stands out.
Another space you can look for interesting things (not necessarily optimising compilation) is the compilation of functional programming languages. They usually take a slightly different approach compared to imperative languages, though there's clearly a lot of overlap. I really enjoy reading an old book from 1987 called "The Implementation of Functional Programming Languages", which is available online: http://research.microsoft.com/en-us/um/people/simonpj/papers...
It's a small (few thousand lines), rather neat, weakly optimising compiler for a subset of OCaml, written in OCaml. It has more than one backend, a few optimisations, a register allocator. No SSA form or 'advanced' stuff though.
Someone I know is writing a tiny llvm clone in C. It already can do integer operations faster than gcc -O1.
It is currently closed source but it shouldn't be hard for my compiler to target his IR in the future. His backend is currently around 6k lines of code so should be relatively easy to understand, though understanding some of the concepts is harder overall.
You might want to try and have a look at sbcl (steel bank common lisp) along with its documentation and various articles leading up to/documenting its current design. It's a real, and therefore complex, system - but I recall reading some rather approachable articles on it.
I don't really know about that many other real, yet simple(ish) systems. Maybe Pharo Smalltalk, luajit, pypy, gnu guile (latest version)?
There seem to be a lot of very good tutorials on writing a simple compiler, like http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf and http://compilers.iecc.com/crenshaw/
There are also readable code bases like tinyc or this one.
However, these always use a simple template-based approach to code generation and stop there.
Of course there are real-world compilers (LLVM, GCC, ...). However, because of their size and complexity, it's hard to learn from them.
What I am interested in would be going from: "I have an AST and for this node type I output this assembly." to "I translate the AST to an IR, transform the IR in multiple passes, and finally output decent asm."
That is, something that talks about CFG analysis, register scheduling or maybe SSA form? I have read papers on SSA and Sea-of-Nodes IRs, but these of course always assume that you already know how and why to use them. A more approachable text or GitHub repo that shows how to take the step from template-based code gen to more advanced techniques would be great.
(I hope it is ok to ask this here. Did not want to hijack the thread.)
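To make the "AST to IR, then passes, then asm" step a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny AST (`Num`, `Var`, `BinOp`), the three-address `Instr` format, the two passes, and the pseudo-assembly with unlimited virtual registers are not taken from any of the projects or books mentioned in this thread. It has no control flow, so there is no CFG, SSA, or register allocation here; it only shows the shape of a pipeline where each pass takes IR and returns IR.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST: integer constants, variables, and binary + / *.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str              # "+" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

# Hypothetical IR: three-address code. Operands are ints (constants)
# or strs (variable or temporary names). Assumes no source variable
# is named like a temporary ("t1", "t2", ...).
@dataclass
class Instr:
    dst: str
    op: str
    a: Union[int, str]
    b: Union[int, str]

def lower(expr, code, counter):
    """Flatten the AST into `code`; return the operand holding the result."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return expr.name
    a = lower(expr.left, code, counter)
    b = lower(expr.right, code, counter)
    counter[0] += 1
    dst = f"t{counter[0]}"
    code.append(Instr(dst, expr.op, a, b))
    return dst

def fold_constants(code, result):
    """Pass: evaluate instructions whose operands are both constants."""
    subst, out = {}, []
    for ins in code:
        a = subst.get(ins.a, ins.a)
        b = subst.get(ins.b, ins.b)
        if isinstance(a, int) and isinstance(b, int):
            subst[ins.dst] = a + b if ins.op == "+" else a * b
        else:
            out.append(Instr(ins.dst, ins.op, a, b))
    return out, subst.get(result, result)

def simplify_identities(code, result):
    """Pass: rewrite x * 1 and x + 0 to x, forwarding x to later uses."""
    subst, out = {}, []
    for ins in code:
        a = subst.get(ins.a, ins.a)
        b = subst.get(ins.b, ins.b)
        if ins.op == "*" and 1 in (a, b):
            subst[ins.dst] = b if a == 1 else a
        elif ins.op == "+" and 0 in (a, b):
            subst[ins.dst] = b if a == 0 else a
        else:
            out.append(Instr(ins.dst, ins.op, a, b))
    return out, subst.get(result, result)

def emit(code, result):
    """Print the IR as pseudo-assembly with unlimited virtual registers."""
    names = {"+": "add", "*": "mul"}
    lines = [f"{names[i.op]} {i.dst}, {i.a}, {i.b}" for i in code]
    lines.append(f"ret {result}")
    return lines

def compile_expr(expr):
    code = []
    result = lower(expr, code, [0])
    for ir_pass in (fold_constants, simplify_identities):
        code, result = ir_pass(code, result)
    return emit(code, result)

# (x * (2 + 3)) + (y * 1)
example = BinOp("+",
                BinOp("*", Var("x"), BinOp("+", Num(2), Num(3))),
                BinOp("*", Var("y"), Num(1)))
print("\n".join(compile_expr(example)))
# mul t2, x, 5
# add t4, t2, y
# ret t4
```

The point of the design is that `lower` knows nothing about optimisation and the passes know nothing about the AST, so adding another pass only means appending to the tuple in `compile_expr`. A real backend would replace the virtual registers `t2`, `t4` with machine registers in a later register allocation step, which this sketch leaves out.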