Resources for Amateur Compiler Writers (c9x.me)
273 points by rspivak on July 25, 2016 | 72 comments



IMHO, compiler construction as an advanced exercise for amateurs is a topic that has been beaten to death (as OP suggests, there are tons of available materials and projects, ranging from high-quality not-so-amateur work to quick or fun hacks - I'm guilty of one myself).

On the other hand, I would love to see "HTML5 and CSS parsing and rendering for amateurs". Given the state of modern HTML5 and CSS standards, and ignoring compatibility and real-world usage (just like for toy compilers), Let's Build A Browser Engine sounds more tempting than Let's Build a Compiler.

(To preempt "contribute to existing actual real-world engine" suggestions -- while that's worthwhile, it's like saying "contribute to LLVM" to someone looking to write a toy compiler, i.e. it completely misses the point).


There's this great series by mbrubeck which sounds like it might be right up your alley: https://limpet.net/mbrubeck/2014/08/08/toy-layout-engine-1.h...


This looks like a great resource, thanks!



> Given the state of modern HTML5 and CSS standards, and ignoring compatibility and real-world usage (just like for toy compilers), Let's Build A Browser Engine sounds more tempting than Let's Build a Compiler.

FWIW, current HTML and CSS standards should give you compatibility with real-world usage if you implement them accurately. Of course, for a toy, you may well want to simplify things from what the standard does.


I agree. It's a topic that should die, outside of maybe posting that one link that aggregates many useful resources on it. You probably know the one.

Re HTML5/CSS

Excellent idea! These always need more work in some way, plus they have widespread impact. Amateurs stumbling on improved algorithms might similarly see their work have great impact.


Why should this topic die exactly? I don't understand the hostility. Do you want to keep a fun topic secret?


Not necessarily. It just comes up a whole lot despite mostly not teaching people what they need to know to improve compilers, as another commenter pointed out. So, redundant and a bit self-defeating. Other, related topics or an improved version of this one could lead to more real-world results, whether hobbyists or professionals pick them up.


I'd love to do some project like this to give me a better understanding of webpage performance.

The high-level abstractions of HTML/CSS are really powerful, but it would be cool to have an understanding of the implications of such designs when writing web applications.


Frankly, I don't see anything interesting in that list, especially for amateurs.

As an amateur compiler writer you would probably want to make something useful in a few weeks, not waste a year playing around. And that's a very different story. It's essentially about making a meta DSL that compiles into another language and plays well with existing libraries, tooling, the whole ecosystem, but also does something special and useful for you. So, you should learn parsing, possibly recursive descent for the code and something else for expressions, a bit about working with ASTs, and that's pretty much it.
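
To make the parsing part concrete, here's a minimal sketch of a recursive-descent parser for arithmetic expressions (Python, made-up toy grammar, purely illustrative, not from any particular DSL):

  # grammar: expr   := term (('+'|'-') term)*
  #          term   := factor (('*'|'/') factor)*
  #          factor := NUMBER | '(' expr ')'
  import re

  def tokenize(src):
      return re.findall(r"\d+|[()+\-*/]", src)

  def parse_expr(toks, i=0):
      node, i = parse_term(toks, i)
      while i < len(toks) and toks[i] in "+-":
          op = toks[i]
          rhs, i = parse_term(toks, i + 1)
          node = (op, node, rhs)
      return node, i

  def parse_term(toks, i):
      node, i = parse_factor(toks, i)
      while i < len(toks) and toks[i] in "*/":
          op = toks[i]
          rhs, i = parse_factor(toks, i + 1)
          node = (op, node, rhs)
      return node, i

  def parse_factor(toks, i):
      if toks[i] == "(":
          node, i = parse_expr(toks, i + 1)
          return node, i + 1            # skip ')'
      return int(toks[i]), i + 1        # a number literal

  # builds the AST ('+', 1, ('*', 2, ('-', 3, 4)))
  print(parse_expr(tokenize("1 + 2 * (3 - 4)"))[0])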


Is amateur the right word?

I am in it for the money, which I guess makes me a pro, but I don't have a computer science background, and frankly in 2016 I am afraid the average undergrad compiler course is part of the problem as much as the solution.

Another big issue is nontraditional compilers of many kinds, such as js accelerators and things that compile to JavaScript, domain-specific languages, data flow systems, etc. Frankly, I want to generate Java source or JVM bytecode and couldn't care less about real machine code.


"frankly in 2016 I am afraid the average undergrad compiler course is part of the problem as much as the solution."

What do you mean by that?


I'm not the OP, but I sympathize. The specific details covered in a "classical" compilers course are heavyweight and not super-relevant right now. These days you don't have to understand LR parsing or touch a parser-generator, you don't have to worry about register coloring... etc. Courses still use the Dragon Book, which is older than I am and covers a bunch of stuff only relevant to writing compilers for C on resource-constrained systems.

Instead, I figure a course should cover the basics of DSL design, types and type inference, working with ASTs, some static analysis, and a few other things. That has some overlap with a traditional compilers course, but a pretty different focus.


In 2006 a second edition of the dragon book was released: https://www.amazon.com/Compilers-Principles-Techniques-Tools...



Not really. TAPL is a very useful book, but it won't teach you how to write a compiler, unless the only part of a compiler you actually care about is the type checker. The interpreters it describes (in the chapters titled “An ML implementation of <whatever>”) are ridiculously inefficient.


A good number of new toy-ish languages compile to another language (typically JavaScript) and introduce new semantics and new type rules. As the parent said, small DSLs.

You don't really need more than a typechecker and AST transformations for that.


Do you have a link to a good guide for beginners on designing and efficiently implementing type checkers?


This is a nice introductory tutorial on how to implement Hindley-Milner type inference: https://github.com/jozefg/hm

This is a more advanced tutorial that illustrates a nice but tricky optimization that OCaml's type checker internally uses: http://okmij.org/ftp/ML/generalization.html

Finally, TAPL's type checkers are pretty good. They aren't designed for efficiency, though. They're designed to closely follow the book's contents: http://www.cis.upenn.edu/~bcpierce/tapl/checkers/
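
For a feel of how small such a checker can be, here's a minimal sketch in the spirit of (but not copied from) the TAPL checkers: a simply typed lambda calculus with integers, using plain Python tuples for terms and types:

  # Terms: ("int", n), ("var", x), ("lam", x, arg_type, body), ("app", f, a)
  # Types: "Int" or ("->", t_arg, t_res)

  def typeof(term, env):
      tag = term[0]
      if tag == "int":
          return "Int"
      if tag == "var":
          return env[term[1]]                     # unbound names raise KeyError
      if tag == "lam":
          _, x, t_arg, body = term
          t_body = typeof(body, {**env, x: t_arg})
          return ("->", t_arg, t_body)
      if tag == "app":
          t_fun = typeof(term[1], env)
          t_arg = typeof(term[2], env)
          if t_fun[0] != "->" or t_fun[1] != t_arg:
              raise TypeError("cannot apply %r to %r" % (t_fun, t_arg))
          return t_fun[2]
      raise ValueError("unknown term %r" % (term,))

  # (\x:Int. x) 42 : Int
  print(typeof(("app", ("lam", "x", "Int", ("var", "x")), ("int", 42)), {}))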


Thanks for the links!


That's better than average if you are getting that as an undergrad!


I mostly agree with you; however, SSA seems like overkill up until the point where your code becomes a tangled cyclomatic mess because of the lack of it (example[1]). I'd definitely include SSA in a modern course on compilers.

[1]: https://github.com/dotnet/roslyn/tree/master/src/Compilers/C...
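
For readers who haven't met SSA: the point is that every assignment introduces a fresh name, so each value has exactly one definition and def-use chains fall out for free. A minimal sketch of the renaming for straight-line code (no phi nodes, made-up representation, purely illustrative; variables defined outside the snippet get version 0):

  def to_ssa(stmts):
      """stmts: list of (target, operands) pairs, operands being variable names."""
      version = {}                  # current SSA version of each source variable
      out = []
      for target, operands in stmts:
          ops = ["%s_%d" % (v, version.get(v, 0)) for v in operands]
          version[target] = version.get(target, 0) + 1
          out.append(("%s_%d" % (target, version[target]), ops))
      return out

  # x = a + b; x = x + c; y = x
  # becomes x_1 = a_0 + b_0; x_2 = x_1 + c_0; y_1 = x_2
  print(to_ssa([("x", ["a", "b"]), ("x", ["x", "c"]), ("y", ["x"])]))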


Hi, this seems like a good place for my newbie compiler question.

I'm trying to write a compiler for LISP - I wrote a simple interpreter already (mostly just to learn how compilers work). Definitely did not need anything complicated for tokenizing/parsing, thanks to https://github.com/lihaoyi/fastparse.

What resources are useful now for learning about implementing a type system and the optimizer? You said register coloring etc. aren't important - is this because we can target LLVM?


> These days you don't have to understand LR parsing or touch a parser-generator

What would a modern parser use instead?

Or is it just that for most compiler work the parser has already been written?


An LL parsing library, or better yet, a library for PEG grammars. You know, to add another dependency to an already bloated program and not to care about having O(n) parsing time/memory.


What's replaced register coloring?


... letting someone else do it?


Language Implementation Patterns by Terence Parr might be right up your alley. Implementing Programming Languages by Aarne Ranta is likewise a refreshing take on language implementation instruction. Bonus points for both books being rather concise and affordable.

I've also really enjoyed Essentials of Programming Languages by Friedman & Wand (older editions cover different topics than the current 3rd edition, and it may be worth reading both the 1st and 3rd editions). It focuses primarily on interpreters, but most of the topics covered have applications for compilers as well.

Going a slightly more traditional route, I've found Andrew Appel's compiler books, Modern Compiler Implementation in ML and Compiling with Continuations, to present pretty wide coverage of techniques at all phases of a compiler. The former is the one I normally recommend for people looking for their first (and hopefully only) compiler book (if they're actually looking for a compiler book rather than a DSL book).

Of course, techniques for processing the Lisp family are well covered in Christian Queinnec's classic Lisp in Small Pieces (he's written some other books which are quite interesting for Lisp historians as well). Implementors of Lisps will also probably want to look into The Art of the Metaobject Protocol by Kiczales et al. SICP's coverage in chapters 4 and 5 is also pretty good as an introduction, even if it only barely scratches the surface.

Getting into types, you're pretty much limited to Pierce's Types and Programming Languages, but Bob Harper et al. have also written Homotopy Type Theory, which is freely available on the Web. I haven't finished HoTT yet, so I can't comment on how accessible it is or how broad/deep its coverage is.

Those interested in rewriting systems should start with Term Rewriting and All That by Baader and Nipkow, which I've found to be extremely accessible and fairly comprehensive (covering abstract rewriting systems, string rewriting, graph rewriting, etc.). Your choices for books on rewriting written in English are pretty small (I've literally found only three, one of which costs $300 USD(!) -- where does Cambridge University Press get off??), but fortunately the Baader and Nipkow book is so good that you probably won't need another book and can proceed straight to the literature.

---

There are tons of books on compilation, but I honestly think they're mostly a waste of time. Each book, for the most part, only reiterates the basics in another way, so if you didn't quite grasp it from the first book you tried, maybe look at another one; but once you understand the material, don't waste any more time or money on compiler books. After you've grokked the basics (i.e., worked through one or two introductory books), you'll get far more value from reading books on other related concepts (e.g., type theory, term rewriting, metaobject protocols, etc.), reading research papers, journal articles, dissertations, etc., and, of course, reading code.

I say all this as someone who made the mistake of spending lots of time and money on lots of different compiler books, stupidly hoping to learn something new with each one. The silver lining is that this puts me in a pretty good position to recommend particular compiler books :)

That the Dragon Book is still considered the standard introduction is baffling; even the most recent edition is horrendously outdated and dwells far too much on parsing techniques (from the perspective of a compiler writer, parsing is effectively a solved problem -- not to mention that there are quite a few parsing techniques the Dragon Book doesn't cover, despite the amount of text devoted to parsing). And, like you said, the techniques it covers are really only immediately applicable to imperative procedural languages without sophisticated type systems, such as C or Pascal.


They are still teaching undergrads how to make compilers that give incomprehensible error messages, etc. Compiler technology has the possibility of making a dent in the essential difficulty of getting computers to solve problems, and the compiler class is not oriented towards those needs.

There's nothing exciting about coding up a toy version of "Pascal" but it would be exciting if you could add the "unless" construction from Perl to Java in <1000 lines of code.

We are just now coming out of a dark age in compilers that was brought on by gcc.

Before gcc you could make a living writing compilers (e.g. Turbo C, Turbo Pascal, Turbo Prolog, ...) at a moderately sized company. Today you have gcc and you have compilers from the likes of Intel and Microsoft.

LLVM breathed some life into gcc since it is now modularized so that you can do interesting things on top of gcc.


The target language shouldn't make much of a difference, should it?


There are 3 languages you have to understand to write a compiler:

* The language you are using to write the compiler. You have to know it well enough to write a complex application.

* The language you are compiling. You have to completely understand it.

* The language you are compiling to: x86, JVM, etc. You have to understand it well enough to write any complex application.


I agree, except for the last point.

Unless you are writing an optimizing compiler, you just have to understand the target language well enough to write snippets that resemble the operations of your source or intermediate language, which can be surprisingly little. See:

http://t3x.org/subc/cg386.c.html

Anecdotally: I have once written an AXP21164 back-end without any prior knowledge, only based on the reference manual. It was not pretty or fast, but it worked.
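
To illustrate the snippet-per-operation approach mentioned above, here's a minimal sketch (a made-up stack-machine IR and hand-written x86 templates, nothing taken from cg386.c) where each IR op just expands to a fixed chunk of assembly:

  TEMPLATES = {
      "push_const": "    push {0}",
      "add":        "    pop eax\n    add [esp], eax",
      "mul":        "    pop eax\n    imul eax, [esp]\n    mov [esp], eax",
  }

  def gen(ir):
      """ir: list of (op, *args) tuples for a tiny stack machine."""
      return "\n".join(TEMPLATES[op].format(*args) for op, *args in ir)

  # (2 + 3) * 4
  print(gen([("push_const", 2), ("push_const", 3), ("add",),
             ("push_const", 4), ("mul",)]))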


Nils has an extraordinary number of compiler-related projects and books: http://www.t3x.org/


It makes a huge difference.

If you are compiling for x86 you have to think about register allocation.

If you are compiling for the JVM or an HLL, you don't. A good chunk of what is discussed in that article just isn't relevant anymore.


Isolated anecdote, but I work on a JVM-based language and my colleague is having to write a register allocator because our input language is written in SSA form and so has a huge number of locals in methods. We need to map these onto a smaller number of JVM locals (like registers), otherwise the JVM frames are huge.

Also, if you aren't writing a register allocator because you're using the JVM or LLVM, then someone else needs to on your behalf. We can't forget these skills.
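
A minimal sketch of that kind of allocator (linear-scan style, made-up liveness intervals, and ignoring wide values, spilling and exception ranges): reuse a JVM local slot once its previous occupant's live range has ended.

  def assign_slots(intervals):
      """intervals: dict of name -> (first_use, last_use), in bytecode offsets."""
      slots, free, active, next_slot = {}, [], [], 0
      for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
          for last, other in list(active):          # retire dead occupants
              if last < start:
                  active.remove((last, other))
                  free.append(slots[other])
          if free:
              slots[name] = free.pop()
          else:
              slots[name] = next_slot
              next_slot += 1
          active.append((end, name))
      return slots

  # a, c and then d can share one slot; two slots suffice instead of four
  print(assign_slots({"a": (0, 4), "b": (2, 9), "c": (5, 7), "d": (8, 12)}))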


Doesn't a modern JIT collapse JVM locals into a smaller actual stack frame already? Compilers have been doing this for local variables with non-overlapping liveness for decades.


The JVM can't do this because of debugging. If you attach a JVM debugger and examine a local variable whose storage has been reused, what would you see? Junk from some other local variable. It's one cost of always-on debugging.

Normally the cost isn't too bad and of course they're spilled to the stack not kept in registers, but if the language you are implementing has thousands of locals you may even reach the limit of the frame size. We've seen this in practice in more than one language and application.

Also, only some frames are JIT compiled - what about the rest?


In the JVM, the start_pc/length of each LocalVariableTable entry lets you have more than one name for a given index in the stack frame, exactly so as to be able to reuse indexes for variables that don't overlap in liveness. So the debugging and non-JIT objections don't pertain.

But if you're really running into the 64k limit on max_locals even with index reuse, then you're out of luck.


But that's my point - the person emitting the bytecode needs to be the one doing the work to re-use local variable indices. And the algorithm for doing that is register allocation. So even if you target the JVM you may still need to know how to do register allocation, and it isn't some historical piece of trivia.

For every local variable index, the JVM needs to keep the value live for the duration of the method because that's what the debugger will show.

And yes we have seen the 64k limit blown by real programs in the wild that aren't designed to be awkward (I'm in the VM research group at Oracle).


Furthermore, the semantics of a language should be tailored for its target. Some execution patterns work well on the JVM, some don't. Same for JavaScript and other targets.

Inventing whatever semantics you have in mind without considering how it's going to be compiled is a recipe for slow languages.


Compiler construction is a big field, so it's easy to get lost in the details.

If you are mostly interested in principles rather than the most recent tooling, there's a course by Wirth that makes it tractable.

More here: http://short-sharp.blogspot.ca/2014/08/building-compiler-bri...


If you're interested in getting started with interpreters, which are easier, you might want to look into Daniel Holden's excellent Build Your Own Lisp (And Learn C). Although it has been criticized for many reasons, it's a great book, and if you find interpreters and compilers totally magic, it's a good place to start.

Also, after reading What every compiler writer should know about programmers, I finally understand why people hate C. Because this just shows definitively that C compiler writers have been in their own little world for the past few decades.

Man, now I want a C compiler that wasn't written by a bunch of mindless jerks that will be the first up against the wall when the revolution comes...


I wrote a VM, and I still can't get recursion to work. It's hard.


I never understood why recursion causes anyone any problems, because recursion is the absence of a special case limitation.

If I tell you that a function may call any function, then you already know everything you need to know for recursion. If we didn't have recursion, only then would I need to qualify what I just told you with the restriction that a function can only be active once.

When I show students recursion I can't understand their confusion. I think to myself 'but I already showed you functions can call any other function, why do you see this case differently?'

(Obviously I try to be more patient, understanding and anticipatory in person.)


I'll share why I struggled with recursion (as best as I remember):

Most programming I learned was imperative. As I wrote the code, I imagined the execution in my head. This led to a problem: when I was halfway through writing a function and it referred to itself, my brain would segfault. How could something I was not yet done writing refer to itself? I hadn't finished yet, so my brain could not comprehend what such a reference would mean.

It was also difficult because I wanted to think of functions as nice neat pieces of code that would take some data, do their thing, and return a value. I could mentally inline a function call without much effort.

But when recursion is introduced, the floor drops out of my mental inlining. Suddenly my mental effort for such things becomes huge. For me anyway. My brain doesn't like to float around in abstraction land for long, it needs to periodically be anchored in the concrete. Otherwise I quickly lose my sense of direction and orientation... I lose my context.

Declarative languages actually make these easier, for me, because I'm not mentally executing a recipe as I write. I am giving a piece-wise description of what something is. So there is no mental tracing.

I expect the a-ha moment for recursion is different for everyone. But just showing recursion in many different forms would probably help. For example, show fib sequence generation where instead of the function calling itself, each function calls a uniquely named function... such that you concretely demonstrate building the first 5 or so numbers in the sequence...

Then show the similarity of the functions, show what the computer has to keep track of with the nested function calls, and step by step work your way to straight recursion.

Show it in BASIC with GOTO statements.

Finding as many different ways as possible to concretely demonstrate an abstract concept will help reach more people.
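
A minimal sketch of the "uniquely named functions" demonstration described above, before collapsing them into one self-calling definition (Python, purely illustrative):

  def fib4(): return fib3() + fib2()
  def fib3(): return fib2() + fib1()
  def fib2(): return fib1() + fib0()
  def fib1(): return 1
  def fib0(): return 0

  print(fib4())          # 3

  # once students see that fib4, fib3, ... are the same function with a
  # different argument, the step to a single self-calling definition is smaller:
  def fib(n): return n if n < 2 else fib(n - 1) + fib(n - 2)
  print(fib(4))          # 3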


As someone who enjoys teaching programming[1], your comment was my favorite of the day[2]. My learning experience was similar to yours, starting with imperative languages, so you got me to think about how I think about recursion today, given our shared baggage:

1. I do still inline recursive functions, but I've learned to selectively inline just the base case when I first implement recursion. Paradoxically, experience with lisp helped me with recursion in C, particularly Common Lisp's trace facility which taught me to visualize stacks of multiple function calls (whether recursive or not) rather than a single one at a time.

2. I've learned to think declaratively even when I program in C. When writing a C function I might start out with a crisp definition in my head ("this function saves the reverse of the list seen so far in its second argument") so that I can rely on that definition even when the implementation isn't yet complete.

[1] http://akkartik.name/post/mu

[2] https://news.ycombinator.com/favorites?id=akkartik&comments=...


> Show it in BASIC with GOTO statements.

That should happen naturally, if you implement a compiler to native code. A "call" instruction is just a "goto/branch/jump" with some extra stack fiddling.

Also, I don't think pure functions are so easy mentally either. Try to understand "call with current continuation". ;)
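
To make "goto plus stack fiddling" concrete, here's a minimal sketch of factorial written without native recursion: an explicit stack plus a dispatch variable standing in for GOTO targets (Python, purely illustrative):

  def fact(n):
      stack, acc, label = [], 1, "call"
      while True:
          if label == "call":          # like "call": save state, jump back to the entry
              if n <= 1:
                  label = "return"
              else:
                  stack.append(n)      # push what the "callee" must not clobber
                  n -= 1               # set up the new argument and loop (goto entry)
          else:                        # like "ret": pop saved state, resume the "caller"
              if not stack:
                  return acc
              acc *= stack.pop()

  print(fact(5))                       # 120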


This is not about understanding recursion, it is about implementing recursion correctly.

Recursion is likely to expose bugs in the calling convention if you're writing your first compiler.


Thank you. I wasn't sure if he understood that I was implementing it.


I did, and I think my comment still makes sense in that context.

If I give you a list of requirements for a programming language that I want you to implement, and it includes 'a function must be able to call any function', then you have recursion. You don't need an additional separate requirement for 'and that function may be itself', as long as you've fully implemented the first requirement. In fact, you'd need an extra requirement to prevent recursion, not to allow it.

That's what I mean by you shouldn't need to consider recursion as a special case.


The fact is, it's easier to implement a language without recursion. See early programming languages that didn't have it. In such a case, you can simply treat the local variables as if they were global variables in the implementation. You've got to do something more complicated if you are going to handle local variables in the recursive case.

The fact that allowing recursion is not an additional requirement is irrelevant; what matters is that it's extra complexity to handle that case correctly.


That's a great point. I think there is a fundamental flaw in my virtual machine, because recursion does not work, but calling other functions from within a function does work.


It may interest you to know that some older programming languages didn't originally support recursion, although they did support function calls (early FORTRAN being one example). The return address for a function call was typically stored in a fixed location associated with the target. So if you called that function once, you wouldn't be able to call it a second time until the first call had returned, otherwise you'd end up overwriting the first return address.

Not that I think that is the OP's problem - just a bit of interesting history. Support for recursion may seem obvious in hindsight, but there was a time where that wasn't the case.


Yes, but I generate the arguments when converting the AST into the internal bytecode, so arguments are just a LOAD_NAME, etc. There seems to be a problem when the arguments are coming from the currently executing function's stack (instead of the AST): I restore the context from the currently executing function's execution data to the previous execution data, pop the stack once, and push that value onto the old context's stack as the return value.

Also, just to be clear, I understand how recursion works as a user of a programming language, though implementing it is different than using it.


A recursive call should be exactly the same as an ordinary function call. The following is an example of asm code gen for a function definition and a recursive function call. Code gen to VM code should be similar. Hope it helps.

Assume the following AST.

  FN_DEF: { NAME: foo, PARAMS: [int4 param1, int4 param2] }
    LOCAL_VARS: [int4 var1, int4 var2, int4 var3]
    ...
    ASSIGN: { var1, CONST: 0x1 }
    ASSIGN: { var2, ADD:{ 0x2, param1 } }
    ...
    var3 = FN_CALL: { foo, ARGS: [var1, var2] }
    ...
    RETURN: { var3 }
The generated code would be (all numbers are decimal instead of hex for simplicity):

  fn_foo:
    push ebp              ; save the caller's old frame pointer
    mov ebp, esp          ; the new frame pointer is the current stack ptr
    sub esp, 12           ; make new space in the stack for the locals
                          ; EBP points to the base of the current frame
                          ; var1 at [ebp-4]
                          ; var2 at [ebp-8]
                          ; var3 at [ebp-12]
                          ; the caller pushed the arguments (left to right
                          ; in this toy convention), so
                          ; param1 at [ebp+12]
                          ; param2 at [ebp+8]
    ...
    mov dword [ebp-4], 1  ; assign 0x1 to var1
    mov dword [ebp-8], 2  ; assign 0x2 to var2
    mov eax, [ebp+12]     ; load param1 (x86 has no memory-to-memory add)
    add [ebp-8], eax      ; add param1 to var2
    ...
    push dword [ebp-4]    ; push var1 (param1) for the function call
    push dword [ebp-8]    ; push var2 (param2) for the function call
    call fn_foo           ; call function foo at address fn_foo
                          ; the current EIP is saved in stack
                          ; the return value will be in EAX
    add esp, 8            ; remove the two arguments we pushed
    mov [ebp-12], eax     ; save function return value to var3
    ...

    mov eax, [ebp-12]     ; set function return value from var3
    mov esp, ebp          ; pop frame
    pop ebp               ; restore old EBP to previous frame
    ret                   ; return to the caller by popping the
                          ; caller's address from stack into EIP.
                          ; Execution will continue at the restored EIP address.


I definitely struggled to find a satisfying approach to recursion in my own current language project. What I'm doing right now is passing the closure value of the callee as a hidden first argument to every call. Nonrecursive functions just ignore that argument but recursive functions can always call themselves via that hidden argument.

It doesn't allow you to support syntactic mutual recursion, but it's easy to do and can be implemented without data modification at any level.
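
A minimal sketch of the hidden-self-argument trick (a toy Python illustration, not the actual implementation): every closure gets itself as an extra first argument, so it can recurse without its name being bound anywhere.

  def make_closure(code):
      # hide the extra self argument from ordinary callers
      return lambda x: code(code, x)

  fact = make_closure(lambda self, n: 1 if n <= 1 else n * self(self, n - 1))
  print(fact(5))    # 120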


Do you not have any kind of registry of functions in your language where you can look up a function from a name and then call it? Then you don't need to pass in the current function to itself as it can look itself up in the registry, like any other function could (again, no special cases needed).

There shouldn't be any need for recursive calls to be a special kind of call. The caller shouldn't need to know that it is calling itself (again, lack of special casing).

This is how recursion works in languages like Java, Python, Ruby, C.


That's an excellent question! :)

In my language, there is no special global scope for variables; every program is basically one giant (extended and sugared) lambda calculus expression.

I do use lambda-lifting, so in the C code I generate there is a C function in the global C namespace that is called to execute an object-language function; but object-language functions also have closure environments, so any kind of self-reference needs to include both the global C function and the closure environment for that particular closure.

Note that my language does support modularity (breaking programs into multiple files, basically). However, the mechanism for referring to "packages" (stuff in other files) uses a separate name system.

I agree that languages that bind all functions in a global scope can easily use that global scope to resolve recursive references. It's also easier when your language supports variable assignment and destructive updates of data structures. But my language doesn't support those things either. :)

Addendum: By the way, you mentioned Java, Python, and Ruby, which are all object-oriented. Of course, the recursion among methods in those languages arises in a way that's very similar to passing a callee as a hidden argument. In OO languages, the hidden argument is "self" or "this"!


What about using scope environments and variable binding? For lambda calculus, variable binding (of the single variable x) in an environment (the current application of lambda) is a fundamental part of it anyway.

A scope environment gives you a place to name "things". "Things" can be functions. It's just a name-value map. You create a local scope environment for every function invocation. You can bind things (value or functions) to names in the local environment. Then you can refer to "things" by their names, like calling a function using its name.

Parent scope environments can be nested in the local scope when the function is called. A search on a name not found on the current scope can be delegated to the parent scope, so that functions defined and named in outer scope can be referenced in inner scope.

A recursive call is not a problem for a named function, since you can look the function up by name. It becomes a problem for an anonymous function (lambda), since it can't be looked up by name. For that you can introduce a special name like "_lamb" for it. Upon entry of a function, you bind the current function to that name in the current environment. A reference to it would call the current function again, e.g. _lamb(). You can even have a special name "_parent" bound to the parent environment. In that case you can call the anonymous parent function, e.g. _parent._lamb()

Function calling and recursive functions have nothing to do with OO or Java/Python/Ruby. They have everything to do with names, binding, scope, and environments.
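
A minimal sketch of the environment/binding idea described above (Python, illustrative only): a chain of name->value maps, where a named function recurses simply by looking its own name up in the environment it was defined in.

  class Env(dict):
      def __init__(self, parent=None, **bindings):
          super().__init__(bindings)
          self.parent = parent

      def lookup(self, name):
          if name in self:
              return self[name]
          if self.parent is not None:
              return self.parent.lookup(name)
          raise NameError(name)

  top = Env()

  def fact(env, n):
      f = env.lookup("fact")           # found by name, not by a special self reference
      return 1 if n <= 1 else n * f(env, n - 1)

  top["fact"] = fact
  print(top.lookup("fact")(top, 5))    # 120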


> Of course, the recursion among methods in those languages arises in a way that's very similar to passing a callee as a hidden argument. In OO languages, the hidden argument is "self" or "this"!

Self or this allows you to refer to the same object. You don't need to refer to the same object to wind up calling the same method. You could have two different objects that just have the same method. Maybe they share a class, or a mixin, or whatever. So I'm not sure self is relevant here.


Yes, I think I see your point. If I may put it in my own words, I think you are talking about "dynamic recursive activation" throughout this thread.

I don't mean to disagree with the point you are making! It's a valid point.

However, I would like to emphasize that some languages have syntactic properties that complicate things a bit and I feel that a comment about how things "should" be really ought to be qualified by something like, "in a mainstream Java-like language".

Consider the simply-typed lambda calculus, or Martin-Löf type theory. These languages don't allow general recursion but it's not because they are always poorly implemented!


My language has first-class functions, as names are tied to values, and a value can be a function. With the opcode CALL_FUNCTION (in my VM), it searches the hash table for the value.


For a program without recursive calls the students can get away with believing that local variables are statically allocated, but for a program with recursive calls the students must understand that local variables are allocated on the stack; what appears (lexically) to be one variable in the source may be many variables at runtime!
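
A tiny illustration of that last point (Python): the single lexical variable n exists once per activation, so several distinct n's are alive at the same time.

  def countdown(n):
      print("entering, n =", n)
      if n > 0:
          countdown(n - 1)
      print("leaving, n =", n)    # each activation still has its own n

  countdown(3)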


I think the issue is it forces people to think about preconditions, postconditions and invariants in a very different way if they've come from imperative programming.


If you don't evaluate the function's body when it's declared, and your language is lexical, then your function should always be evaluated in a context where its own name is bound. If your language has dynamic scope, then the above still basically works until somebody defines a local var with the same name as your function.

The only situation where recursion wouldn't work is where all of your vars are static.

As for compiling lexical scoping, I can't help you there, although I've heard it's not too hard if you don't care about optimization. Dynamic scoping is dead simple to compile, though.

As for tail call optimization, if your target is an assembler with a goto instruction that in any way resembles a real instruction set, the only slightly tricky bit is figuring out what function is in the tail position. Once you've done that, it shouldn't be too hard. Lambda: The Ultimate Declarative has details, IIRC.
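
A minimal sketch of the "figure out what's in tail position" step (Python, made-up AST node shapes, purely illustrative): a call is a tail call if it's the last thing the enclosing function evaluates.

  def tail_calls(node, in_tail):
      kind = node[0]
      if kind == "call":
          return [node] if in_tail else []
      if kind == "if":                 # both branches inherit the tail position
          _, cond, then, els = node
          return (tail_calls(cond, False) +
                  tail_calls(then, in_tail) +
                  tail_calls(els, in_tail))
      if kind == "seq":                # only the last expression is in tail position
          *init, last = node[1]
          found = []
          for e in init:
              found += tail_calls(e, False)
          return found + tail_calls(last, in_tail)
      return []                        # constants, variables, ...

  body = ("if", ("call", "empty?", "xs"),
                ("const", 0),
                ("seq", [("call", "print", "xs"), ("call", "loop", "rest")]))
  print(tail_calls(body, True))        # only ("call", "loop", "rest") is a tail call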


Dybvig's dissertation is great. [1] People might disagree that it's a compiler, since it targets a fairly high-level VM rather than a native machine. But it's got everything you need. Really, you can fire up DrRacket, type it in and have a great framework in an afternoon.

Anyway, it's very readable.

[1] http://agl.cs.unm.edu/~williams/cs491/three-imp.pdf


If you're more interested in the front end than the back end, then Crenshaw's Let's Build a Compiler is still worthwhile.


I recommend http://createyourproglang.com/ too if you want something very simple and you don't know where to start.


Unfortunately that page seems to make a lot of big claims (and unnecessarily insult a lot of established work in the field) but seems to include literally no useful information about the book at all: no table of contents, no indication of who the target audience is or what prior experience is assumed, not even a summary of the topics it covers.


Nice collection - I keep some notes myself here, and was able to generate my own parser with Jison: https://github.com/mulderp/mulderp.github.com/issues/13

Once the parser returns the AST, it gets more complicated: how to decorate an AST, add actions, etc. Still looking to learn more about compiler backends.


Is there any similar kind of collection for static analysis?




