
Are there any courses on creating a new language, interpreter, and compiler from scratch? Like not using LLVM or similar to generate intermediate representation but to literally pick an ISA and, well, compile for it.



Various books do this. Essentials of Compilation is a recent one: https://github.com/IUCompilerCourse/Essentials-of-Compilatio...

You can find a PDF and the associated course with a very minimal amount of following links.

If you are thinking of making your own language, it's also good to learn something about programming language theory, if you don't already know it. Many languages repeat mistakes that were solved 25+ years ago. PLAI is good for that: https://www.plai.org/


I compiled a basic C program and then read the output of gcc -S to see the assembly.

Using what I learned from the GCC output, I then wrote a basic amd64 expression compiler with simple live ranges, linear-scan register allocation, and ANF (A-normal form), using postorder AST traversal for code generation. It can only evaluate nested add and mul expressions.

https://github.com/samsquire/compiler or on replit: https://replit.com/@Chronological/Compiler3#main.py
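For a flavor of what that looks like, here's a minimal sketch in Python (not the code from the repo above): it does the same postorder traversal over nested add/mul expressions, but skips live ranges and register allocation entirely and eagerly spills every intermediate result to the hardware stack.

    def emit(node, out):
        # Postorder: compile both children first, then combine their results.
        if isinstance(node, int):
            out.append(f"    push ${node}")
            return
        op, left, right = node
        emit(left, out)
        emit(right, out)
        out.append("    pop %rbx")    # right operand
        out.append("    pop %rax")    # left operand
        out.append("    add %rbx, %rax" if op == "+" else "    imul %rbx, %rax")
        out.append("    push %rax")   # result goes back on the stack

    lines = []
    emit(("*", ("+", 1, 2), 3), lines)   # (1 + 2) * 3
    print("\n".join(lines))

Eager spilling is slow but it means you never run out of registers, which is why it's a common first step before real allocation.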

Finding opcodes is a struggle, but these references help: https://www.felixcloutier.com/x86/ and https://www.cs.uaf.edu/2016/fall/cs301/lecture/09_28_machine... (the table at the bottom is really useful)
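As a sanity check against those tables, you can hand-assemble a single instruction. For example, "add rax, rbx" in x86-64 is a REX.W prefix, the 0x01 opcode (ADD r/m64, r64), and a ModRM byte:

    REX_W  = 0x48                        # REX prefix with W=1: 64-bit operands
    OPCODE = 0x01                        # ADD r/m64, r64
    modrm  = (0b11 << 6) | (3 << 3) | 0  # mod=11 (register direct), reg=rbx (3), rm=rax (0)
    print(bytes([REX_W, OPCODE, modrm]).hex())  # prints "4801d8"

Feeding 48 01 d8 to a disassembler gets you back "add rax, rbx", which is a nice way to confirm you've read the encoding tables correctly.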

To learn code generation without targeting amd64, I designed an imaginary assembly language, then wrote a switch-based virtual machine and a compiler for it with a frontend that looks similar to JavaScript.

https://github.com/samsquire/multiversion-concurrency-contro... — see ProgramParser.java, LanguageInterpreter.java, and LanguageInterpreterRunner.java (which has main()).
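The switch-on-opcode pattern itself fits in a few lines. Here's a minimal sketch in Python (which has no switch, so an if/elif chain stands in), using a made-up three-opcode ISA rather than the one from the repo above:

    PUSH, ADD, PRINT = range(3)

    def run(program):
        stack, pc = [], 0
        while pc < len(program):
            op = program[pc]
            if op == PUSH:             # operand follows the opcode
                pc += 1
                stack.append(program[pc])
            elif op == ADD:            # pop two values, push their sum
                stack.append(stack.pop() + stack.pop())
            elif op == PRINT:          # pop and print the top of the stack
                print(stack.pop())
            pc += 1

    run([PUSH, 2, PUSH, 3, ADD, PRINT])   # prints 5

A compiler for a VM like this just has to flatten the AST into that opcode list, which is why it's a gentler target than a real ISA.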


Intel's documentation covers encoding. I remember it being relatively heavy going to decipher, though it's been a few years since I've worked on x64.


Check out https://mitpress.mit.edu/9780262047760/essentials-of-compila...

The author has made complete draft PDFs for both the Racket and Python versions available here: https://wphomes.soic.indiana.edu/jsiek/


Thanks, highly appreciated.

When it comes to CS book authors, I tend to be biased towards what Jeremy Siek himself markets as "Proven in the classroom". Many book authors lack this experience and simply write for themselves, which is okay, but it can result in bad didactics. Good teachers and authors from academia are invaluable.


That's a great point about experience teaching the topic. (I've written a few articles where I certainly wished for that experience.)

I just wish textbooks didn't have their own bad tendencies: they have pablum as an attractor, because most students just want to get through the class without doing much worse than their peers. Even without this problem, there's a more basic one: as with enterprise software, the decision to buy the book is typically not up to the user. "Will people actually want to read this on their own time?" is a strong driver of quality, even though it has pitfalls too.


Precisely made for "creating a new language, interpreter, and compiler from scratch": http://craftinginterpreters.com


This does not cover compilers. I know because I specifically asked munificent (the author) about this before and he mentioned some other books to look at.


He's being modest :) Sure, the book doesn't cover many traditional compiler topics such as register allocation and code optimization. However, it does cover parsing and translating to intermediate representation, which is 90% of what you need to get your own programming language off the ground. And to be honest, many university compiler courses won't have time to go much farther than that anyway.


What do you mean exactly by "compilers"? Because the book features a chapter on how to compile source code to virtual machine byte code.


Most of the time when people talk about compiled languages/compilers, they mean languages that compile to native machine code, not languages that compile to a bytecode/run on a virtual machine.


Compiling to machine code rather than VM bytecode is a difference in degree rather than kind, and doing it well rather than naively is a deep topic.

Simple machine code generation with basic register allocation and evaluation which eagerly spills to the stack is not super hard. Producing fast code instead is a bottomless well.


Is that true? I think of C# and Java as compiled languages and they run on the CLR/JVM which are both virtual machines.


It's just the nomenclature I'm used to: compiled languages are so called because there is no VM, just machine code that runs directly, while C# and Java still require the CLR/JVM to do JIT compilation at runtime. But it's mostly splitting hairs; there is no agreed-upon definition of what makes a language compiled or interpreted.


Both have JITs. So someone wrote a compiler from CIL/JVM bytecode into native code.


There's literally no difference in the theory or technique of compiling for a virtual machine or a real one. Real machines are just messier and have a bunch of edge cases and weirdness that require you to read those multi-kilopage manuals to figure out. Having one or more intermediate steps is useful for portability and optimization though, so it's probably a good idea to target a virtual machine unless you only intend to ever target one specific architecture that you know inside and out.


Plot twist - the developer then runs the code in qemu for rapid iteration and ease of debugging... ;)


Yes, but it's still the most beginner-friendly book on implementing programming languages that I've ever seen. It covers VMs in the second part, and going from that to asm is not a huge leap.


Compiling to machine code is not that hard once you get to the IR or bytecode phase, check out this toy compiler:

https://github.com/byo-books/pretty_laughable_lang/


Is that really true? I would expect that going from IR to native code would be the hardest part, because you have to worry about architecture and supporting different ISAs. IR can be simple because it is abstract and hides lots of the dirty details of the hardware.


Generating code is not all that hard; generating fast code is a challenge. Encoding instructions on some architectures is hard. If you have a nice, general IR, naively generating native code can be just a table lookup with a register tracking/allocation routine.
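A minimal sketch of that table-lookup approach in Python, with an invented three-instruction IR and a naive allocator that hands out registers in order and never spills:

    TEMPLATES = {                       # IR opcode -> assembly template
        "load": "    mov ${imm}, {dst}",
        "add":  "    add {src}, {dst}",
        "mul":  "    imul {src}, {dst}",
    }
    REGS = ["%rax", "%rbx", "%rcx", "%rdx"]

    def lower(ir):
        alloc, out = {}, []             # virtual register -> physical register
        def reg(v):
            if v not in alloc:
                alloc[v] = REGS[len(alloc)]   # naive: no spilling, <= 4 live values
            return alloc[v]
        for op, src, dst in ir:
            if op == "load":
                out.append(TEMPLATES[op].format(imm=src, dst=reg(dst)))
            else:
                out.append(TEMPLATES[op].format(src=reg(src), dst=reg(dst)))
        return out

    # t1 = 2; t2 = 3; t2 = t1 * t2
    print("\n".join(lower([("load", 2, "t1"), ("load", 3, "t2"), ("mul", "t1", "t2")])))

Everything hard about real backends (spilling, calling conventions, instruction selection) is hidden in the "naive" caveats here, which is the point: the naive version is genuinely small.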


UPenn's CIS341[1] is pretty nice. It uses a bottom-up approach: assignments start with a 64-bit integer-only x64 simulator, then move up to compiling LLVM IR to x64, then compiling the lecturer's language ('Oat') to LLVM IR, and finally optimisations like dead code elimination, register allocation, variable substitution, etc.

[1]: https://www.seas.upenn.edu/~cis3410/current/


https://www.nand2tetris.org/ is pretty neat. The compiler in part 2 isn’t very advanced (it doesn’t cover much in the space of optimization or the state of the art), but taken as a whole you end up designing the chip, ISA, VM, language, and compiler, so it’s pretty good for that aspect.


When I did my undergraduate degree, you could opt to do that for one extra point. Compiling for an ISA is not that much more difficult than compiling for a bytecode IR like that of Java or .NET, which I imagine is the typical target of these courses.




Check out Project Oberon by Niklaus Wirth. It involves building a compiler from scratch for an ISA designed from scratch, and building an OS for it. You can even put the processor onto an FPGA and run the code you compiled on physical hardware you designed.


The prior course involves implementing a code-generating compiler, with optimisation passes etc., for a language.



