Show HN: How to write a tiny compiler (klipse.tech)
224 points by viebel on Feb 9, 2017 | 63 comments



Semi-tangential rant: I went to grad school for compilers, and yet I've been struggling for several months to get started building something real-world. I can build interpreters and parsers in my sleep at this point, but the crucial next step nobody talks about is gaining a deep understanding of the target language your compiler will be emitting. For a conventional native ahead-of-time compiler, that means knowledge of x86, the ELF format, OS loading, and more. I suppose this is why toolkits like LLVM are so popular :/


I'd suggest making your first big goal to emit assembly, and using an existing toolchain to assemble and link. It lets you run your code much sooner.
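For example, a hypothetical driver in Python that leans on gcc, which will both assemble and link a .s file (the file names here are made up):

  import subprocess

  # Sketch: write the compiler's assembly output to a file, then let the
  # existing toolchain assemble and link it (gcc drives both steps).
  def build(asm_text, exe="a.out"):
      with open("out.s", "w") as f:
          f.write(asm_text)
      subprocess.run(["gcc", "out.s", "-o", exe], check=True)
      return exe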

If you can handle compilers, assemblers are a piece of cake, so you can add a flag to emit a .o instead of a .s when you get down to that level.

I've never actually written a linker, but when you're down at that level it's fairly obvious what needs to happen. There's some sort of binary prologue that indicates this is an executable. Then you have a table mapping symbolic names to offsets, and you replace the symbolic references with actual numbers.

If you have something like a.o, b.o, and c.o, and a.o needs to call foo in c.o, it's kind of just sizeof(a.o) + sizeof(b.o) + offset(foo, c.o)
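In toy form (my sketch, ignoring relocations, sections, and alignment):

  # Toy symbol resolution: lay the objects end to end, and a symbol's final
  # address is (base of its object) + (its offset within that object).
  def resolve(objects):        # objects: list of (name, size, {symbol: offset})
      base, symtab = 0, {}
      for name, size, symbols in objects:
          for sym, off in symbols.items():
              symtab[sym] = base + off
          base += size
      return symtab

  # foo lives in c.o at offset 0x10, so it ends up at
  # sizeof(a.o) + sizeof(b.o) + 0x10:
  print(resolve([("a.o", 0x40, {}), ("b.o", 0x30, {}), ("c.o", 0x20, {"foo": 0x10})]))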

Assemblers and linkers are kinda tough because they need to be perfect, but you don't have the hard algorithmic problems you have in compilers, like register allocation.

Also, if you limit yourself to a small subset of x86, you can pretty quickly verify correctness using existing tools, then make toy versions of your assembler and linker. Adding more instructions only means touching the assembler.

You can do it!


+1. Direct compilation is simpler and faster than layering everything. Abdulaziz Ghuloum wrote a great tutorial to "show that building a compiler can be as easy as building an interpreter": http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

A great compiler based on this principle is TCC: http://bellard.org/tcc/ The code generators are basically built around calling parser.next_token(). It is simple (highly recommend reading the source) and very fast. There is also a linker included.


> building a compiler can be as easy as building an interpreter

I prefer this demonstration to the one in that paper:

Interpreter: https://news.ycombinator.com/item?id=8558822

JIT compiler: https://news.ycombinator.com/item?id=8746054

To make things more interesting, it uses "real" syntax, i.e. C-like, rather than the "almost no parsing required" syntax of a Lisp; it's even smaller than TCC, and yet it can still compile itself.

C4 and the few variants it's spawned have probably become my favourite family of tiny, understandable, but complete compilers. The sheer elegance and simplicity --- one file of less than 1K lines --- makes studying them highly recommended. Almost all the other tiny/teaching/toy compilers I've seen are larger, less capable, or both. I don't know if anyone has made a variant of C4 that generates an AST, but doing that would probably not be too difficult either, and then you'd have a tiny compiler you could experiment on with optimisation and such.


Have you looked at the Go toolchain at all? I don't know much about it, but their back ends are implemented in a way that is much smaller than LLVM or GCC. And it sounds like the architecture of the compiler is significantly different.

I watched this talk and it was pretty interesting:

https://www.youtube.com/watch?v=KINIAgRpkDA

https://talks.golang.org/2016/asm.slide#1

If I am understanding right, the claim is that they have a single assembly language for ALL Go architectures, based on something Ken Thompson wrote for a National Semiconductor chip. He actually says that all assembly language looks the same now.

I was sort of surprised by that claim. I would like to hear a critique of this from other compiler writers. Does the fact that Go uses a single assembly language make the code slower? I imagine taking advantage of target-specific knowledge is useful, but I don't know how much.

LLVM has this pretty elaborate TableGen system to express target-specific knowledge.

He also claims that they will be able to auto-generate a new back end from the PDF description of an architecture. And he says somewhere that they reduced a lot of the work to simple "text processing" (symbol manipulation) and didn't have to open any processor manuals.

I can see how arithmetic and bitshifting is the same among all architectures. But I would think that even loads and stores have differences, at least if you want to use the processor efficiently. But he says everything kinda looks the same.

Also, another thought is that compiling to WebAssembly might be simpler than native executables?


> I can see how arithmetic and bitshifting is the same among all architectures. But I would think that even loads and stores have differences, at least if you want to use the processor efficiently. But he says everything kinda looks the same.

The basics are the same (and indeed that is why many RISCs resemble MIPS), but you're correct that more advanced (or perhaps I could say, Intel-ligent ;-) architectures have many more interesting instructions beyond the common subset, which can definitely save on code speed, size, or both when used effectively, but which are not easy to match, nor to describe the semantics of. It's easy to match X+Y to an ADD, or even X = 4 * Y + C to an LEA, but at the other extreme, how would you match a whole AES encryption round and replace it with a single AESENC instruction? Currently, no compiler I know of can do that, so you must use intrinsics or pure asm.


Yeah that's what I would have thought, but I would like to see it quantified. Does all the target-specific info give you 10% or 100% improvement? Maybe they're onto something? The simplification does sound drastic.

He does actually mention the AES instructions in the talk. I think they just have some one-offs to deal with those specific cases. I guess Go is different from C in that it comes with a big standard library. They can just make sure that their standard library crypto uses the AES-NI instructions, whereas C compilers have no idea what crypto library you're using.


Thanks for that brain dump! The details on Go were fascinating.

I haven't looked more than superficially at any existing compilers outside of small toys. Reading large codebases is hard, and I have a harder time with it than most. (Hence my current project involving compilers: https://lobste.rs/s/n0d3qo/what_are_you_working_on_this_week... ) But let me put Go back at the head of the queue...


Looking at those slides and watching the first part of the talk, it sounds as if he's re-invented LLVM IR, and poorly.


If it's a lot smaller and serves their purpose, it's not implemented poorly. And it probably compiles 10x as fast, too (I'm not a heavy Go user, though).

Also, I think this representation is architecturally different. Go now uses an SSA representation, apparently, which would be the equivalent of LLVM IR AFAIK. This assembly language is more toward the back end, whereas the IR is more like the "middle end".


Why LLVM specifically?

There are many intermediate representations.


First one I thought of and also you can write it directly.


If you want to roll your own code throughout (writing your own lexer, parser, and backend), then for your first attempt, pick a dead-simple source language and do not make any attempt at generating efficient code.

An option would be a BASIC from the '80s: no functions, all variables global. If you think implementing a return stack is too hard, forget about supporting gosub; goto may be harmful, but it is enough to write some interesting programs. (Certainly forget about computed gosub/goto in your first version.)

That makes lots of stuff incredibly easy. Variable 'a' always has the same address (the simplest implementation would even just allocate room for variables a-z, a%-z%, and a$-z$ regardless of whether they get used, so that 'a' has the same address in every program).
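A sketch of that fixed layout (the slot size is my own arbitrary choice):

  # Pre-allocate one slot per possible variable: a-z, a%-z%, a$-z$.
  # The addresses never depend on the program being compiled.
  SLOT = 8   # bytes per variable slot
  NAMES = [chr(c) + sigil for sigil in ("", "%", "$")
           for c in range(ord("a"), ord("z") + 1)]
  ADDRESS = {name: i * SLOT for i, name in enumerate(NAMES)}

  print(ADDRESS["a"], ADDRESS["a%"], ADDRESS["z$"])   # 0 208 616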

Forgetting about optimization means that, for example

  a = a + 1
  a = 2 * a
would compile to

  load a
  add 1
  store a
  load a
  mul 2
  store a
independently of where it appears in the program. That makes your compiler consist of two steps:

  - compute a map from line number to the generated code for that line
  - fix up the targets of goto's (in the first version, don't bother about trying to use short jumps; just assume all jumps will be long)
Inefficient? Yes, but make it work first, then make it fast (or skip that part, because frankly, if you want something for real-world use, you should build on the likes of LLVM).
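Here is a minimal sketch of those two steps in Python (my illustration, with an invented opcode set and a LET-only statement syntax):

  # Pass 1: map each BASIC line number to its generated code, leaving goto
  # targets unresolved.  Pass 2: patch them using the line-number map.
  def compile_basic(program):              # program: [(line_number, statement)]
      code, line_addr, fixups = [], {}, []
      for lineno, stmt in program:
          line_addr[lineno] = len(code)
          parts = stmt.split()
          if parts[0] == "GOTO":
              fixups.append((len(code), int(parts[1])))
              code.append(("jmp", None))   # patched in pass 2
          elif parts[0] == "LET":          # e.g. "LET a = a + 1"
              var, lhs, op, rhs = parts[1], parts[3], parts[4], parts[5]
              code.append(("push", int(lhs)) if lhs.isdigit() else ("load", lhs))
              code.append(("push", int(rhs)) if rhs.isdigit() else ("load", rhs))
              code.append(({"+": "add", "*": "mul"}[op], None))
              code.append(("store", var))
      for at, target in fixups:            # pass 2: fix up goto targets
          code[at] = ("jmp", line_addr[target])
      return code

  print(compile_basic([(10, "LET a = a + 1"), (20, "LET a = 2 * a"), (30, "GOTO 10")]))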

Don't think that's cheating; it gives you around a tenfold speedup over a BASIC interpreter, and back then, people sold compilers that did exactly the above.

If you want advanced functions such as sin, do not hesitate to call into an external library. Those compilers from the '80s weren't afraid to call into ROM code, either.


This is a great approach. I'd probably advise adding lexical scoping, because managing the stack is pretty interesting.

Going from source to binary on your own code is incredibly satisfying and incredibly enlightening. It thoroughly drives out any "magic" in your mental model of computation.

I agree wholeheartedly: use LLVM (or something like it) for anything real. Tens or hundreds of thousands of hours, by really smart people, give you optimizations a lone wolf can never duplicate.


That's a great suggestion. I went one step lower and wrote a (Perl-based) virtual machine. As you suggested, I did the bare minimum at first; for example, when I see labels I just remember their locations. Later on I patch up the offsets in the code, to make jumps to them point to the correct location.

The project seems reasonably popular, for reasons that I don't understand, but it is very simple:

https://github.com/skx/simple.vm


One way around this is to divide the problem: compile to an intermediate format that is closer to what you want your final output to be. FORTH is a reasonable example and P-code is another (https://en.wikipedia.org/wiki/P-code_machine). Then you can get all the "easy" stuff into a stable state with a runtime library, and come back to a native code generator later. If you can get that working, you can probably jump to something like assembler output with as or nasm to produce a "real" executable.

I got about halfway there with https://github.com/arlaneenalra/insomniac and have been stuck looking for time to get back to it.


How does one create a reasonable intermediate language? I'm at a loss there. :/


"Reasonable" means the simplest, most minimal set of instructions that implements a particular behavior. These are your primitives for the intermediate language. If you find something that you can't do in terms of the existing instructions, then the required operation should be added to your set of primitives. A good way to figure some of this out is to read through a minimal FORTH, say something like Jones FORTH (https://github.com/phf/forth/tree/master/x86 is at least one port). The core set of primitives is in the .S assembly file, and everything else is implemented in terms of those 'words'. If you want a really simple one, look up something like https://esolangs.org/wiki/Brainfuck and work your way up to a more reasonable (i.e. actual numbers!) ISA.
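As a concrete (hypothetical) illustration of how small such a primitive set can be:

  # A toy stack machine with a handful of primitives, loosely in the spirit
  # of FORTH words; everything else gets built out of these.
  def run(code):
      stack, ip = [], 0
      while ip < len(code):
          op, arg = code[ip]
          if op == "push":    stack.append(arg)
          elif op == "add":   stack.append(stack.pop() + stack.pop())
          elif op == "mul":   stack.append(stack.pop() * stack.pop())
          elif op == "dup":   stack.append(stack[-1])
          elif op == "jz":    # the single control-flow primitive
              if stack.pop() == 0:
                  ip = arg
                  continue
          elif op == "print": print(stack.pop())
          ip += 1

  run([("push", 6), ("push", 7), ("mul", None), ("print", None)])   # prints 42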


Thanks, particularly for the tip about P-code. Are you aware of any interpreters/assemblers for P-code? The Wikipedia page didn't seem to have any links.


A quick Google search pulls up:

https://gist.github.com/r-lyeh/0af42b2788bb75219061

http://scara.com/~schirmer/o/pcodevm/

Here's another ref: http://homepages.cwi.nl/~steven/pascal/book/pascalimplementa...

But really, all you need is an understanding of the ISA (i.e. the instruction set and the machine it runs on) and a file format in mind for a middle layer, and you can build a runtime. (Basically, P-code is one form of ISA that's supposed to be portable.) That's kind of what insomniac does with a custom ISA:

ISA: https://github.com/arlaneenalra/insomniac/blob/master/insomn... (a little FORTH-happy...)

Assembler Core: https://github.com/arlaneenalra/insomniac/tree/master/src/li...

VM eval loop: https://github.com/arlaneenalra/insomniac/blob/master/src/li...

and Instructions: https://github.com/arlaneenalra/insomniac/tree/master/src/li...

https://github.com/arlaneenalra/insomniac/blob/master/src/li...

It helps to build the assembler and VM back end first so you have a pipeline in place that you can use to add ops as you discover they are needed.

(keep in mind all my code is toy stuff..)


Well, since we are talking about P-code and all...

I recommend you look at Oberon. I've avoided the Wirth family of languages my whole life, but have been messing with Oberon off and on since winter, because it's interesting from a security standpoint. (It was supposed to be my fun Advent hacking project, but work changes and living changes—i.e., moving—caused interference.) Project Oberon is interesting because it involves a language, a system, and a machine, from the ground up, all created from scratch.[1]

Often, when trying to dive deep on some concept, the available literature can get you rolling with a toy (e.g., compilers), but it helps you reach only a facile understanding, and punts on everything around it. You'll be aware of this; I'm pretty sure it's what you're referring to in your comment above. That's mostly avoided with Oberon, because it's a full-fledged toolchain for quasi-real-world use—at least it was in production use at ETH Zurich.

There are some gotchas with Oberon, and it mostly comes down to a lot of vague, hypey comments written by people who haven't dived deep, and don't have the level of understanding that their comments suggest. There are numerous examples. I could write those up, but here's one: "It was all done without resorting to assembly anywhere." Then you go look into it, and that's because there's no assembler, and it's inline snippets of hex-encoded machine code and other binary blobs instead.

The second big gotcha is that Wirth & Co have produced volumes of (what looks like high-quality) literature, but a bunch of it is either out of date, only superficially helpful, poorly written, or contains errors. For example, "Oberon" refers to so many things—including systems and languages that Wirth had nothing to do with and probably should have never been allowed to bear the name—that it makes jwz's old Java rant[2] seem quaint. (Try starting out at the Wikipedia page and making sense of Oberon's evolution or mapping out the family tree, then try referring to primary and secondary sources directly that might clear things up. Good luck.)

I began with a fresh notebook for taking notes and keeping track of errata in the published stuff. I quit keeping track of errata after two days and several chapters, because it was too much. If you're interested, I highly recommend just running a system image from Peter De Wachter's Norebo[3] and using his emulator for Wirth's RISC machine[4]. Familiarize yourself with the basics of how to use Oberon-the-system by playing with it for an hour or so, crack open the source and just study it directly. Cross reference Wirth's publications if you want (they're all online), but assume that they're lying about something. I can also share my notes. Stay away from the mailing list, it's populated by USENET-style cranks, and it isn't really an essential component of Oberon development. There's not really a community—Wirth pretty much does his own thing, never posts there, and just does a source dump through his personal website.

Having said all that, studying Oberon won't impart all the knowledge you're looking for. It's in this weird place where it's more than a toy, but it really doesn't directly resemble any of the real-world systems that you're actually interested in. (Which most likely means UNIX; let's just be honest.) But it's probably the kind of stepping stone you need.

So the best resources on ELF I know of are the articles written by Eric Youngdale for Linux Journal[5][6], from back in the 90s when vendors were adopting ELF for the first time and he wrote the Linux implementation. I believe this to be the highest quality treatment of the subject that exists (at least as of a few years ago when I was interested in studying this kind of thing).

Hope it helps.

1. https://issuu.com/xcelljournal/docs/xcell_journal_issue_91/3...

2. https://www.jwz.org/doc/java.html

3. https://github.com/pdewacht/project-norebo

4. https://github.com/pdewacht/oberon-risc-emu

5. http://www.linuxjournal.com/article/1059

6. http://www.linuxjournal.com/article/1060

EDIT: I forgot the P-code tie-in! P-code is tangentially related to Oberon because it was developed to port/run (a dialect of) Pascal, one of Oberon's predecessors and Wirth's main claim to fame. Here's a sort-of P-code interpreter for Oberon—it actually runs a (fairly capable) subset of the RISC ISA that Oberon proper targets:

https://www.inf.ethz.ch/personal/wirth/CompilerConstruction/...

(Yes, that's the entire implementation. You'll need a compiler for it though. That can be found in the parent directory. To see what a more "fortified" implementation would look like, and implemented in C, look at Peter De Wachter's emulator, already mentioned above.)


This may not be authoritative enough, but I found John Levine's "Linkers and Loaders" to be a very informative guide to the general concepts of linking, including the various architectures like ELF and COM.

http://linker.iecc.com/

Edit: not that they're conventionally called "architectures" or that COM has any structure to it.


Thanks a lot for that extended meditation! I love learning about the history of things, even when it isn't directly useful to my immediate problem.


Don't link to jwz.org from HN :)


Plug for my compilers course, which is incremental and uses some of the ideas from sibling comments re: emitting assembly and Ghuloum's great work. Last time I posted it on HN, someone got back to me a few weeks later saying they'd successfully gone through the assignments, so it seems at least somewhat concretely useful:

https://www.cs.swarthmore.edu/~jpolitz/cs75/s16/


Thanks! I'm gonna check it out.


Check out the QBE backend as it's designed for simplicity:

http://c9x.me/compile/

Note: you could potentially both target QBE and learn from it.

ELF:

http://www.cirosantilli.com/elf-hello-world/


I built a compiler that just threw out LLVM IR to stdout. Pretty great way to get something working, if you ask me, and lots of potential to do FFI or other kinds of integration. LLVM takes care of lots of portability issues, and optimizes your stuff.

My experimental stuff is here: https://github.com/djc/runa. The compiler is written in Python, and shouldn't be that hard to understand. I'm also happy to answer any questions about it here.
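The approach really is that direct; here's a sketch (mine, not runa's actual code) of printing IR that lli or llc will accept:

  # Emit textual LLVM IR for a trivial main() to stdout; pipe the output
  # into `lli` to run it, or `llc` to get native assembly.
  def emit_main(body):
      print("define i32 @main() {")
      print("entry:")
      for line in body:
          print(f"  {line}")
      print("  ret i32 0")
      print("}")

  emit_main(["%x = add i32 2, 3"])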


I still need to find the time to figure out LLVM IR; do you have anything easier to read through than the official docs?


Unrelated, but how would one get started creating a proper parser for textual content? For example, I have

  {
    "key"
    {
      "key" "value"
      "morekeys"
      {
        "key2" "value2"
      }
    }
  }
The markup would be separated by newlines, with the key-value pairs on the same line, always in quotes. I've solved it with regex and a crude parser before; I've always heard of things like Bison and such, but never found a 'good' way to implement a real parser in a language like Python.


To elaborate on userbinator's comment, you want to think of your format as a set of definitions:

A clause consists of a '{', one or more key-value pairs, and a '}'.

A key-value pair consists of a string and a value.

A value is either a string or a clause.

Now turn each definition into a function.

Here's a nice-looking article from Googling for "writing a recursive descent parser": http://weblog.jamisbuck.org/2015/7/30/writing-a-simple-recur...
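A minimal sketch of exactly those three definitions in Python, assuming the quoted-strings-and-braces format shown upthread:

  import re

  def tokenize(text):          # quoted strings and braces are the only tokens
      return re.findall(r'"[^"]*"|[{}]', text)

  class Parser:
      def __init__(self, tokens):
          self.tokens, self.pos = tokens, 0
      def peek(self):
          return self.tokens[self.pos] if self.pos < len(self.tokens) else None
      def next(self):
          tok = self.peek()
          self.pos += 1
          return tok
      def clause(self):        # clause := '{' pair* '}'
          assert self.next() == "{"
          pairs = {}
          while self.peek() != "}":
              key, value = self.pair()
              pairs[key] = value
          self.next()          # consume '}'
          return pairs
      def pair(self):          # pair := string value
          return self.next().strip('"'), self.value()
      def value(self):         # value := string | clause
          return self.clause() if self.peek() == "{" else self.next().strip('"')

  text = '{ "key" { "key" "value" "morekeys" { "key2" "value2" } } }'
  print(Parser(tokenize(text)).clause())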


I would go with a simple recursive descent parser, unless you are interested in learning a DSL and the intricacies of the parsing algorithm your particular parser generator uses, and are willing to debug one --- IMHO not worth it. Even "real", production-quality compilers like GCC, Clang, MSVC, and ICC use recursive descent; some, like GCC, switched away from parser generators.


Parser generators work really well for context-free grammars, but as soon as you start needing context-sensitive lexing (like C does), you enter world-of-pain territory. On the other hand, they spare you from dealing with lookahead, which is always messy in hand-written parsers.

But I think I'd agree that writing a recursive descent parser from scratch is probably the best way to learn about parsing; at least for a simple grammar.

BTW, fun fact: it turns out that BNF was invented by an Indian linguist called Pāṇini about 2500 years ago, in order to parse Sanskrit.

https://en.wikipedia.org/wiki/P%C4%81%E1%B9%87ini


Well, this is why separation of concerns exists. You don't have to know all of the requirements to begin writing something real-world, depending on what you mean by "real world". If you're trying to rewrite a feature-complete GCC from scratch, you're going to have to start somewhere, and you won't be able to write everything from the start. So you divide the work into pieces and focus on each individually. Start out dumping assembly. If you really want to hand-write ELF files, that's fine, but it's no place to start.


I often work on tangential projects to understand problem spaces better. Try writing a mini libc, for example, with a dynamic loader. Try writing a linker and an assembler.


Why don't you compile to a language that you already know or that is easy to understand, and then compile that? (e.g. C, Java, C#, C++, Go)


Yes, that's a perfectly valid option if the goal is just to implement a new language. My goal (similar to OP) is to learn how a native compiler works.


Some of this starts delving into linking as well. Aside from documentation, you might consider reverse engineering tiny binaries. For example:

http://timelessname.com/elfbin/

Cheat a lot. They point out some disassembly for you. ".ELF" is clearly some kind of magic string to confirm it's an ELF file. "A4 00 00 00" looks like a 4-byte little-endian integer (164) or perhaps an offset (0x000000A4 happens to be exactly where all the strings start); "34 00 20 00" might be a pair of shorts... 0x34 = 52 = "Start of program headers", among other matches - it probably corresponds to one of those fields! Try mutating the file and re-running readelf to see which fields change, or whether the program still runs.

Just a hex editor and a way to correlate the raw bytes of the disassembly to where they occur in the file can get you surprisingly far. While there will be "dead" data whose nature you can't determine, documentation can help fill in the holes once you start getting a handle on the "important" bits that actually do stuff (mostly make your program unrunnable, I assume).
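The fixed part of the header also decodes mechanically, which makes it easy to cross-check hex-editor guesses against readelf -h. A sketch for 32-bit, little-endian ELF (field names per the spec):

  import struct

  # The ELF32 header is 52 (0x34) bytes: a 16-byte e_ident, then 13
  # fixed-size fields ("<" assumes a little-endian file).
  def read_elf32_header(path):
      with open(path, "rb") as f:
          data = f.read(52)
      assert data[:4] == b"\x7fELF", "not an ELF file"
      names = ("type machine version entry phoff shoff flags "
               "ehsize phentsize phnum shentsize shnum shstrndx").split()
      return dict(zip(names, struct.unpack("<HHIIIIIHHHHHH", data[16:])))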


Are you talking about compilers, or just ("dumb") interpreters? To me, the hardest part of writing a compiler is the middle part that you totally neglected to mention: the static analysis & optimization. Not the parsing or the codegen.


One could say compilers are a multi-disciplinary field. You might enjoy some topics more than others. E.g. I personally favor the stages after parsing and before the actual codegen. :)


True, but if you are willing to use some libraries or tools, you can get pretty far. And if not, use them anyway :) You could always replace them later with your own tools.

IMHO you need to decide which part of the compiler you want to focus on. You mentioned you already know how to write a lexer and parser. So maybe you should start with an IR you could compile to, like WebAssembly. Maybe you could even reuse your lexer/parser from your interpreter.

Compiling can be as simple as writing an interpreter: you "just" have to replace some code in the interpreter with `printf`s that dump e.g. LLVM IR or assembly. Then use LLVM or any assembler to get a binary. Even GCC basically works that way: gcc just dumps an assembly file and then passes it to the assembler and linker. So GCC doesn't really emit ELF or Mach-O either; don't let that stop you. It doesn't mean you don't have a "full" compiler.

If you don't know assembly, use tools like gcc.godbolt.org to see what code gcc generates. The most important thing: think about what code to generate for each feature of your language or IR. If you have that figured out, all the rest will be trivial. For example: what code do you generate for `if`? So don't try to compile a language with many fancy features at first, and please don't try to generate fast code. You want a translation that works in every case, even if it is somewhat inefficient compared to `gcc -O3`. Your first program to compile could be something like `int main() {}`. If you can compile and execute this program, you can start to add features.
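For the `if` example, a sketch of what "interpreting replaced by printing" looks like (my illustration, with x86-64-flavored output):

  # Instead of evaluating the condition and body, emit assembly that
  # computes the condition into %rax and skips the body when it is zero.
  label_counter = 0

  def gen_if(cond_code, body_code):
      global label_counter
      label = f".Lend{label_counter}"
      label_counter += 1
      return (cond_code                    # leaves its result in %rax
              + ["    test %rax, %rax",
                 f"    jz {label}"]        # condition false: skip the body
              + body_code
              + [f"{label}:"])

  print("\n".join(gen_if(["    mov $1, %rax"], ["    call something"])))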

These are the restrictions I would make at the beginning:

  - use a statically typed language; it is easier to generate (faster) code for one
  - start with global variables, add local variables later
  - support if, while, and then functions (in this order)
  - only support a few datatypes: int, int-array/pointer, and string literals for printing

This should be enough to run the binarytrees and fannkuch benchmarks from The Computer Language Benchmarks Game. You will already beat many dynamically typed and interpreted languages on those.

Don't try to generate fast code at first; spill everything onto the stack so you don't have to care about register allocation. Even the V8 full-codegen JIT works this way, so you don't have to be ashamed of it ;)

If you don't care about ELF, linking, and all that, you could avoid it by writing a JIT compiler: just emit code into some buffer and then start executing it directly. Although I have to admit that a JIT can introduce some challenges of its own.
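The buffer trick is only a few lines with ctypes (a Linux/x86-64 sketch; the machine code bytes are `mov eax, 42; ret`):

  import ctypes, mmap

  # Map a page that is writable *and* executable, copy machine code into
  # it, and call it as a C function returning int.
  code = bytes([0xb8, 0x2a, 0x00, 0x00, 0x00,   # mov eax, 42
                0xc3])                          # ret

  buf = mmap.mmap(-1, mmap.PAGESIZE,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
  buf.write(code)
  addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
  f = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
  print(f())                                    # 42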

Ah, and don't forget to write tests. Write a lot of tests from the beginning. Testing a compiler is pretty simple, and it is also pretty easy to introduce subtle bugs. You will save a lot of time.
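A golden-output harness is often all it takes; a sketch where the compiler binary (`mycc`) and the test layout are hypothetical:

  import pathlib, subprocess

  # For each test: compile the source, run the produced binary, and diff
  # its stdout against a stored .expected file.
  def check(src):
      subprocess.run(["./mycc", src, "-o", "test.bin"], check=True)
      got = subprocess.run(["./test.bin"], capture_output=True, text=True).stdout
      want = pathlib.Path(src).with_suffix(".expected").read_text()
      assert got == want, f"{src}: got {got!r}, want {want!r}"

  for test in sorted(pathlib.Path("tests").glob("*.src")):
      check(str(test))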


Kudos for writing a compiler-construction walk-through without running out of steam after the parsing part :-) It's a shame that that's where most tutorials end, since codegen is often much more interesting.


Here is a plan with more self-sufficient steps: http://belkadan.com/blog/2016/05/So-You-Want-To-Be-A-Compile...


This still feels unsatisfying, since the target language is really just the source language with slightly modified surface syntax. Connecting languages with nontrivial differences is where writing compilers starts to get interesting.


So you made the whole journey to the last station without taking any breaks, right?


Nope, I was just curious, so I looked at the blog archive and saw you had actually published them all.


Ah, I see! So which station are you at right now?


This is great, thank you! I've been looking for something like this: a simple, small project that goes through all the phases involved and explains/demonstrates the concepts, and that I can mess around with on the weekends. It needn't have real-life applications or be serious or academically correct -- it should just be fun.

I once found a blog post with an example of defining a grammar for a small language and then building an interpreter for it (in Python, I think). It was really great, but I lost the link and can't seem to find it again.

The same goes for OSes - a simple but "end to end" walkthrough would be awesome.

Anyway, cheers!


> I once found a blog post ... but I lost the link ...

Maybe this one? Python: Writing a Compiler and Interpreter in 160 lines of code

http://www.jroller.com/languages/entry/python_writing_a_comp...


Not the one; it was in multiple parts. BUT thanks - I'll read that one as well :-)


Mandatory link to lisperator:

http://lisperator.net/pltut/


Hopefully it isn't intentional to suggest that you need to be a genius "guy" to do this. To be fair, this is about where I closed the article, so maybe I missed the point.


The actual sentence is:

> You probably have to be one of those genius guys ...

Used in the plural, 'guys' usually just means 'people'/'folks' - "alright guys, shall we get going?" - Mac Dictionary agrees.

Happy people don't go around looking to be insulted.


It's entirely orthogonal, but the other half of that, the implication you have to be a “genius” to write a compiler, also annoys me. Is it really rocket science?

(I suppose it depends on what “real” means, though.)


I like the idea!

However, I feel the source and target languages are too similar. Won't this discourage people? The resulting code looks like it could have been produced in a simpler way, without all the compiler voodoo, by just moving some parentheses around.

Also, the implementation seems to make quite heavy use of external JS libraries. Maybe it is just me, but I'd like to add one NoScript exception for your site, and be done with it, rather than adding exceptions for so many different CDNs.


I see. My blog is powered by the Klipse plugin (which I developed): https://github.com/viebel/klipse

The Klipse plugin does indeed load a couple of external scripts, mostly from my GitHub Pages domain: viebel.github.io


Is my browser messing something up, or is the small div at the end in a really weird font that seems way too heavy (in font-weight)? When I read the page I get something resembling:

https://puu.sh/tVQAL/ec7bd4a05c.png

Apart from that, however, I'm thoroughly enjoying this tutorial! It's remarkably similar to a tokenizer & parser I built when I was experimenting with a language that had no syntax, but was simply an AST that could be easily represented as Haskell/Python in the IDE of your choice. Really love how simple your code is, though; mine just ended up being a mess of callbacks!


Oh. This font is so ugly. What browser are you on?


> a string of code and break it down into an array

break -> breaks

> There a three kinds

a -> are

> starts with a " and end with a "

end -> ends

> couple of tokenizer for a

tokenizer -> tokenizers

> a generic function that tokenize a single

tokenize -> tokenizes

> by means or regular expressions

or -> of


Did you just tokenize that blog post?


what do you mean?


Thx a lot. Fixed!


Look at Small C from the '70s. You could change the back end to target a different computer.



