From my experience, implementing a programming language can be easier to grasp when done backwards, that is: back end -> front end. I can recommend The BNF Converter (http://bnfc.digitalgrammars.com/), a tool that generates the front end (in C, C++, C#, Haskell, Java, or OCaml) from a BNF grammar. Writing the back end first gives you a pretty clear big picture of how the different parts work and relate to each other. Once you have that, you can write the front end with this vision in mind.
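For a flavor of what BNFC consumes, here is a tiny labelled-BNF grammar sketch (the rules are made up for illustration; see the BNFC docs for the exact format). Each rule carries a label that becomes a constructor in the generated abstract syntax tree:

```
-- Hypothetical LBNF fragment for arithmetic expressions.
EAdd.  Exp  ::= Exp "+" Exp1 ;
EMul.  Exp1 ::= Exp1 "*" Exp2 ;
EInt.  Exp2 ::= Integer ;
coercions Exp 2 ;
```

From one file like this, BNFC is meant to generate the lexer, parser, AST types, and a pretty-printer in your chosen target language.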
Obviously, this is only my experience, but I thought I'd share it, since almost every compiler course I've seen starts with writing the front end. My experience and knowledge are still very, very limited, but as I learn more about implementing PLs/writing compilers, I see this pattern everywhere.
I recommend http://www.hokstad.com/compiler , which covers the process of writing a compiler in a way that makes sense to me: the way you'd write any other program.
The link is broken: http://i.imgur.com/0qc3wf8.png
I got this error:
Heroku | No such app
There is no app configured at that hostname.
Perhaps the app owner has renamed it, or you mistyped the URL.
I must have missed the first part, but this is a topic that always interests me, especially when it starts from the fundamentals.
Writing a lexer in C is conceptually simple but non-trivial in practice, mainly because of the pointer arithmetic and string manipulation involved. If the author continues on to writing a fully customizable LR parser or something of the sort in C or another high-level language, it might be useful to look at the source code for an LR(1) and SLR parser generator here: https://github.com/gregtour/duck-lang/tree/master/parser-gen...
I may fork this branch off the main duck programming language trunk, because it could be useful for other programming languages.
My studies of programming languages have concentrated on front ends. One drawback of my parser's implementation is that generating a complete canonical parser for an arbitrary deterministic context-free grammar can be slow, and the tables can be quite large. Still, there must be ways to improve the code base for better features and performance. And once you have a parse table, it really is a fast parser.
The benefit of a ground-up approach like this is not only having a complete understanding of all of the technology involved but also having complete control.
Although I haven't used tools like lex or yacc (or their GNU counterparts, flex and bison) in practice, I dislike the idea of generating code from macros, or otherwise bending the paradigms of C and C++ to produce auto-generated code. To me, it is much easier to have a program that takes an input, like a BNF grammar, and produces an output, the parse table, which can then be applied to build a syntax tree from source code. This makes more sense to me as code that operates on data and data structures, rather than code generated around templates or macros.
Having control over the data structures in use is also helpful, because then the programmer knows exactly where all of the data goes and how it is stored. That is somewhat useful when designing a programming language, and it is something you lose when using someone else's libraries.
IMO, using macros to generate the parser has some advantages. For example, you have the freedom to generate the syntax tree however you like, adding extra "line number" fields or translating some syntactic sugar on the fly. If your language has proper support for macros, it's also really pleasant overall. I wrote a bottom-up parser in Racket once and it was really nice: compiling the parser is a piece of cake, and you can define your own macros to automate list parsing and other boring things.
And previous discussion: https://news.ycombinator.com/item?id=9555972