I went through this a while ago (in Prolog). It was interesting, but with a lot of room for improvement. My criticisms as I remember them:
- Error messages about the output differing from the expected value use some very explicit printing mode with lots of quotes and escaping. A message saying that one got "\\\'\\\'" where "\'\\\\'" was expected is needlessly hard to parse when the actual difference is \'\' vs. '\\' (or whatever).
- Many parts are marked optional, but the test script complains if you don't implement them, and later sections assume that they are implemented, so you have to treat them as compulsory anyway.
- Some things regarding environment updates were contradictory or at least not explained properly at first, with some very relevant information only coming later. I don't remember the specifics, but the problem was roughly that I set up my data structures thinking that everything could be implemented in a pure way, but at some point it became clear that you either need some other data structure or must use destructive modifications.
- There are several reasons for the weird way that the output and expected messages are printed. One of the main reasons is that in order to enable more flexible test case results, the expected output is a regex rather than a plain string, while the output is just a plain string. There is probably a way this could be made clearer. I'll add it to my TODO list to look at.
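To illustrate the point above, here is a hypothetical sketch (not mal's actual test driver; `matches_expected` is an invented name) of comparing a regex-typed expected value against a plain-string actual output. Because the expected side goes through a regex engine, a literal backslash needs an extra level of escaping, which is where the doubled-up quoting in the error messages comes from:

```python
import re

def matches_expected(expected_pattern: str, actual: str) -> bool:
    """The expected value is a regex; the actual output is a plain string."""
    return re.fullmatch(expected_pattern, actual) is not None

expected = r'"\\\\"'  # regex: a quote, two escaped backslashes, a quote
actual = r'"\\"'      # plain string: a quote, two literal backslashes, a quote

print(matches_expected(expected, actual))  # True
print(matches_expected(expected, r'"\"'))  # False: only one literal backslash
```

Printing both sides verbatim then shows four backslashes on one side and two on the other even though they "match", which is exactly the confusion the parent comment describes.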
- Do you have some specific examples? The test cases are marked as either deferrable or optional. You should never have to implement the optional ones (if you do, that's a bug in the tests or the guide). I think the guide marks pretty clearly where the deferrable items become mandatory in later steps. If it's not clear, then that's a bug.
- This is one of the tensions that exists with trying to make the guide incremental; later features may require re-work of earlier functionality. I do try and minimize that as much as possible although I've found it can really vary depending on the nature of the target language. Note that the primary goal of mal/make-a-lisp is pedagogical (as opposed to say "the easiest way to make your own Lisp"). So sometimes the need to go back and re-work something is in line with that goal.
If you have any concrete guide text or test driver improvements (especially that further the pedagogic goals of mal), I'm always happy to review pull requests! :-)
Thanks for your answer. I probably meant deferrables rather than optionals that were de facto required. Sorry I don't have a concrete example handy, it's been a while.
I went through MAL last year and had a blast! It's an excellent project to help you get a feel for Lisps and tiny languages, and I think it's what finally got me over my "eww parentheses" feeling. After finishing MAL, I modified it to implement John Shutt's Kernel-style f-exprs to really up the power-to-weight ratio, and had a blast doing it. MAL makes a great basis for experimenting with additional language features, highly recommended!
I attempted this a few years back, didn't get far before I got hung up on the regular expression(s) required for parsing. Silly to get stopped there, but there was just enough friction between the specified expression, the language I was using (Erlang), and my understanding of the subtleties that I punted.
I think with Regex101 in my back pocket I'll give it another try.
I also could not get the regex to work, so I ended up writing a custom tokenizer. It ended up being like 30 lines of code, and unlike the regex I could understand how it works.
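That hand-rolled approach is easy to reproduce. Here is a hypothetical sketch (not the commenter's actual code) of a ~30-line loop-based tokenizer in Python that recognizes roughly the same token classes as the guide's regex:

```python
def tokenize(source: str) -> list[str]:
    """Loop-based MAL-style tokenizer: one branch per token class."""
    specials = "[]{}()'`~@"
    tokens = []
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if c.isspace() or c == ",":          # whitespace and commas separate tokens
            i += 1
        elif source.startswith("~@", i):     # splice-unquote
            tokens.append("~@")
            i += 2
        elif c in specials:                  # single-character specials
            tokens.append(c)
            i += 1
        elif c == '"':                       # string, honoring \" escapes
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            tokens.append(source[i:j + 1])
            i = j + 1
        elif c == ";":                       # comment runs to end of line (dropped)
            j = source.find("\n", i)
            i = n if j == -1 else j
        else:                                # symbol, number, anything else
            j = i
            while j < n and not source[j].isspace() and source[j] not in specials + '",;':
                j += 1
            tokens.append(source[i:j])
            i = j
    return tokens

print(tokenize('(+ 1 "a b") ; comment'))     # ['(', '+', '1', '"a b"', ')']
```

Each `elif` branch corresponds to one alternative of the regex, which arguably makes the token classes easier to see at a glance.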
static public List<string> Tokenizer(string source)
{
    // Initialise the token list.
    List<string> tokens = new List<string>();
    // Define a regex pattern whose groups match the MAL syntax.
    string pattern = @"[\s ,]*(~@|[\[\]{}()'`~@]|""(?:[\\].|[^\\""])*""|;.*|[^\s \[\]{}()'""`~@,;]*)";
    // whitespace ( ~@ | specials | double-quoted strings | comments | non-specials )
    // Break the input string into its constituent tokens. Splitting on a
    // capturing group returns the captured tokens interleaved with the
    // (empty) strings between matches, so discard the empty entries.
    foreach (string token in Regex.Split(source, pattern))
    {
        if (token.Length > 0)
            tokens.Add(token);
    }
    return tokens;
}
This took a while to understand and get going but it really improved my understanding of regex.
It's a long regex, but it's just whitespace followed by an alternation of 5 different types of data: splice-unquote, special characters, strings, comments, symbols. The string-tokenizing branch is a bit complicated because it has to allow internal escaping of quotes. Early iterations of the guide didn't explain the regex in detail, but the section now describes each of the regex components.
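For anyone who wants to poke at it, here is a sketch (not code from the guide) transcribing that regex into Python's verbose mode, with each alternative annotated:

```python
import re

# The guide's tokenizer regex, one comment per token class.
PATTERN = re.compile(
    r"""[\s,]*(                     # skip whitespace and commas between tokens
          ~@                        # splice-unquote
        | [\[\]{}()'`~@]            # single special characters
        | "(?:\\.|[^\\"])*"         # double-quoted strings with \" escapes
        | ;.*                       # comments to end of line
        | [^\s\[\]{}()'"`~@,;]*     # symbols, numbers and other atoms
    )""",
    re.VERBOSE,
)

def tokenize(source: str) -> list[str]:
    # findall returns the captured group; drop the empty match at the end.
    return [t for t in PATTERN.findall(source) if t]

print(tokenize('(+ 1 2)'))  # ['(', '+', '1', '2', ')']
```

Seeing the five branches laid out one per line makes the "whitespace, then one of five token types" structure much harder to miss.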
Yeah, a little weird, since regexes can't parse context-free languages. I suppose most so-called regexes aren't actually regular expressions, but it still feels like driving screws with a hammer.
Mal uses a regex for lexing/tokenizing. I didn't want people to get hung up on the lexing step (my university compilers class spent 1/3rd of the semester just on lexing). It's certainly a worthwhile area to study but not the focus of mal/make-a-lisp.
If you are using Erlang you should be using pattern matching, not regex. Erlang/Elixir are among the easiest languages to build parsers in, thanks to their binary string pattern matching.
I agree, pattern matching is what I sorely miss every time I use anything other than Erlang. It was just enough of a hurdle that I set it aside and didn't return.
The Brown University PLT textbook used Lisp/Scheme, and its first paragraph was something like "nobody cares about parsing, let's get down to business in sexps".
I like parsing, I like regexes but I agree it's often a waste of time :)
One interesting thing about this project is the extensive use of Make. Not only is the test suite Make-based, there is also a Mal implementation in Make.
I have had a Dyalog APL implementation on my fork for over a year, but still haven't released it because, for reasons related to input/output handling, it cannot pass the self-hosting tests :(
To make the other tests pass, I wrapped the interpreter in a shell script that merges stdout and stderr, and that did the job for all but the last step (self-hosting).
The self-hosting interpreter (albeit very slow) appears to work correctly when used interactively.
I was a bit surprised nobody's tried a "verified" implementation in something like Coq/Idris/Agda (unless I missed one - I wasn't familiar with all the languages in the list).
I did this a while back. I documented my experience in several blog posts, the last of which is [0]. These describe the process, the gotchas, and how I got round them. I was working in C# on a Windows box, so the result isn't quite the same as the vanilla MAL.
I was massively pleased when I got MAL to self-host.
This is unintentional given the history of the name of the project explained in another comment on this page, but "mal" in French means "evil", tying into the famous aphorism.
I wouldn't say it was entirely unintentional :-). I was definitely aware that "mal" could mean "evil" when I named it. It was a bit more apropos when mal only had a single implementation in GNU Make macro language.