
Regex to parse lisp expressions?



Regexes are at least useful for parsing numbers and symbols.
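As a rough illustration of that point, here is a minimal Python sketch that classifies a single token as an integer, a float, or a symbol. The names `INT_RE`, `FLOAT_RE`, and `parse_atom` are made up for this example, and the number syntax is a simplifying assumption rather than any particular Lisp's rules.

```python
import re

# Hypothetical patterns for this sketch; real Lisps accept more number forms.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?\d+\.\d+")


def parse_atom(token):
    """Turn one token into an int, a float, or a tagged symbol."""
    if INT_RE.fullmatch(token):
        return int(token)
    if FLOAT_RE.fullmatch(token):
        return float(token)
    # Anything that is not a number is treated as a symbol here.
    return ("symbol", token)


print(parse_atom("42"))
print(parse_atom("-3.5"))
print(parse_atom("foo-bar"))
```

Using `fullmatch` rather than `match` matters here: a token like `1+` should fall through to the symbol case instead of being read as the number `1`.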

But yeah, that shouldn't be where you get stuck.


[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"?|;.*|[^\s\[\]{}('"`,;)]*)

Step 0, so I didn't get very far.

https://github.com/kanaka/mal/blob/master/process/guide.md#s...


It's a long regex, but it's just optional leading whitespace (and commas) followed by an alternation of five token types: splice-unquote, special characters, strings, comments, and symbols. The string tokenizing branch is a bit complicated because it has to allow escaped quotes inside the string. Early iterations of the guide didn't explain the regex in detail, but the section now describes each of the regex components.
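To make the five branches easier to see, here is a minimal Python sketch that spells the same regex out in verbose mode, written with the `*` quantifiers it has in the mal guide. The names `TOKEN_RE` and `tokenize` are assumptions for this example, and dropping comment tokens inside the tokenizer is a choice made here for brevity, not necessarily how any given mal implementation does it.

```python
import re

# The tokenizer regex, one alternation branch per line.
TOKEN_RE = re.compile(r"""
    [\s,]*                      # skip whitespace and commas (not captured)
    (
        ~@                      # splice-unquote
      | [\[\]{}()'`~^@]         # single special characters
      | "(?:\\.|[^\\"])*"?      # string with backslash escapes, closing quote optional
      | ;.*                     # comment running to end of line
      | [^\s\[\]{}('"`,;)]*     # everything else: symbols, numbers, and so on
    )
""", re.VERBOSE)


def tokenize(source):
    """Return the non-empty, non-comment tokens found in source."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        token = match.group(1)
        # The last branch can match the empty string at end of input.
        if token and not token.startswith(";"):
            tokens.append(token)
    return tokens


print(tokenize("(+ 1 2)"))
print(tokenize("~@(a b) ; trailing comment"))
```

Note that the closing quote of a string is optional in the pattern, so an unterminated string still comes out as a token; catching that error is left to whatever consumes the tokens.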

There are online tools to help visualize regexes. Here is a recent tweet including a visualization of mal's tokenizer regex: https://twitter.com/Mehulwastaken/status/1382292764834996230


Whoa, that regex is a monster. Try starting with simpler pieces and see if you get further this time around. Good luck! https://gist.github.com/cellularmitosis/75dc4aefe88438c14e94...


Well, you certainly don't need that regex to implement a Lisp.


The regex is used as a tokenizer, the outputs of which are then fed into the reader module.
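For anyone wondering what that hand-off looks like, here is a minimal Python sketch of a reader that turns a flat token list into nested lists by recursive descent. It is not mal's actual reader: `read_form` and `read_atom` are names chosen for this example, only parentheses and integers are handled, and symbols are left as plain strings.

```python
def read_atom(token):
    """Convert a single token to an int if possible, else keep it as a symbol string."""
    try:
        return int(token)
    except ValueError:
        return token


def read_form(tokens, pos=0):
    """Read one form starting at tokens[pos]; return (form, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        items = []
        pos += 1
        while True:
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            if tokens[pos] == ")":
                return items, pos + 1
            # Nesting is handled by recursion, not by the regex.
            form, pos = read_form(tokens, pos)
            items.append(form)
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return read_atom(token), pos + 1


form, next_pos = read_form(["(", "+", "1", "(", "*", "2", "3", ")", ")"])
print(form)
```

The regex only has to recognize flat tokens; matching parentheses is done by the recursive calls, which is why the tokenizer can stay regular.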


Yeah, a little weird, since regexes can’t parse context-free languages. I suppose most so-called regexes aren’t actually regular expressions, but it still feels like driving screws with a hammer.


Mal uses a regex for lexing/tokenizing. I didn't want people to get hung up on the lexing step (my university compilers class spent a third of the semester just on lexing). It's certainly a worthwhile area to study, but it's not the focus of mal/make-a-lisp.



