
Good article, but I have to nitpick here:

Any well-versed programming language person can tell you about both recursive-descent parsers and generated parsers. They’ll also tell you that generated parsers are the “right” way to do that.

Most of the time that advice will save you from wasted effort, but sometimes I think it keeps people from going down paths that may actually be fruitful. Sometimes the thing that everyone knows is true isn’t. (For example, every language I know of with a lot of real-world users actually does use a hand-written parser.)

Recursive descent parsers are good when 1) you have a simple grammar and 2) you want fewer dependencies. Most lisps have hand-written recursive descent parsers because lisp grammar is easy.
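
To show how little code "easy" means here, this is a rough sketch of a recursive descent reader for s-expressions in Python. It is not taken from any real Lisp implementation; the names (`tokenize`, `parse_expr`, `parse_atom`, `parse`) are made up for illustration, and it assumes a toy syntax with only parentheses, integers, and symbols (no strings, quotes, or comments).

```python
def tokenize(src):
    # Pad parentheses with spaces so a plain split() yields the tokens.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse_atom(tok):
    # Integers become ints; everything else is kept as a symbol string.
    try:
        return int(tok)
    except ValueError:
        return tok


def parse_expr(tokens, pos):
    # expr := atom | "(" expr* ")"
    # Returns (parsed value, index of the next unread token).
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = parse_expr(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return parse_atom(tok), pos + 1


def parse(src):
    tokens = tokenize(src)
    expr, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return expr
```

With this sketch, `parse("(+ 1 (* 2 3))")` gives nested Python lists mirroring the s-expression. The whole grammar is one rule, so the parser is one function.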

Also, MRI Ruby (aka CRuby 1.8) uses a generated parser.




False dilemma, as there are parsing frameworks that use top-down grammars (namely PEGs and their derivatives).

Grammars written in these formalisms closely resemble the structure of hand-written recursive descent parsers, unlike generative grammars, but are also amenable to automatic generation of efficient parsers (namely packrat parsers, though in practice even one or two memoization entries per production performs well).
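
A hand-rolled sketch of the idea, in Python: each PEG rule becomes a function of the input position, and a cache keyed on (rule, position) gives the packrat behavior. This is not the output of any real PEG tool; the grammar, the `parse_arith` name, and the evaluate-while-parsing shortcut are my own assumptions for illustration, and it handles only digits, `+`, `*`, and parentheses with no whitespace.

```python
def parse_arith(text):
    # Toy PEG, evaluated while parsing:
    #   Expr   <- Term ('+' Term)*
    #   Term   <- Factor ('*' Factor)*
    #   Factor <- '(' Expr ')' / Number
    # Each rule returns (value, next_pos) on success or None on failure.
    memo = {}  # (rule name, position) -> cached result

    def memoize(rule):
        def wrapper(pos):
            key = (rule.__name__, pos)
            if key not in memo:
                memo[key] = rule(pos)
            return memo[key]
        return wrapper

    @memoize
    def number(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @memoize
    def factor(pos):
        # Ordered choice: try the parenthesized form first, then a number.
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None:
                value, nxt = inner
                if nxt < len(text) and text[nxt] == ")":
                    return value, nxt + 1
        return number(pos)

    @memoize
    def term(pos):
        left = factor(pos)
        if left is None:
            return None
        value, pos = left
        while pos < len(text) and text[pos] == "*":
            right = factor(pos + 1)
            if right is None:
                break
            value, pos = value * right[0], right[1]
        return value, pos

    @memoize
    def expr(pos):
        left = term(pos)
        if left is None:
            return None
        value, pos = left
        while pos < len(text) and text[pos] == "+":
            right = term(pos + 1)
            if right is None:
                break
            value, pos = value + right[0], right[1]
        return value, pos

    result = expr(0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"could not parse {text!r}")
    return result[0]
```

A grammar this small barely backtracks, so the cache rarely hits; it is only there to show the shape. The point is that the rule functions read almost exactly like the grammar in the comment, which is what makes generating them mechanical.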

MRI's parser is a disaster. It makes sense given the context of when it was written, but the language would be free of some syntactic warts if they'd moved to something with more lookahead some time ago. The opaqueness of the parser grammar is repeatedly mentioned as a limit on useful contribution to the language.


This is from hazy memory, but I stated that because I think GCC, Clang, MS's C# compiler, and javac all use hand-rolled parsers.


I think the recursive descent ones are also worth favoring when you want to have more informative error messages.
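
A small sketch of why, in Python: at every point in a hand-written parser you know exactly what you were in the middle of parsing, so the message can say so. Everything here (`ParseError`, `expect`, `parse_int`, `parse_pair`, and the `(<digits>,<digits>)` syntax) is hypothetical and only meant to illustrate the point.

```python
class ParseError(Exception):
    pass


def expect(text, pos, ch, context):
    # Consume one required character, or fail with a message that names
    # both what was expected and why it was expected at this point.
    if pos >= len(text):
        raise ParseError(f"col {pos + 1}: expected {ch!r} {context}, but input ended")
    if text[pos] != ch:
        raise ParseError(f"col {pos + 1}: expected {ch!r} {context}, found {text[pos]!r}")
    return pos + 1


def parse_int(text, pos, context):
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        found = repr(text[pos]) if pos < len(text) else "end of input"
        raise ParseError(f"col {pos + 1}: expected a number {context}, found {found}")
    return int(text[pos:end]), end


def parse_pair(text):
    # Parses '(<digits>,<digits>)' and returns a tuple of two ints.
    pos = expect(text, 0, "(", "to open the pair")
    first, pos = parse_int(text, pos, "as the first element")
    pos = expect(text, pos, ",", "after the first element")
    second, pos = parse_int(text, pos, "as the second element")
    pos = expect(text, pos, ")", "to close the pair")
    if pos != len(text):
        raise ParseError(f"col {pos + 1}: unexpected input after the pair")
    return first, second
```

Calling `parse_pair("(1;2)")` raises a `ParseError` saying that a `','` was expected after the first element at column 3. Getting that kind of context out of a table-driven parser usually takes extra work.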


Yes, I like writing recursive descent parsers, but I don't know of any in "industrial" use besides the Lisp parsers you just mentioned.

I'd like to know if there are more.


LLVM's clang uses a recursive-descent parser to parse C, ObjC, and C++ (http://clang.llvm.org/features.html#unifiedparser).

Clang is the "C Language Family Front-end", which means we intend to support the most popular members of the C family. We are convinced that the right parsing technology for this class of languages is a hand-built recursive-descent parser. Because it is plain C++ code, recursive descent makes it very easy for new developers to understand the code, it easily supports ad-hoc rules and other strange hacks required by C/C++, and makes it straight-forward to implement excellent diagnostics and error recovery.


Industries using a recursive descent parser: http://www.edg.com/index.php?location=customers_oc



