Hacker News new | past | comments | ask | show | jobs | submit login
Types of Parser Combinators (hootr.club)
90 points by todsacerdoti on Nov 16, 2021 | hide | past | favorite | 12 comments



Fun post. I hadn't connected Elm decoders and parser combinators before, despite having used them both.

What I love about parser combinators is that they are just code, in your language. They are also highly composable.

What I hate about parser combinators is how easy it is to accidental generate infinite loops or miss a crucial backtracking point.

Are there any parser combinators libraries that avoid these traps, particularly around recursion?


Generalized LL parsers[0] (effectively, memoized parser combinators) can effectively handle backtracking for you, supporting both left and right recursive grammars without issue. I'll leave some links that were particularly useful when I was learning about them[1][2]

[0] https://github.com/hexops/zorex/tree/main/src/combn#combn-ru...

[1] https://epsil.github.io/gll/

[2] https://github.com/robey/packrattle


Sprache for .net detects "left recursion" that would cause infinite descent. It fails early in that case.


I don't really understand what implementation of ideas the author is talking about at the end. Is he proposing anything different from existing parser combinators?

From what I read the author seems to abstract the idea of parsing to "transforming data with the possibility of failure". Not sure if I missed the point here


    A parser for things
    is a function from strings
    to lists of pairs of
    strings and things.


The idea of parsing indeed is to transform data, from a lower level of abstraction to a higher level of abstraction.

Parsing text (a sequence of characters) or binary files (a sequence of bytes) are well known.

The author mentions the case where a JSON parser has generated a bunch of JSON objects, but you want your own object types. (Deserialization libraries can do this automatically if all that is required is setting fields or properties, but for more complex cases you need custom code.)

Similar for XML parsing: all you get is a stream of element events, or a tree of generic XML objects, and you have to transform that further.

Another case would be when you get import data in the form of an Excel or CSV file. Your code starts with a 2D array of generic cells/values, and has to transform it into your own table/object types.


Author here, kind of. Most parser combinators only deal with strings and streams of bytes, but they can also be used for transforming other kinds of data. I think they're pretty useful for working with stuff like JSON or XML and I'd like to see more parser combinator libraries with support for these kinds of data structures. What I tried to do in the post is give names to the different kinds of basic blocks that are needed for parsing all these different kinds of data.

I'm more or less using Alexis King's definition of "parsing" here, as described in this post: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...


A side question: I'm probably out of touch or something, but what does the author mean by "colored websites"?


Well, for example, some people like to spend time on websites that are orange.


I'm teaching a compilers next semester, taking it over from a professor who has been teaching it for many years. Students will be writing a compiler for Cool programming language. I'm trying to rewrite the parser homework from yacc to parser combinator. Not sure if it's a good idea. Maybe students need to do an assignment in yacc. At the same time I have used parser combinator in my career as a software engineer.


For implementing things like programming languages, I think parser combinators also allow you write better and more specific error messages than something like a parser generator. Though I think most programming language authors optimizing for error messages tend to go with a bespoke approach.


I very nearly threw up my hands when writing an Elm parser combinator. The idea is great but the documentation and lack of advanced examples turned a day of effort into weeks. I just wanted to go back to Regexes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: