Classp: a “classier” way to parse

munro · on March 25, 2015

> Parser generators are a great help in writing parsers, but they all have one major drawback--they are based on grammars. So what's wrong with grammars? Well ... nothing as an expository tool. When you want to diagram a complex rule for deriving tree-like structures within a string of symbols, nothing beats a good old-fashioned grammar. But when you want to write a computer program to convert a string into the logical structure that it represents and then use that logical structure for further processing, then grammars are not ideal.

Not sure what they mean by "logical structure". But I've used Parsec to convert a string into AST. Then did a second pass with Parsec to turn that AST into VM representations of what the program is talking about. Which worked really well for me. :)

tel · on March 25, 2015

And if that's not a "logical" enough of a structure then you can skip the AST step if you really like and parse directly into the final effect.

david-gudeman · on April 2, 2015

By "logical structure" I mean the AST. I make this clear in the rest of the document. The point is, with Classp you don't have to write a grammar to create a parse tree and then convert to an AST; you write a specification that produces the AST directly.

sklogic · on March 25, 2015

Cannot see how it's shorter and more readable than a formal grammar + AST production rules. Given that the class syntax is very bloated (unlike, say, ML ADT constructors), there is no chance a language grammar will be compact and comprehensible at a single glance.

Disclaimer: of course I'm biased, I'd oppose anything with the word "class" in it.

JoeAltmaier · on April 2, 2015

The cool part is, its not only a specification but an implementation.

gaze · on March 24, 2015

Is there any formalization of how to invert a formatter, or is this just something they're figuring out in code as they go along?

jlarocco · on March 25, 2015

I looked at the code and the manual a little bit, and it almost looks like they're just writing an object oriented style wrapper syntax around Bison and Flex. At a quick glance, I'm not sure they're really inverting a formatter as much as they're adding a "syntax(...)" attribute that tells the tool how to format the parse tree elements generated from that grammar rule.

david-gudeman · on April 2, 2015

Well, the language of class patterns can be read naturally in either direction--parser or formatter--much like a grammar can be read as a recognizer or a generator. And in fact there are features in the class patterns that really only apply when parsing and are ignored when formatting. You could, with some justice, describe it as a reversible notation rather than as a formatting language that can be inverted. However, constructing a formatter from class patterns is a fairly trivial one-to-one conversion to C++ while constructing a parser requires some complex analysis and restructuring.

agumonkey · on March 25, 2015

Ha, spinning it as inversion... not very classy.

infogulch · on March 25, 2015

Maybe that's why "classier" is in quotes in the title.

akkartik · on March 25, 2015

This blog post by the author points at Daniel Yellin for the theoretical underpinnings: http://unobtainabol.blogspot.com/2014/10/constraint-attribut...

sklogic · on March 25, 2015

I wonder how they handle cases like "a*(b+c)".

ecaradec · on March 25, 2015

I agree. I'm very weak in grammar, but I dont understand how this one would work. Shouldn't it be ?

  Expr ::= Expr + Term | Expr - Term | Term;
  Term ::= Term * Factor | Term / Factor | Factor
  Factor ::=  Int | ( Expr )

david-gudeman · on April 2, 2015

There are precedence indicators to describe precedence (BTW, this is needed for formatting as well as parsing). The details are too complicated for a reply, but they are in the language manual.

sklogic · on March 25, 2015

Exactly. And yet it's not easy to infer mechanically that an Expr (but not its Term branch) must be printed in "( ... )" when it's in the Factor branch of an AST.

btown · on March 25, 2015

I'll leave other commenters with more parser-writing experience to critique this... but the general concept of "Write code in the simplest-to-reason-about direction, and the system magically inverts it into a much-harder-to-reason-about algorithm" is one of my favorite patterns in computer science.

For instance, there's probabilistic programming [1], where you write a procedural program that says "if we were to fix a bunch of properties/states of the natural world and provide them as input, here's the output/reading you'd hypothetically get," and then from that, a system creates a function that, given a large number of outputs/readings, provides a posterior probability distribution over those hidden properties/states of the world. It's still far from the level of performance you'd get from hand-rolled machine learning algorithms that take advantage of all sorts of insights you have into your data... but then again, in general programming, there was a time when compiler optimizations were so naive that you had to hand-roll assembly code for almost anything. I see probabilistic programming in a similar level of infancy, and I'm excited to see where it goes, especially with the backing of a large DARPA initiative [2] to do some really exciting fundamental research.

But there's so much more to the "invertible" paradigm. SQL itself is exactly along these lines - your WHERE clause is just a filter function that works on a single-record level, but it's interpreted as highly optimized instructions to iterate along data structures, to collect all the data that satisfies that filter function. Facebook's Relay [3] is the beginning, I think, of a large push towards declaratively specifying data dependencies for web applications, and having a system that does all the heavy lifting of figuring out when to fetch data, when to push it back to the server, how to handle requests on the server, and how to handle mutations with optimistic client-side updates. All you need to think about is the easy question of "what data might I need, and if it were to exist at any given moment, how would I display it to the end user?"

Humans think generatively, whether it's about data or formatting or GUIs or the physical world. We have all these ideas in our mind, and we can visualize how we would expect them to appear. When our computer systems can let us focus on those high-value thoughts, that's what the future looks like.

So let's cut the out-of-the-box parser thinkers some slack!

[1] http://www.pl-enthusiast.net/2014/09/08/probabilistic-progra...

[2] http://www.darpa.mil/our_work/i2o/programs/probabilistic_pro...

[3] https://gist.github.com/wincent/598fa75e22bdfa44cf47

david-gudeman · on April 2, 2015

An interesting take. I've always been something of a programming-language junkie motivated by the feeling that there must be clearer, better ways to express things. Classp was motivated in large part by my dissatisfaction with having to write a recognizer and warp it into a parser and then turn the parse tree into an AST. Why can't I just express directly how the AST is represented in the target language?

atesti · on March 31, 2015

I loved http://llvm.org/docs/tutorial/index.html, where a parser is just build by hand. Looks surprisingly easy!