Deterministic/non-deterministic finite automaton. It's the standard term for a finite state machine which processes a stream of characters. All the examples in his article are DFAs.
Thanks! I'm interested in the idea of a reversible state machine, for parsing an application specific language syntax into some machine representation, then allowing the user to modify it and unparse back to syntax again. So far I've had to implement procedural code for both directions but I figure there must be some way for some smart code to both parse and unparse using some declarative parser rules^. I've had no luck finding any existing art so far.
^I figure this won't be super simple, since e.g. whitespace is often ignored at the tokenizer step and would need reinjected by the unparser, but that sort of stuff is probably not too difficult to handle.
Notwithstanding caveats about what "reversible" means here, I'm interested in the application. I've been interested in "non-destructive compilation" for a while, including what you might call "language skins"—derivations built from reversible, deterministic transformations on the syntax graph created from an arbitrary parse. With the current popularity of (one-way) transpilers, I'm surprised there's not more interest and work on making them two-way, for example, but that's just one possible application.
I'm part of a team developing a scientific modelling tool. The user defines a model via syntax similar to SPICE's netlists, specifying model objects and their parameters and connections. The nature of the type of simulation requires that the user runs the script over and over while tweaking various parameters, some by hand and some via optimization routines. We use the syntax as a way to serialize the complete state of the model, to be embedded in the generated results (the script size is negligible compared to the results). It is therefore a requirement that we can generate syntax from the model that, once re-parsed, results in the same model again - thus the need for forward and reverse parsing.
My current approach is to store the parsed AST as part of the model, then start from this (updating anything that was changed/added/removed in the meantime) when generating syntax.
> I'm surprised there's not more interest and work on making them two-way, for example, but that's just one possible application.
I suppose it's a combination of not many users typically wanting such a feature, the ease of storing a program's state as a proprietary/binary blob (Python's pickle module for example), and the tight coupling such a system creates between syntax and objects: almost everything created by the parser should map 1:1 to syntax, even if modified by the user after parsing. One example of that last point: you might define some sort of axis via syntax, like `range(100, 200, increment=1)`. The parser then creates this axis as an array of 10 data points. But the user may also modify this in memory, e.g. adding `50` to the array, so it's no longer a monotonic range. when you come to unparse you need to have some code to identify what this array represents (is it still a range? or a sequence with no order?), and generate corresponding syntax. That's tricky and requires lots of code for this usually niche feature.
"Reversible" meant consuming the character stream in reverse order while matching the same language, so it's different from what you have in mind--automatically generating a printer from a parser. On the surface it sounds very doable if each pattern of AST node is generated by a unique grammar production. Then pretty printing is the task of matching against an unambiguous tree grammar. It's not really different from what the compiler for a language like ML or Haskell does for pattern matching.
> On the surface it sounds very doable if each pattern of AST node is generated by a unique grammar production.
This seems to be the key, from my work implementing this in my application. Luckily for me the grammar itself can be somewhat tweaked to help ensure a 1:1 mapping between productions and objects, at least until we release v1, but if someone is having to work with an existing grammar I can see the need for all sorts of hacks.
Yeah, the good news is that your tool can statically detect the ambiguities and you can let users specify priorities for the conflicting productions to define a unique mapping. That's a standard technique in parser generators. There the ambiguities are in the opposite direction, but the same idea applies.
I'm interested in doing that as well, I've often thought of being able to go from Pascal -> ast + symbol table -> executable and back.
You would want to preserve all the whitespace, to make it completely reversible. The ability to modify the symbol table to directly rename a variable or procedure, would be quite powerful, and never wrong. 8)