
I think you should almost always build a compiler when presented with a parse-then-execute problem. It is so much faster to create a small machine that is fed data to specify the program, plus a test harness that also drives that machine, than it is to hand-code every little case repetitively.
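
To make this concrete, here is a minimal sketch of the kind of small machine I mean (Python, with invented names; not from any particular project). The program is plain data fed to the machine, and the same data drives the test harness:

    # A minimal sketch (invented names): the "program" is plain data fed to
    # a small machine, and the same data drives the test harness.
    OPS = {
        "add": lambda x, n: x + n,
        "mul": lambda x, n: x * n,
        "clamp": lambda x, n: min(x, n),
    }

    def compile_program(steps):
        fns = [(OPS[op], arg) for op, arg in steps]   # resolve ops once
        def run(x):
            for fn, arg in fns:
                x = fn(x, arg)
            return x
        return run

    # Test harness driven by the same machine, not hand-coded per case.
    cases = [([("add", 3), ("mul", 2)], 1, 8),
             ([("clamp", 10)], 42, 10)]
    for steps, given, expected in cases:
        assert compile_program(steps)(given) == expected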

Or maybe I like building little compilers.




Parser yes, compiler no.

It occurred to me at several points in my career that the tools I was using were deeply hamstrung by mixing the interpretation of complex data with the use of that data, making it difficult for anyone to effectively add new behavior to the system. Most vividly, I ran into this first with Trac (which supports plugins, but at the time each one had to reimplement scanning for wiki words, and each handled a different 80% of all scenarios) and then with Angular 1.0.

A variant on this, which I've never heard a coined phrase for, I just call Plan and Execute: in the spirit of dynamic programming, you visit a graph of data once to enumerate all of the tasks you will need to do, deduplicate the tasks, and then fire them off. This works much better than caching, because under load a single unit of work can potentially self-evict. It's also a lot easier to test. It does, however, delay the start of work, so either you need to make that up in reduced runtime (by not doing things twice) or have a mixed workload. You may also have to throttle outbound requests that had previously been staggered by alternating CPU-bound and IO-bound phases.
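
A minimal sketch of the pattern as I read it (Python, hypothetical graph shape and task names): one pass over the graph enumerates and deduplicates tasks, and execution only starts once the plan is complete.

    # Plan-and-execute sketch (hypothetical graph shape): one pass over the
    # data enumerates tasks, a set deduplicates them, then they all fire.
    def plan(graph, root):
        tasks, seen_nodes, seen_tasks, stack = [], set(), set(), [root]
        while stack:
            node = stack.pop()
            if node in seen_nodes:
                continue
            seen_nodes.add(node)
            for task in graph[node]["tasks"]:
                if task not in seen_tasks:      # dedupe: each task runs once
                    seen_tasks.add(task)
                    tasks.append(task)
            stack.extend(graph[node]["children"])
        return tasks

    graph = {"root": {"tasks": ["fetch:a"], "children": ["x", "y"]},
             "x":    {"tasks": ["fetch:a", "fetch:b"], "children": []},
             "y":    {"tasks": ["fetch:b"], "children": []}}
    print(plan(graph, "root"))   # ['fetch:a', 'fetch:b'], each exactly once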

I have one bit of production code where this is working well, and another that desperately needs it but where reaching feature parity is challenging due to the way it is invoked and the limitations of a library it uses. I'm going to have to change one or both.


When you describe it that way it reminds me of SAX [1] – I always hated SAX, but eventually realized it was essentially a tokenizer that left it up to the developer to figure out how to turn that into a compiler, where "compiling" in this case means turning XML input into some internal data structure or action.

[1] https://en.wikipedia.org/wiki/Simple_API_for_XML
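
For anyone who hasn't used it, a tiny Python illustration of the push model: SAX fires events at your handler like a tokenizer firing tokens, and assembling them into any structure is entirely your problem.

    # Push-model sketch: the parser pushes events at the handler; building
    # a tree out of them is up to you.
    import xml.sax

    class Collector(xml.sax.ContentHandler):
        def __init__(self):
            super().__init__()
            self.stack = [("root", [])]
        def startElement(self, name, attrs):
            self.stack.append((name, []))
        def endElement(self, name):
            node = self.stack.pop()
            self.stack[-1][1].append(node)

    handler = Collector()
    xml.sax.parseString(b"<a><b/><c><d/></c></a>", handler)
    print(handler.stack[0])
    # ('root', [('a', [('b', []), ('c', [('d', [])])])])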


Weirdly (though ideas do sometimes churn around HN like this), I just replied to another thread about the same idea. See my reply in that thread: https://news.ycombinator.com/item?id=29917060

At BitFlash, one of the things we had to build was a SAX parser for the SVG DOM. I used the DSL of the W3C spec to compile the SAX parser.

One of the stranger things I did: in some contexts (think old-school BlackBerry) we had a server pre-parse the SVG so we "knew" it was clean (I'm still sceptical 20 years later that this is ever a good strategy, but take it as a given). Because we knew the SVG was clean, there was a faster way to parse the XML than reading the tokens.

I used my magic Perl script transformer to compute the lowest entropy decision tree to identify a token with the fewest comparisons, which was surprisingly way more efficient than a trie.
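
I can't speak for the original Perl, but a rough sketch of the idea: for a known, fixed token set, always branch on the character position whose split carries the most information, so that a few comparisons identify a token. (The token names here are just SVG-flavored examples.)

    # Rough sketch (not the original Perl): branch on the character position
    # with the highest split entropy, i.e. the most information gained per
    # comparison, then recurse on each group until one token remains.
    from collections import Counter
    from math import log2

    def split_entropy(tokens, pos):
        counts = Counter(t[pos] if pos < len(t) else "" for t in tokens)
        n = len(tokens)
        return -sum(c / n * log2(c / n) for c in counts.values())

    def build_tree(tokens):
        if len(tokens) == 1:
            return tokens[0]                      # leaf: token identified
        positions = range(max(len(t) for t in tokens))
        pos = max(positions, key=lambda p: split_entropy(tokens, p))
        groups = {}
        for t in tokens:
            groups.setdefault(t[pos] if pos < len(t) else "", []).append(t)
        return (pos, {ch: build_tree(g) for ch, g in groups.items()})

    print(build_tree(["rect", "circle", "ellipse", "line", "path"]))
    # (0, {'r': 'rect', 'c': 'circle', 'e': 'ellipse', 'l': 'line',
    #      'p': 'path'}) -- one comparison per token here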


The real annoyance with SAX is its push model. Similar pull APIs (e.g. XmlReader in .NET: https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmlre...) let you parse XML in much the same way as a recursive-descent parser parses text, with the reader working much like a lexer.
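
A sketch of that style in Python, with ElementTree's iterparse standing in for XmlReader (my substitution; the .NET API differs in detail): you advance the reader yourself, like pulling tokens from a lexer inside a recursive-descent parser.

    # Pull-model sketch: the parser is an iterator you advance, like a lexer
    # feeding a recursive-descent parser.
    import io
    import xml.etree.ElementTree as ET

    def parse_element(events, name):
        children = []
        for event, elem in events:                # pull the next "token"
            if event == "start":
                children.append(parse_element(events, elem.tag))
            else:                                 # "end" closes this element
                return (name, children)

    events = ET.iterparse(io.BytesIO(b"<a><b/><c><d/></c></a>"),
                          events=("start", "end"))
    _, root = next(events)                        # consume the opening <a>
    print(parse_element(events, root.tag))
    # ('a', [('b', []), ('c', [('d', [])])])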


I completely agree with the sentiment but also feel like it is just my bias because I enjoy writing compilers.



