At first, I was disappointed that the Levenshtein Automaton is not some sort of golem. But then I read the rest of the post and realised that the Lucene committers have at least created a pretty impressive Frankenstein's monster.
About the Python code: does it execute at runtime as part of Lucene, or is it only used during the build phase to generate Java code that executes as part of Lucene? It sounds like the Python code implements the DFA generation, which would surely have to happen at runtime?
I wonder whether they reached out to the authors of that paper, or to the author of the Python implementation. You'd think people would be pretty ready to help a big open source project like Lucene, particularly the academics: it's unlikely they'd find a higher-impact open source use of their algorithm!
Code generation does serve its purpose, but I would argue that it takes only a bit more effort to make the generated code readable and to have the generator emit comments as well.
I once had to build an Oracle PL/SQL interface for a rather complicated integration project between two old systems, and the business requirements for what had to go where changed daily.
In the end I opted to generate the PL/SQL interface and supporting code from a spreadsheet that the business always kept up to date with the latest changes to the requirements. Ultimately they could change things to their hearts' content, and I would only have to press a button to regenerate it all.
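For illustration, here is a minimal sketch of that spreadsheet-driven approach, written in Python. It assumes the spreadsheet is exported as CSV. The column names (`source_col`, `target_col`, `rule`), the table names and the `sync_` procedure prefix are all hypothetical. It also writes one comment per mapping row into the output, in line with the readability point made above.

```python
import csv
import io

# Hypothetical export of the business spreadsheet: one row per field mapping.
# The column names (source_col, target_col, rule) are illustrative assumptions.
MAPPING_CSV = """source_col,target_col,rule
cust_no,customer_id,Primary key from legacy system A
cust_nm,customer_name,Trimmed display name
"""


def generate_plsql(mapping_csv: str, source_table: str, target_table: str) -> str:
    """Render a PL/SQL procedure that copies mapped columns from source to target.

    Each mapping row also becomes a trailing comment, so the generated code
    stays readable and traceable back to the spreadsheet.
    """
    rows = list(csv.DictReader(io.StringIO(mapping_csv)))
    proc = f"sync_{target_table}"  # hypothetical naming convention

    target_lines = []
    select_lines = []
    for i, row in enumerate(rows):
        sep = "," if i < len(rows) - 1 else ""
        target_lines.append(f"    {row['target_col']}{sep}")
        # Comma first, comment second, so the comment never hides SQL syntax.
        select_lines.append(f"    {row['source_col']}{sep}  -- {row['rule']}")

    return "\n".join([
        "-- GENERATED from the mapping spreadsheet; edit the spreadsheet, not this file.",
        f"CREATE OR REPLACE PROCEDURE {proc} AS",
        "BEGIN",
        f"  INSERT INTO {target_table} (",
        *target_lines,
        "  )",
        "  SELECT",
        *select_lines,
        f"  FROM {source_table};",
        "  COMMIT;",
        f"END {proc};",
        "/",
    ])


if __name__ == "__main__":
    print(generate_plsql(MAPPING_CSV, "legacy_cust", "customer"))
```

When the requirements change, only the CSV changes. Rerunning the script is the "press a button" step.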