To me, the article somewhat misses the point of what's interesting here. Using ASTs to represent equations, or even whole programs, has plenty of precedent in ML/AI. I'd have liked to know how exactly they translate these trees into a representation suitable for an ANN. Fortunately, the paper seems to be easy to find and access (it's [1], I guess).
It looks like they go from tree -> sequence via prefix notation. I'm curious why Lample decided on this seq2seq approach when it seems there might be models that could be applied more naturally to the tree structure directly [1, 2].
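For concreteness, the tree -> sequence step is just a pre-order traversal: prefix (Polish) notation needs no parentheses, and as long as every operator has a fixed arity the original tree can be reconstructed from the token list. A minimal sketch (the `Node` class and token names are mine, not the paper's):

```python
# Illustrative sketch of flattening an expression tree into a prefix
# token sequence, as in the tree -> seq2seq setup discussed above.

class Node:
    def __init__(self, label, children=()):
        self.label = label            # operator name or leaf token
        self.children = list(children)

def to_prefix(node):
    """Pre-order traversal: emit the operator, then each subtree."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens

# 2 + 3*x  becomes  ['add', '2', 'mul', '3', 'x']
tree = Node('add', [Node('2'), Node('mul', [Node('3'), Node('x')])])
print(to_prefix(tree))
```

Because the arity of `add` and `mul` is fixed, the sequence is unambiguous and the model never has to learn bracket matching.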
For the same reason people use huge transformers for sentences in natural language (which are also tree-structured): they scale really well. With enough data, huge transformers have huge capacity. Notice that this paper is entirely about how to cleverly generate a massive dataset; there is no novelty in the model -- they just use a standard approach described in two paragraphs.
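The data-generation trick for integration is worth spelling out: instead of integrating random expressions (hard), you can sample a random expression f, differentiate it symbolically (easy), and train on the pair (f', f). A hedged sketch using sympy; the operator set and sampling probabilities here are invented for illustration, not the paper's actual settings:

```python
# Illustrative "backward" data generation for the integration task:
# sample f at random, differentiate it, and use (f', f) as a
# (problem, solution) training pair. The pair is correct by construction.
import random
import sympy as sp

x = sp.symbols('x')
LEAVES = [x, sp.Integer(1), sp.Integer(2)]
UNARY = [sp.sin, sp.exp]

def random_expr(depth):
    """Sample a small random expression tree over x (toy sampler)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(LEAVES)
    if random.random() < 0.5:
        return random.choice(UNARY)(random_expr(depth - 1))
    # binary case: combine two subtrees with + or *
    op = random.choice([sp.Add, sp.Mul])
    return op(random_expr(depth - 1), random_expr(depth - 1))

f = random_expr(3)
pair = (sp.diff(f, x), f)   # model sees pair[0], must produce pair[1]
```

Generating millions of such pairs is cheap, which is how you feed a large transformer in a domain with no natural corpus.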
[1] https://arxiv.org/abs/1912.01412