That's the sad state of (I guess) automatic PDF text extraction, which is a consequence of research papers being exclusively in PDF, itself a consequence of (La)TeX coming from another age. I love and praise TeX for what it lets us do, but in my opinion now is the time to move past it: learn from everything it got right, and apply what we now know about language design to build a better surface language (lower-friction syntax, and higher-level semantics that separate structured content from typography and let you extract things other than a visual document from a source file). TeX being so good means it has such a monopoly that any project of this kind has to be tremendously good to stand a chance (which is probably a good thing).