One of the problems with doing this in PDF is that, for most documents, you need to infer the reading order. A decade ago I wrote the code in poppler to do that (yet again, based on papers by Breuel) in order to get multi-column selection working. At the time I wanted PDFs to be readable on my iRex iLiad... anyhoo, most of the pieces are there in poppler. It can figure out reading order, render piece by piece, and already differentiates between images and text under the hood. Still a lot of work, though.
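To give a rough idea of what "inferring reading order" means, here is a toy sketch in Python. It is not poppler's code and not Breuel's algorithm; it swaps in a much simpler heuristic (split the page into columns wherever there is a wide enough vertical gutter, then read each column top to bottom). The `Box` type, the `min_gap` threshold, and the function names are all made up for illustration, and coordinates are assumed to have y growing downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical text-line bounding box; y grows downward."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def split_columns(boxes, min_gap=10.0):
    """Group boxes into columns separated by vertical gutters.

    Sweeps left to right and starts a new column whenever the next box
    begins at least `min_gap` units past the right edge seen so far.
    """
    columns = []
    right_edge = None
    for box in sorted(boxes, key=lambda b: b.x0):
        if right_edge is None or box.x0 - right_edge >= min_gap:
            columns.append([box])
            right_edge = box.x1
        else:
            columns[-1].append(box)
            right_edge = max(right_edge, box.x1)
    return columns


def reading_order(boxes, min_gap=10.0):
    """Return boxes column by column, each column top to bottom."""
    ordered = []
    for column in split_columns(boxes, min_gap):
        ordered.extend(sorted(column, key=lambda b: (b.y0, b.x0)))
    return ordered


# Two-column page, boxes given in arbitrary order.
page = [
    Box(320, 120, 550, 132, "R2"),
    Box(50, 100, 280, 112, "L1"),
    Box(320, 100, 550, 112, "R1"),
    Box(50, 120, 280, 132, "L2"),
]
order = [b.text for b in reading_order(page)]
```

Sorting the same boxes purely by vertical position would interleave the two columns (L1, R1, L2, R2), which is the problem in a nutshell. The sketch also falls over as soon as a full-width heading spans both columns, since that one box bridges the gutter and merges everything into a single column. Handling cases like that is where the real work is.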