- "If my observations are correct, each segment has its own arithmetic decoder state, and thus can be decoded in its own thread."
Yeah, but real-world PDF JBIG2 streams seem to usually have just one segment! One of the first things I checked—they wouldn't have made it that easy, the world's too cruel.
It's sort of a generic problem with compression formats—lots of files could easily be multiple segments that decompress in parallel, but aren't—if people don't encode them in multiple segments, you can't decompress them in multiple segments. Most formats support something like that in the spec, but most tools either don't implement that, or don't have it as the default.
e.g. https://news.ycombinator.com/item?id=33238283 ("pigz: A parallel implementation of gzip for multi-core machines" —fully compatible with the gzip format and with gzip(1)! No one uses it).
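To make the idea concrete, here's a toy Python sketch (the segment scheme and names are made up for illustration; this is not how pigz or JBIG2 actually lay out their blocks). If every segment is compressed independently, decompression parallelizes trivially; one monolithic stream gives you nothing to hand out to other threads.

    # Toy illustration only: compress data as independent zlib segments so
    # each one can be inflated without any state carried over from its neighbours.
    import zlib
    from concurrent.futures import ProcessPoolExecutor

    SEGMENT_SIZE = 1 << 20  # 1 MiB per segment, an arbitrary choice

    def compress_segments(data):
        # Each slice is deflated on its own, so no segment depends on another.
        return [zlib.compress(data[i:i + SEGMENT_SIZE])
                for i in range(0, len(data), SEGMENT_SIZE)]

    def decompress_parallel(segments):
        # Independent segments: every worker can inflate one concurrently.
        with ProcessPoolExecutor() as pool:
            return b"".join(pool.map(zlib.decompress, segments))

    if __name__ == "__main__":
        original = b"some example data " * 500_000
        assert decompress_parallel(compress_segments(original)) == original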
Yikes! Doesn't seem like there's anything that can be done to solve that then.
I guess the only way to tackle it would be to target the popular software or libraries for producing PDFs to begin with and try to upstream parallel encoding into them.
Or is it possible to "convert" existing PDFs from single-segment to multi-segment PDFs, to make for faster reading on existing software?
Conversion's a very good solution for files you're storing locally! I'm working on polishing a script workflow to implement this—I haven't figured out which format to store things in yet. I don't consider it a full solution to the problem—more of a bandaid/workaround.
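Roughly the shape of the batch job I mean, as a sketch (the Ghostscript pdfwrite pass here is just a stand-in; I'm not claiming it emits multi-segment JBIG2—that's exactly the part I haven't settled on):

    # Sketch of the batch job: rewrite every PDF in a directory through a
    # re-encoder.  Ghostscript's pdfwrite device is only a placeholder here;
    # the open question is which encoder produces the friendlier output.
    import pathlib
    import subprocess

    def reencode(src, dst):
        subprocess.run(
            ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
             "-o", str(dst), str(src)],
            check=True)

    out_dir = pathlib.Path("converted")
    out_dir.mkdir(exist_ok=True)
    for pdf in pathlib.Path("originals").glob("*.pdf"):
        reencode(pdf, out_dir / pdf.name)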
The downside is that any PDF conversion is a long-running batch job, one that probably shouldn't be part of any UX sequence—it's way too slow.
Emacs' PDF reader does something like this: when it loads a PDF, its default behavior is to start a background script that converts every page into a PNG, which decodes much more quickly than typical PDF formats. (You can start reading the PDF right away, and once the conversion finishes, it becomes more responsive.) I think it's a questionable design choice: it's a high-CPU task during a UI interaction, and potentially a long-running one, for a large PDF. (This is why I was profiling libpng, incidentally.)
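If you want to play with that approach outside Emacs, something like poppler's pdftoppm gets you the same effect (illustrative only; the resolution and output naming here are arbitrary choices, not what Emacs actually runs):

    # Rough equivalent of that background pass: rasterize every page of a
    # PDF to PNG with poppler's pdftoppm, so a viewer can show the PNGs
    # instead of decoding the PDF's own image streams on every page turn.
    import subprocess
    import sys

    def prerender(pdf_path, out_prefix, dpi=150):
        # Writes out_prefix-1.png, out_prefix-2.png, ... one file per page.
        subprocess.run(
            ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix],
            check=True)

    if __name__ == "__main__":
        prerender(sys.argv[1], "page")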