
It's 7 TB compressed. If it were all text you'd need about 70 TB to store it decompressed. It's probably mostly images, though, so not quite that bad.



I've tried to do lossy compression of epubs with a few lines of bash script, i.e. removing the images and fonts that weren't needed. Many epubs could be downsized to a third of their size, but then I found a book that actually needed the supplied fonts and gave up. Lossy compression can't afford that kind of bug.
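
The stripping step amounts to something like the following sketch (in Python rather than bash, since an epub is just a zip container; the droppable file types are guessed purely from their extensions, which is exactly the assumption that broke on the book that needed its fonts):

    import sys
    import zipfile

    # Extensions assumed safe to drop; this is the lossy (and fragile) part.
    DROP = (".jpg", ".jpeg", ".png", ".gif", ".ttf", ".otf", ".woff", ".woff2")

    def strip_epub(src, dst):
        """Copy an epub, omitting image and font entries."""
        with zipfile.ZipFile(src) as zin, \
             zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for info in zin.infolist():
                if info.filename.lower().endswith(DROP):
                    continue  # drop the entry entirely
                data = zin.read(info.filename)
                if info.filename == "mimetype":
                    # the epub spec wants this entry stored uncompressed
                    zout.writestr(info, data, zipfile.ZIP_STORED)
                else:
                    zout.writestr(info, data)

    if __name__ == "__main__":
        strip_epub(sys.argv[1], sys.argv[2])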

What I also found was that many of the images in the epubs were already unusable and nothing like their counterparts in the physical books.


I don’t understand this. Are they epubs of comics or something? Epubs are already compressed (zip).


Compressing lots of epub files together can likely be far more efficient than zip's per-file compression, as deduplication/compression algorithms can run across many books at once. Especially so with a good shared dictionary.
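
As a rough sketch of the dictionary idea (assuming the Python zstandard bindings; the sizes and level here are placeholders): train one dictionary over many books, then compress each book against it, so text that repeats across books only gets encoded once.

    import zstandard as zstd

    def compress_with_shared_dictionary(books, dict_size=112_640):
        """books: list of bytes, one per book. Returns (dictionary, compressed blobs)."""
        dictionary = zstd.train_dictionary(dict_size, books)
        compressor = zstd.ZstdCompressor(dict_data=dictionary, level=19)
        return dictionary, [compressor.compress(b) for b in books]

    # Decompression needs the same dictionary:
    #   zstd.ZstdDecompressor(dict_data=dictionary).decompress(blob)

Dictionary training is tuned for many small samples; for whole books, simply compressing them together in one solid stream (e.g. tar piped through a strong compressor) gets much of the same cross-book deduplication.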


It's not terribly uncommon to find an epub with several megabytes of cover art and a few hundred kilobytes of text.
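
Easy to check, since an epub is a zip. A quick sketch that tallies uncompressed bytes per file extension:

    import sys
    import zipfile
    from collections import Counter

    sizes = Counter()
    with zipfile.ZipFile(sys.argv[1]) as z:
        for info in z.infolist():
            ext = info.filename.rsplit(".", 1)[-1].lower()
            sizes[ext] += info.file_size  # uncompressed size of the entry

    for ext, total in sizes.most_common():
        print(f"{ext:10s} {total / 1024:10.1f} KiB")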


Things can be rendered from compressed container files. For HTML with images, even slow-but-strong compression like LZMA is already fast enough to render pages as fast as you can click through them, even on fairly old hardware.
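
A toy illustration of that point (each page compressed separately with Python's stdlib lzma, so any single page can be decoded on demand):

    import lzma
    import time

    # Compress each page separately so any one page can be fetched on its own.
    pages = {f"page{i}": "<html><body>" + "lorem ipsum " * 5000 + "</body></html>"
             for i in range(100)}
    archive = {name: lzma.compress(html.encode()) for name, html in pages.items()}

    start = time.perf_counter()
    html = lzma.decompress(archive["page42"]).decode()
    print(f"decompressed {len(html)} bytes in "
          f"{(time.perf_counter() - start) * 1000:.1f} ms")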

Kiwix's .ZIM file format is a good example. The entire Project Gutenberg library is a single ~65 GB file, and you can read any book from it without unpacking anything.
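
Reading straight out of a ZIM file looks roughly like this with the python-libzim bindings (class and method names as I recall them, and the file and entry paths are made up, so treat all of it as an assumption):

    from libzim.reader import Archive  # python-libzim bindings (API assumed)

    zim = Archive("gutenberg_en_all.zim")               # hypothetical filename
    entry = zim.get_entry_by_path("A/some_book.html")   # hypothetical entry path
    html = bytes(entry.get_item().content).decode("utf-8")
    # the book is decoded on demand; nothing is unpacked to disk
    print(html[:200])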



