
The biggest complexity (and security) problem with PDF is that it's also effectively an archive format, in which more or less every display file format conceived of before ~2007 can be embedded.



Yeah, pretty much. There's JBIG2, JPEG2000, CCITT Fax and Flash, to name a few. Oh, and a bunch of TIFF stuff without the wrapper. Some good news though: the PDF/A standards define various archive-safe subsets of PDF, and verification tools exist for them.
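To get a rough idea of which of these embedded image formats a given file uses, you can scan the raw bytes for the stream filter names PDF assigns to them (/JBIG2Decode, /JPXDecode for JPEG2000, /CCITTFaxDecode, /DCTDecode for plain JPEG). The sketch below is only a heuristic, not a parser: the function name count_image_filters is made up for illustration, and it will miss filters declared inside compressed object streams or written with #-escaped names.

```python
import re
from collections import Counter

# PDF stream filter names mapped to the formats mentioned in this thread.
FILTERS = {
    b"JBIG2Decode": "JBIG2",
    b"JPXDecode": "JPEG2000",
    b"CCITTFaxDecode": "CCITT Fax",
    b"DCTDecode": "JPEG",
}

# Matches any PDF name object ending in "Decode", e.g. /JPXDecode.
_NAME_RE = re.compile(rb"/([A-Za-z0-9]+Decode)\b")


def count_image_filters(data: bytes) -> Counter:
    """Count occurrences of known image filter names in raw PDF bytes.

    Heuristic only: names hidden inside compressed object streams
    or escaped with #xx sequences are not seen.
    """
    counts = Counter()
    for match in _NAME_RE.finditer(data):
        label = FILTERS.get(match.group(1))
        if label is not None:
            counts[label] += 1
    return counts
```

Usage would look like count_image_filters(open("scan.pdf", "rb").read()), where scan.pdf is a placeholder file name. A file that fails PDF/A validation for other reasons can still report zero hits here, so this is no substitute for a real verifier.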


On the other hand, PDF is probably the only widespread use of formats like JBIG2 and JPEG2000 --- which are rarely encountered as individual files, unlike JPEG, PNG, or GIF.

A lot of the scanned PDF ebooks on archive.org use JPEG2000+JBIG2, and the filesize vs. quality difference compared to more traditional formats like JPEG is quite apparent. They do take a noticeably longer time to render, however...


> They do take a noticeably longer time to render, however...

That's mostly due to a distinct lack of good JPEG2000 decoding libraries. We're building a PDF renderer library and JPEG2000 is a constant pain in the ass because of it. JPEG decompression is hardware accelerated on many platforms and also has a bunch of SIMD-optimized libraries. For JPEG2000 there's practically nothing, and due to the complexity of the format we count decoding times in seconds for some images, even on fast mobile phones.


I've been playing around a bit with JPEG2000 (slowly learning about the format, trying to write a decoder for it). JPEG normally uses Huffman coding for the bitstream, which although not really parallelisable is relatively fast (essentially 1 table lookup per output value). AFAIK the bottleneck in JPEG2000 decoding is the arithmetic coding, which can't be parallelised either, and involves quite a few more operations than Huffman's inner loop.
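To make the "1 table lookup per output value" point concrete, here is a toy table-driven prefix decoder. It is a sketch under assumptions, not code from any JPEG library: the names build_lookup and decode are invented, bits are passed as a '0'/'1' string for readability, and the code is assumed to be a complete prefix code so every table slot gets filled. Real JPEG decoders deal with codes up to 16 bits long and usually split the table into levels.

```python
def build_lookup(codes):
    """Build a flat lookup table from a {symbol: bitstring} prefix code.

    Every max_len-bit window that starts with a symbol's code maps to
    (symbol, code_length), so decoding needs one index per symbol.
    """
    max_len = max(len(code) for code in codes.values())
    table = [None] * (1 << max_len)
    for symbol, code in codes.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        # Fill every slot whose leading bits equal this code.
        for fill in range(1 << pad):
            table[base + fill] = (symbol, len(code))
    return table, max_len


def decode(bits, table, max_len, count):
    """Decode `count` symbols from a string of '0'/'1' characters."""
    padded = bits + "0" * max_len  # lets the last window run past the end
    out = []
    pos = 0
    for _ in range(count):
        window = int(padded[pos:pos + max_len], 2)
        symbol, length = table[window]  # the single lookup per symbol
        out.append(symbol)
        pos += length  # consume only the bits the code actually used
    return out
```

The inner loop is one index plus one position update per symbol. An arithmetic decoder instead updates interval state and a probability estimate for every decoded bit, which is where the extra operations mentioned above come from.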


XPS solved many of the problems with PDF, but it came far too late; PDF was already well established.


Maybe it's time for a PDF-2017 standard that drops support for those older exploitable formats.


Yes, it's called PDF/A.


If they’re exploitable, how would a new version help? Attackers would just use the older, exploitable versions. And if PDF viewers only allowed the newer version, you’d break support with every PDF made.


>And if PDF viewers only allowed the newer version, you’d break support with every PDF made.

That is not at all something that would have to be true.


It's called deprecation / forwards compatibility.



