
The PDF specification is wild. My current favourite trivia is that it supports all of Photoshop's layer blend modes for rendering overlapping elements.[1] My second-favourite is that it supports incremental updates: content appended to a file can modify earlier content, so one should always look for forensic evidence in every distinct version represented in a given file.[2]

It's also a fun example of the futility of DRM. The spec includes password-based encryption, and allows for different "owner" and "user" passwords. There's a bitfield with options for things like "prevent printing", "prevent copying text", and so forth,[3] but because reading the document necessarily involves decrypting it, one can use the "user" password to open an encrypted PDF in a non-compliant tool,[4] then save the unencrypted version to get an editable equivalent.

[1] "More than just transparency" section of https://blog.adobe.com/en/publish/2022/01/31/20-years-of-tra...

[2] https://blog.didierstevens.com/2008/05/07/solving-a-little-p...

[3] Page 61 of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...

[4] For example, a script that uses the pypdf library.
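
A minimal sketch of how the permission bitfield mentioned above can be decoded, assuming the bit positions from the user access permissions table of the PDF 1.7 standard security handler (the spec numbers bits from 1, starting at the low-order bit). The function name and the labels are made up for illustration, and the sample value is not taken from any real file.

```python
# Bit positions (1-based, as the spec numbers them) of the permission flags
# stored in the /P entry of the encryption dictionary.
# Assumption: positions follow the PDF 1.7 table; the labels are informal.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text",
    6: "add or modify annotations",
    9: "fill in forms",
    10: "extract for accessibility",
    11: "assemble document",
    12: "print in high quality",
}


def decode_permissions(p):
    """Map each permission label to True (allowed) or False (denied).

    /P is stored as a signed 32-bit integer, so it is first reinterpreted
    as an unsigned value before the individual bits are tested.
    """
    unsigned = p & 0xFFFFFFFF
    return {
        name: bool(unsigned & (1 << (bit - 1)))
        for bit, name in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # Hypothetical /P value: printing allowed, everything else denied.
    for name, allowed in decode_permissions(-3900).items():
        print(f"{name}: {'allowed' if allowed else 'denied'}")
```

Nothing here touches the encryption itself: the flags are just a request that a reader is expected to honour, which is why a tool that ignores them can do whatever it likes with the decrypted content.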




For a format that was originally proprietary and not widely available, and that was conceived in an era when encryption was strongly controlled by export law, that sort of security-by-obscurity was very common. Incidentally, a popular cracking tutorial back then was to de-DRM the official reader by patching the function that checks those permissions.



Or:

    qpdf --decrypt <source pdf> <destination pdf>


Aren’t the blend modes supported just the Porter-Duff compositing modes? You might think that’s overkill, but it’s a really good mapping of what other rendering pipelines offer and it can really help reduce the work to produce a PDF.


The original Porter-Duff compositing operators don’t cover Photoshop-style blending. Here’s a link with pictures:

http://ssp.impulsetrain.com/porterduff.html

The Porter-Duff operators are appealingly rigorous and easy to implement because they’re simply the possible combinations of a simple formula. But many of these operators are not very useful either.
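
To make "possible combinations of a simple formula" concrete, here is a small sketch assuming premultiplied RGBA tuples with components in the range 0 to 1. Every operator is the same expression, result = src * Fa + dst * Fb, and only the two factors change. The table and function names are mine, not from any library.

```python
# Each Porter-Duff operator is a pair of factor functions (Fa, Fb) that take
# the source alpha (sa) and destination alpha (da).
OPERATORS = {
    "clear":    (lambda sa, da: 0.0,      lambda sa, da: 0.0),
    "src":      (lambda sa, da: 1.0,      lambda sa, da: 0.0),
    "dst":      (lambda sa, da: 0.0,      lambda sa, da: 1.0),
    "src-over": (lambda sa, da: 1.0,      lambda sa, da: 1.0 - sa),
    "dst-over": (lambda sa, da: 1.0 - da, lambda sa, da: 1.0),
    "src-in":   (lambda sa, da: da,       lambda sa, da: 0.0),
    "dst-in":   (lambda sa, da: 0.0,      lambda sa, da: sa),
    "src-out":  (lambda sa, da: 1.0 - da, lambda sa, da: 0.0),
    "dst-out":  (lambda sa, da: 0.0,      lambda sa, da: 1.0 - sa),
    "src-atop": (lambda sa, da: da,       lambda sa, da: 1.0 - sa),
    "dst-atop": (lambda sa, da: 1.0 - da, lambda sa, da: sa),
    "xor":      (lambda sa, da: 1.0 - da, lambda sa, da: 1.0 - sa),
}


def composite(op, src, dst):
    """Composite two premultiplied (r, g, b, a) pixels with the named operator."""
    fa_fn, fb_fn = OPERATORS[op]
    sa, da = src[3], dst[3]
    fa, fb = fa_fn(sa, da), fb_fn(sa, da)
    # The same formula applies to the colour channels and to alpha.
    return tuple(s * fa + d * fb for s, d in zip(src, dst))


if __name__ == "__main__":
    half_red = (0.5, 0.0, 0.0, 0.5)   # 50% opaque red, premultiplied
    blue = (0.0, 0.0, 1.0, 1.0)       # opaque blue
    print(composite("src-over", half_red, blue))
```

Operators such as "clear" and "dst" fall out of the table for free, which is a fair illustration of why some of them are rarely used in practice.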

The Photoshop blending modes are practically the opposite: they are not derived from anything mathematically appealing, it’s really just a collection of algorithms that Photoshop’s designers originally found useful. They reflect the limitations of their early 1990s desktop computer implementations (for example, no attempt is made to account for gamma correction when combining the layers, which makes many of these operations behave very differently from actual light that they mean to emulate).
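
A rough sketch of the gamma point, using two of the simpler blend modes on single channel values in the range 0 to 1. It assumes the stored values are sRGB-encoded, which is my assumption for illustration and not something stated above; the helper `blend_in_linear_light` is hypothetical and only there for comparison.

```python
def multiply(backdrop, source):
    """Multiply blend: the result is never lighter than either input."""
    return backdrop * source


def screen(backdrop, source):
    """Screen blend: the result is never darker than either input."""
    return backdrop + source - backdrop * source


def srgb_to_linear(c):
    """Decode an sRGB-encoded value to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(l):
    """Encode a linear-light value back to sRGB."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1 / 2.4) - 0.055


def blend_in_linear_light(blend, backdrop, source):
    """Hypothetical gamma-aware variant: decode, blend, then re-encode."""
    return linear_to_srgb(blend(srgb_to_linear(backdrop), srgb_to_linear(source)))


if __name__ == "__main__":
    grey = 0.5
    # Blending the encoded values directly, as the comment above describes.
    print("screen on encoded values:", screen(grey, grey))
    # The same blend performed on linear light gives a visibly darker result.
    print("screen in linear light:  ", blend_in_linear_light(screen, grey, grey))
```

Screening mid-grey with itself gives 0.75 on the encoded values but a noticeably lower value when done in linear light, so the two approaches produce visibly different images from the same inputs.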


Ah yes. I had forgotten that the blend modes added an arbitrary function that goes beyond the original model.


The permission field can also lead you down the rabbit hole of discovering that some PDF writers do not comply with the specification, and that different PDF readers and libraries may or may not include workarounds for them.


To be fair, if you wanted to stop copying of text, it would be easiest just to drop the ToUnicode mappings from the fonts; recreating them is then a manual process for anyone who wants the text back.
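
A toy illustration of why that works, not a real PDF parser. It assumes a subset font whose character codes are arbitrary glyph numbers, so the only link back to readable text is the mapping; the codes, the mapping, and the function are all invented for this example.

```python
# Character codes as they might appear in a content stream for a subset font.
# Assumption: the codes are arbitrary glyph numbers with no relation to Unicode.
CODES = [1, 2, 3, 3, 4]

# What a ToUnicode mapping would provide: glyph code -> Unicode text.
TO_UNICODE = {1: "H", 2: "e", 3: "l", 4: "o"}


def extract_text(codes, to_unicode=None):
    """Simulate copy/paste text extraction from a run of glyph codes."""
    if to_unicode is None:
        # No mapping available: this sketch falls back to treating each code
        # as a code point, which yields meaningless control characters here.
        return "".join(chr(code) for code in codes)
    # Unknown codes become U+FFFD, the Unicode replacement character.
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    print(repr(extract_text(CODES, TO_UNICODE)))
    print(repr(extract_text(CODES)))
```

The glyphs still render correctly either way, since drawing only needs the glyph codes; it is extraction, search, and anything else that needs actual characters that loses out.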


That also breaks search (and more importantly screen reader accessibility), and if you're professionally required to specifically produce PDFs with these security features enabled, you're pretty likely to be working in a context where that would be illegal.


It is impossible to stop text copying without breaking screen reading, because the screen reader could just log everything it reads.


You could do it (and Adobe has with some documents AFAIK) by using some kind of DRM solution, limiting access to approved software. That software wouldn't then be allowed to expose its UI tree to accessibility APIs, except for approved screen readers that embed a particular key in their executables. Those approved screen readers would have restrictions around what they can do with the text. Sure, everything can be broken, for example with third-party fake speech synthesizers or speech recognition applied to the screen reader's output (as contemporary OSes don't even provide good DRM mechanisms for audio), but it would make the process that much harder.


I used to be rich from selling part of this stuff: FrameMaker, which used to cost US$5K per copy and came originally from Frame Technologies. [Hi Steve Kirsch. I see you're still rich.] The PDF specification is wild, you're right about that. At the time, many people, including yours truly, said it was rude capitalism. So, you got it. People did not talk enough about DRM. PS: I left Adobe's embrace courtesy of my then wife and of myself. I hate DRM as a user and as a former salesman. Hola



