Hacker News
Digitally sign PDF files from your commandline – open-pdf-sign (github.com/open-pdf-sign)
252 points by todsacerdoti on Dec 19, 2022 | 77 comments



If only banks knew this!

They actively coax you into receiving your bank account statements as PDF, but I haven't found ANY bank which signs the PDFs (while bragging about security all the time).

I wonder what happens if they lose your money due to bugs or even intentionally - will they then happily accuse you of forging the PDFs because they're unsigned?

With paper that wouldn't be so trivial: in my country the paper often has a special format and is of a special type, it ages, and you cannot easily guess which printer was used.

Hence I still demand all my statements on paper. Same for utility companies, health care, and other institutions which want to convert their regular physical bills to PDFs.

I also demand paper because the concept of forcing the customer to manually go to N websites every month to download PDFs is idiotic. Nobody pays me for that wasted time of my life.

A paper mailbox instead is a central place where I can retrieve all of my documents easily in O(1). I wonder how many decades it will take the IT industry to realize that?


I'd be satisfied if my bank's web developers learned about the Content-Disposition header and set the filename reasonably. Nothing quite like downloading dozens of statements for various accounts at the end of the year and then having to rename dozens of files named "download.php (1-30).pdf" in your downloads folder. With a single line of code, each of those could be "institution-acct-year-month-day.pdf" instead. It would significantly reduce the toil that punishes diligent customers.
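For illustration, here is a minimal Python sketch of building such a header value. The function name, the sanitising rule, and the sample institution and account strings are all assumptions made up for this example; a real backend would set the header through its own web framework.

```python
from datetime import date


def statement_content_disposition(institution: str, account: str, day: date) -> str:
    """Build a Content-Disposition header value for a statement download (hypothetical helper)."""

    def safe(part: str) -> str:
        # Keep only characters that are harmless inside a quoted filename.
        return "".join(ch for ch in part if ch.isalnum() or ch in "-_")

    filename = f"{safe(institution)}-{safe(account)}-{day.isoformat()}.pdf"
    return f'attachment; filename="{filename}"'


# Made-up sample values, purely for demonstration.
print(statement_content_disposition("examplebank", "checking-1234", date(2022, 12, 19)))
```

With a header like this on the response, the browser saves the file under the suggested name instead of "download.php".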


I'd rather go to 5 web sites and download PDFs than open 5 envelopes, throw them away, and sort out different paper documents into different binders or folders: nobody is paying me for the wasted time of my life to do that either :)

How do you organise your paper documents?


Paperless ng

Make your scanner put files in a place where Paperless can read them; Paperless then OCRs the file, makes it searchable, somehow finds the date of the document, auto-tags it if you have that set up, and basically is a dream.

I don't organize them anymore, if I need an old document I search for some text in it or by date.

https://github.com/jonaswinkler/paperless-ng

There is a newer Paperless ngx that I have to upgrade to at some point.


Paperless-ngx looks pretty cool! But do you have any recommendations for a scanner that has the required feature?


I use an old Epson Workforce WF-7100. This printer does two sided scanning on the ADF, and you can create presets for color, black and white, etc. Mine was given to me out of a garage and has some wear and tear - the ADF jams a lot, so sometimes I have to use the glass.

As long as the device can scan to PDF into a network folder, I think most scanners/printers will work. Paperless works by monitoring a folder you choose - it doesn't care how files get to that folder.
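To make the "monitor a folder" idea concrete, here is a rough Python sketch of a polling check for newly arrived PDFs. It is not how Paperless is actually implemented; the function name and the temporary directory standing in for the scanner's share are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def find_new_pdfs(folder: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in `folder` that have not been reported before (hypothetical helper)."""
    new_files = []
    for path in sorted(folder.glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


# Demo: a temporary directory stands in for the network share the scanner writes to.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    seen: set[str] = set()
    (inbox / "scan-001.pdf").write_bytes(b"%PDF-1.4\n")
    first = [p.name for p in find_new_pdfs(inbox, seen)]
    (inbox / "scan-002.pdf").write_bytes(b"%PDF-1.4\n")
    second = [p.name for p in find_new_pdfs(inbox, seen)]
    print(first, second)
```

The point is only that the consumer looks at the folder contents, so it does not matter whether the files arrive via SMB, a copy command, or anything else.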

It's very common for all-in-one printer/scanners to be able to save to a Windows/SMB network share. In my case, on the Linux box running Paperless, I also installed and set up Samba and exposed a share for the scanner.

An engineering firm I used to work for rented Kodak i2600 document scanners from the company providing their printers - and they were constantly scanning and these devices didn't mess up. If I did high volume scanning I'd try to get one of those.


This can easily be streamlined to consume little time by:

- Realizing that whenever you need to dig out some old document in the future for reference or proof, you'll likely have a date range in which to look for it. Needing something old happens rarely enough that the overhead of searching for it can be neglected, so you lay out your binders to make putting things away fast, not to make searching fast. And the older things become, the less likely it is that you'll ever need them again. So sorting by date is important.

- Thus realizing that any finished documents can go to a SINGLE binder which is sorted by date, you don't need a separate one for healthcare, utilities, whatever. You don't even need registers in the binder, just flat date sorting.

- Therefore, you'll only be having 3 binders:

"ToDo", "Done" and "Constantly needed" (the latter is for contracts for example).

Sort the contents of "ToDo" and "Done" by date. Adding new paper will be quick because new stuff arrives close to the most recent date so you don't have to search a lot for the place to insert it at.

AND: Make sure to mark the date on every document with a highlighter of always the same color so you can easily spot the dates when inserting.

TL;DR: Most documents will go to a single or two sorted-by-date places, just like your email inbox. This makes adding things fast.


Thanks for sharing your approach: I do like the fact you are optimizing for "putting away things".

The amount of documentation you have (and have to go back to) is probably on an entirely different scale to what I am personally dealing with: this would never work for me, since just a date would make finding stuff almost impossible.


Thanks for the appreciation :)

I put a lot of thought into this system so it's nice to know someone at least took notice of it!

Your problem of larger scale may or may not be alleviated by the details I left out for simplicity and because I assumed a typical reader may have a low scale:

- In practice my "ToDo" and "Constantly needed" folders do have registers such as "health care" etc. as those are the ones where I usually have to search for things. The date sorting there is inside the registers. For "ToDo" I also color-code the registers by priority and sort them by priority.

- The "Done" folders have a register for each year. For very high scale you might add registers for month. The folders are also labeled on the outside with their years. To make it easy to access things which you might have to go back to, you could add "look here!" registers which do not affect the date-sorting, i.e. like bookmarks.


Maybe it's not just the scale, but how much I can remember of when something happened: monthly registers would require me to search a number of them for a bunch of reasons as I might not be sure exactly when something happened, making it impractical.


> How do you organise your paper documents?

I just throw them in a box, without even opening them. On the off chance that I ever need one, I go to the box and fish out all the envelopes from that bank, and look for ones from the likely date range.

That is, I optimize for quick storage at the expense of slow retrieval. But even the slow retrieval isn't bad.


I hope you never die so your heirs don't have to deal with this ;)


It's my way of staving off assassination attempts among my heirs.


I bought a duplex scanner and haven't looked back. Everything gets shredded unless it is required (contracts, official documents).


This is also perfect for keeping a digital copy of everything in the pursuit of the dream of being a digital nomad we all secretly harbor.


What are you going to do with a bank statement from a few months ago anyway? You will never look at it.


It becomes relevant during a divorce, or a capital gain or loss that has expenses related to the basis. Sometimes it's to your advantage not to have the documentation, but those situations are often zero-sum, so the other party will go the extra mile to find whatever is missing.


In those rare circumstances where someone wanted to see an old statement I've never had difficulty logging in and finding the statements online.


Maybe you live elsewhere, but in the US it's common for statements to be available for only seven years, and if you close an account, its online statements are gone for good. Bad marriages and good investments can last far longer than that.


I've always gone with the IRS advice that nothing older than 7 years is needed. Maybe someday I'll regret it but so far it hasn't happened.


Yeah, that's the right number, but the context is slightly incorrect. If in 2022 you claim a lowered capital gain based on a home renovation you did in 2001, then you need to save those 2001 receipts for seven years, i.e. until 2029. The date on the receipts doesn't matter; the seven-year clock starts ticking only when you involve them in a taxable event.


In the fraud case at tech company NS8[1] (for which the CEO was recently sentenced to prison), the CEO apparently edited PDF bank statements before sharing them with his CFO. I think most people naturally assume that a PDF document is unalterable. While not commonly exploited, that assumption is a big security gap. We need a way to sign PDFs to ensure their authenticity.

[1] https://www.sec.gov/litigation/complaints/2020/comp24905.pdf


A way to sign PDFs would be great, but then we also need a way to verify the signatures. And verify that it was signed by whoever issued the document, not someone else who tampered with it.

Not really sure what the state of the art there actually is. Pessimistically I figure we're still at the stage where websites would put an image of a lock with a green checkmark on their website to make it look secure - i.e. really only just for show.



I'm really glad I get most of my documents digital right now. I can download them, sort them into folders on my home server, and I'm happy as I could be. If I need a file or invoice, I simply open my server's folder structure and grab the one I need.


Hi, Philipp here (one of the creators). Haha, this is so funny: this is one of the BIG reasons why we built this in the first place.


:)

Do they pay you for this?


No, but would be cool if they used it


I would advise you to deliver your incredible goodwill to a non-profit organization which actually values it instead of working on something for banks :)

The rotten, soulless entities that are banks do not deserve ANY free work.

They don't care about you, they won't value your work, they won't give anything back.

They'll use it to maximize their profits at best.

But most likely, they won't do anything: With absolute certainty, they KNOW that PDFs can be signed. They have to deal with cryptography anyway, and have security consulting.

They very likely intentionally decided to not sign PDFs anyway just because they can get away with it without getting sued, and save money by that.


They very likely hired an external firm to write their PDF export, and the person implementing it was just trying to finish the contract on time and to spec.

The overhead of maintaining a properly secured PKI key and implementing signing of generated PDFs with it is nonzero.

Banks aren't always rotten and soulless, they are cold and lazy machines that do the bare minimum that their customers ask.

This project is great, let's spread awareness that PDFs _can_ keep an internal digital signature and maybe someday their customers will demand it.


It's not only PDFs: email sent by banks should be both DKIM and S/MIME signed, but we barely get the former.


> Nobody pays me for that wasted time of my life.

Would you say the same for self checkout at the grocery store?

What about burger bars where you have to put your own toppings on?


> the grocery store

Funny enough, the only digitally signed email I've ever (knowingly) received came from Aldi. I sent them a question about food waste and the response showed up in Apple Mail with a badge and signature validation notice that I'd never seen before.


That's most likely BIMI with DKIM (DKIM alone is relatively common), but it's unfortunately not S/MIME. The latter would actually be a "sender-signed email", whereas the former is a "domain-signed email".
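One way to tell the two apart on a saved message is to look at its headers. The following Python sketch only detects which kind of signature is present and does not verify anything; the function names and the sample messages are made up for illustration.

```python
from email import message_from_string
from email.message import Message


def _param(msg: Message, name: str) -> str:
    # Content-Type parameter as a lowercase string, or "" if absent.
    value = msg.get_param(name)
    return value.lower() if isinstance(value, str) else ""


def signature_kinds(raw_message: str) -> dict[str, bool]:
    """Report whether a message carries a DKIM header and/or looks S/MIME signed."""
    msg = message_from_string(raw_message)
    content_type = msg.get_content_type()
    # Detached S/MIME signature: multipart/signed with a pkcs7-signature protocol.
    detached = content_type == "multipart/signed" and "pkcs7-signature" in _param(msg, "protocol")
    # Opaque S/MIME signature: application/pkcs7-mime with smime-type=signed-data.
    opaque = (
        content_type in ("application/pkcs7-mime", "application/x-pkcs7-mime")
        and _param(msg, "smime-type") == "signed-data"
    )
    return {"dkim": msg["DKIM-Signature"] is not None, "smime": detached or opaque}


# Hypothetical sample messages (header values are placeholders, not real signatures).
domain_signed = (
    "DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=sel; b=abc\n"
    "From: shop@example.com\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hi\n"
)
sender_signed = (
    "From: person@example.com\n"
    'Content-Type: multipart/signed; protocol="application/pkcs7-signature"; '
    'micalg=sha-256; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--b1--\n"
)

print(signature_kinds(domain_signed))
print(signature_kinds(sender_signed))
```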


I went back and searched my email. It was an RSA-2048 S/MIME certificate issued by Aldi Süd and Apple Mail now warns that the certificate is expired (the email was from a few years ago, when the certificate was valid). The email came from a supply chain person in their Hong Kong office - maybe that explains the level of security?


Huh, that is very interesting (and rare). Also highlights one flaw of S/MIME, there isn't any validity (OCSP) stapling equivalent for it.


Yes. They're saving money hoping that the customer does the work for free!

What is your point anyway?

Do you really think I should be wasting half an hour to a full hour of my life every month to download a dozen PDFs (remember, it's not only banks which want that) so big corporations can save like $5 on paper & postage?

Why would I want to work for below minimum wage for those people, for no tangible benefit to me?

(Paper is as easy to process as PDFs, and IMHO in fact easier to process:

You can fit multiple sheets on your desktop in parallel, you can shuffle it around, hold it next to each other for cross-referencing, you can write stuff onto it and be sure it will be readable in 10 years (might not be true for PDF annotation software!), the disk it's stored on won't die, your relatives can read it if you die, etc. If paper was a VR-product all these things would be advertised as great new VR features. In real life we get these 3D-features for free but their advantages are completely ignored when forcing the usage of computers for the sake of it.)


> so big corporations can save like $5 on paper & postage?

At their scale it's much. I honestly don't care about their costs, but at their scale that's tons of paper and gas that's totally wasted.

I don't know which country you're from, but all my bank statements, utility invoice etc go to my email, so there is no need to log anywhere. I'm not sure about others right now, but utilities are definitely signed. Maybe you could talk to your bank/... about it.


> At their scale it's much.

At their scale it is replaceable: They earn money to do their duties, if they cost a bit more they bill the customers a bit more so they get it back.

The time of my life is NOT replaceable. I do not get it back EVER.

And they very likely don't give a damn about their customers, it doesn't matter to them if some cronjob delivers PDFs or another cronjob prints letters - at the end of the day they just want to go home.

I do CARE about living, it gravely matters to me how much of my life I have available for myself.

> tons of paper and gas that's totally wasted.

It's not wasted: It fulfilled its purpose of delivering information to me in a convenient fashion.

And once I'm done with it, it goes into the paper bin and gets recycled.

> I don't know which country you're from, but all my bank statements, utility invoice etc go to my email, so there is no need to log anywhere. I'm not sure about others right now, but utilities are definitely signed.

Every company here has a different method. Websites, emails which link to websites, emails which are the invoice, emails which have an attachment that is the invoice.

It is impossible to cleanly integrate this into one workflow.

A plain old regular paper mailbox, however, already is a clean, integrated workflow which ships unified pieces of paper which all have the same size and can thus be put into the same kind of folders.

Hence optimizing this to be more convenient for companies means taking away individual lifetime which matters to individuals for the sake of enriching entities which do not care about the money they saved, it's just a number in some database for them.

> Maybe you could talk to your bank/... about it.

Do you seriously believe they will do anything?

Whenever I interact with those kinds of people, I rarely ever get an answer, and that's about things which are part of their daily duties.

If you go to them expecting them to actually do something out of their ordinary - good luck. It will get ignored with a 99% probability.

And even if one of them does something: Then the other dozen companies I have to deal with will not do anything.

So paper has to stay anyway. I'm happy with it. It's convenient, it's super standardized, and it just works.


Does your bank encrypt the PDFs? Some of mine do (Europe). Some combo of bank account and birthday to open them - they state this in the email - so trivial to open by others that know you, but not strangers. Better than nothing I guess.


They refuse to send emails because "emails are insecure", you HAVE to go to their website to download the PDFs.....

(And encryption is NOT a signature. Anyone who knows the password can forge an encrypted file with the same password. So the PDFs would still be worthless.)


Yeah, I understand encryption isn’t a signature; it just seemed like your main complaint (or second biggest complaint) was the pain of downloading them from many different places. Encrypted PDFs are one way to allow them to be emailed to you in a decently secure fashion. Sure, nothing is 100%, but don’t let perfect be the enemy of good.

As far as signatures in PDFs go, when have you ever needed them? What’s the real-world scenario? I think most recipients would be incapable of validating the signature properly (including verifying ownership of the public key corresponding to the private key that signed it), i.e. the same flaws as any certificate-based system.


The signature would be needed when the bank says "you have no money" even though I should have, due to their ancient COBOL systems breaking, fraud, whatever.

How would I prove to a court that I do have money if I only have easily forge-able unsigned PDFs?

Now before you say this is unlikely, ask yourself this:

Would you lend thousands of dollars to a regular person without ANY signed document from them which says they owe you the money?

No, right?

Then why would I do that with a bank?

If they get that much money from me they ought to at least give me a kind-of unforgeable document which says the money is mine.


> And encryption is not a signature.

Encryption involves using a signing key and uniquely identifying something.

That's exactly what it is. In fact encryption is even more secure than a normal written signature.

I can sign a piece of text put it here -- sign it with my private key -- put it on HN with my public key and everyone can be sure I wrote it.


Encryption usually involves a signing/authentication step to prevent certain crypto attacks, but in symmetric encryption schemes that step only proves that the document hasn't been modified after encryption. You can still create a different document, sign it with the same password, and nobody would be able to prove that it's not the original.

With asymmetric encryption you have a sort of signature because only the sender has the encryption key, so forging something that opens with the same decryption key is hard. But I have yet to see somebody encrypt PDFs with an asymmetric method.
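The symmetric case can be sketched in a few lines of Python using an HMAC as a stand-in for the authentication step. This is an illustration under assumptions, not how any PDF encryption scheme actually works: the password, the documents, and the plain SHA-256 key derivation are made up for brevity.

```python
import hashlib
import hmac


def tag(password: str, document: bytes) -> str:
    # Derive a key from the shared password (simplified; a real scheme would use a proper KDF).
    key = hashlib.sha256(password.encode()).digest()
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify(password: str, document: bytes, mac: str) -> bool:
    return hmac.compare_digest(tag(password, document), mac)


password = "account-1234+1990-01-01"  # hypothetical secret known to bank AND customer
original = b"Balance: 10,000.00"
forged = b"Balance: 99,000.00"

bank_tag = tag(password, original)    # produced by the bank
customer_tag = tag(password, forged)  # produced by anyone who knows the password

original_ok = verify(password, original, bank_tag)
forged_ok = verify(password, forged, customer_tag)
tamper_detected = not verify(password, forged, bank_tag)
print(original_ok, forged_ok, tamper_detected)
```

Both documents verify equally well, so the tag proves integrity since tagging but says nothing about which party created the document.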


With "identity certificates" or "electronic IDs" used in parts of Europe, documents are indeed signed with asymmetric cryptography: a recipient of a document can't modify it and keep the signature valid.


> That's exactly what it is. In fact encryption is even more secure than a normal written signature.

No, certainly not.

The biggest issue is that you're conflating a human concept of a signature and the cryptographic one. This is obvious from your second paragraph.

> I can sign a piece of text put it here -- sign it with my private key -- put it on HN with my public key and everyone can be sure I wrote it.

Cryptographically maybe, legally no. We lack crucial information about who can use your keys, there's nothing that says you can't share a random keypair that has no legal backing. We also don't know if your keys are valid at all, maybe you're underaged? Do we know if your keys were valid during the time of signing, maybe you were underaged?

It's way more complex than Sign(text).


Are encrypted PDFs signed? AFAIK it's just symmetric encryption. You only need a password to decrypt, after all


> I wonder what happens if they lose your money due to bugs or even intentionally - will they then happily accuse you of forging the PDFs because they're unsigned?

> With paper that'd be not so trivial, in my country the paper often has some special format and the paper itself is of a special type, and it ages and you cannot easily guess the printer which was used.

I read a similar discussion recently. Even with paper you can prove your account balance on day X, but if your bank loses your money on day X + n and you want it back, they could still claim you withdrew everything since day X and had an empty account on the day of the loss.


Fine, so the paper protects me against bogus withdrawals up to day X, it just doesn't protect me for the additional n days.

The PDFs protect me for 0 days because they can claim I've faked them right from the beginning.

I'll take the paper :)


My paper statements do not come on a special paper either.


Hello from a 3rd world country. It feels weird to read this and then just download a bank statement digitally signed in .pdf.asice format. Well, I can also use my ID card to log in to the bank. Many people here don't realize how well things work right here, in some very small country.


Sveiks :)

It's easy to get things perfect when everything is new and the audience is small.


̶I̶f̶ ̶o̶n̶l̶y̶ ̶b̶a̶n̶k̶s̶ ̶k̶n̶e̶w̶ ̶t̶h̶i̶s̶ The one secret Banks hate! FTFY


I think that a big missing piece is a tool which provides authenticated downloads.

What I mean is the following:

Let's say that I'm downloading a PDF from mybank.com. The browser establishes a TLS connection to mybank.com, sends a request, receives the PDF as the response, and then does something with that response. This TLS connection could be serialized as is, together with the accompanying ephemeral keys. Those bytes include the remote peer's X.509 certificate signed by DigiCert, and the whole exchange is further cryptographically signed with the corresponding key.

So basically you already have a cryptographically signed PDF from your bank. You just don't have the tools to save or verify this signature, or a legal framework to further act on those artifacts. But the tech has been deployed for 30+ years already.


No, what's tied to the certificate is only the identity of the endpoints, not the content being transmitted (since that's only protected by a symmetric key both sides know), so you can forge a "recording" of a HTTPS session for any file you want.

There have been proposals to extend TLS to have this capability, but to my knowledge none are really standardized or used anywhere.


Wouldn't an ability to "forge a recording of a(n) HTTP session for any file you want" mean that you can implement a transparent HTTPS/TLS proxy that allows MITM attacks?

To deliver any file over TLS, you need access to the private key part of the TLS certificate, and to decrypt it, you need the public part. Having a recording of the raw TLS session along with the public key part would allow decrypting the stream later, while making it impossible to forge any data not coming from the identity controlling those private keys.

For proper verifiability in the future, you'd want more details from the certificate and CAs (like verifiable way to ensure certificate was not revoked at the time of download), but there is already an asymmetric encryption happening at the TLS level.

Now, you can still forge some parts by omission: if you've got streamable data (requiring no random seeks), you can cut off parts of documents while maintaining encryption: you still can't modify it otherwise.

What am I missing in your claim of forging a "recording" of an HTTPS session?


Asymmetric cryptography is only used in the TLS handshake. The TLS handshake ends with both parties having a shared symmetric key only they know. The handshake can't be completed without the server having access to its private key.

Everything after that only needs the symmetric key. That both sides know, and thus both sides can generate whatever they want to put into a recording, and you can't verify if it is correct or not. All a third party can verify given a detailed recording is "this client did access the bank web server and completed a handshake with it".


Thanks for the details: I am surprised I never came across those details before, but that's obviously on me :)


Thanks, this is unfortunate.


We thought about it from the other side: we want to make it easy for just about every PDF to get signed. We know it is not perfect, but it is better than no signatures at all. Inspired by LetsEncrypt.

https://github.com/open-pdf-sign/open-pdf-sign-configurator/...


It should be said that PDF signatures are a very fragile design, leading to a plethora of security issues: https://pdf-insecurity.org/

A core problem is that a pdf signature does not necessarily cover a complete file, but can be a partial signature. This adds a whole lot of complexity and unclarity around what is actually signed, allowing all kinds of attacks. I feel this is all so problematic that if you want to sign PDFs it's probably better to not use PDF signatures, but some form of outside signatures over the whole file.
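To illustrate the partial-coverage point: a PDF signature dictionary carries a /ByteRange array of four integers (offset and length of the two signed spans, with the signature value itself sitting in the gap between them). The Python sketch below is a heuristic for inspection only, not a validator; the function name is made up, and the demo input is a synthetic byte blob rather than a real PDF.

```python
import re

# Matches "/ByteRange [a b c d]" with flexible whitespace.
BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def byte_range_report(pdf: bytes) -> list[dict]:
    """List every /ByteRange found and how much of the file it covers (hypothetical helper)."""
    reports = []
    for match in BYTE_RANGE.finditer(pdf):
        start1, len1, start2, len2 = (int(g) for g in match.groups())
        reports.append({
            "byte_range": (start1, len1, start2, len2),
            "starts_at_zero": start1 == 0,
            # False when bytes were appended after signing (e.g. an incremental update).
            "reaches_end_of_file": start2 + len2 == len(pdf),
            # Size of the unsigned gap that is supposed to hold only the signature value.
            "unsigned_gap": start2 - (start1 + len1),
        })
    return reports


# Synthetic 100-byte blob standing in for a signed file, then the same blob with extra bytes appended.
signed = b"/ByteRange [0 10 20 80]".ljust(100, b" ")
updated = signed + b"\n% appended after signing"

print(byte_range_report(signed))
print(byte_range_report(updated))
```

A signature that does not reach the end of the file is not automatically an attack (a later signature or a legitimate incremental update has the same effect), which is part of why validation is hard to get right.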


The problem is mostly (1) signature validation implementations that don’t perform all necessary checks, and (2) PDF viewers with signature validation that don’t properly guide the user to view the actually signed document revision (when it is not the most recent one).

Regarding (1), this is because there exists no complete specification of what an implementation needs to check with regard to the PDF format. Implementors have to figure it out on their own, and thus there have been some gaps.


It is in theory possible to craft better-signed PDFs that would pass the European eIDAS Qualified Electronic Signature requirements.

But in general the ASiC-E container format is more versatile and also more robust against potential flaws.


In practice, though, PAdES has a lot more support and has the crucial property of being easy to view by end-users.

Is there any wide use of ASiC?


> In practice, though, PAdES has a lot more support and has the crucial property of being easy to view by end-users.

For now. I'd expect ever-increasing compliance with eIDAS in the future. PDFs are also not the only thing that people want to sign; that's where an agnostic container format has its benefits.

> Is there any wide use of ASiC?

I know that at least Estonia, Latvia, Lithuania and Finland have deployed it. Of those Estonia probably has the widest and longest use of it, as they migrated *to* ASiC-E, having used the predecessors BDOC and CDOC previously.


Can I use this with smart cards on Linux? And, if so, can someone start wiring it into the various viewers?

That's my one big hold-up from going full Linux: I absolutely must be able to sign documents using a cert held on a smartcard.


Very nice! I've also implemented a similar project that can be used not only to sign documents but also to check the validity of signed documents through a simple API: https://github.com/spapas/pdf-sign-check

We have used it for many years in a public sector organisation to make sure that our internal documents are properly signed.


Nice, some years ago I worked on the Apache PDFBox code that powers this. Great to see people build on top of it.


Great solution, if we didn't have to install a JRE on a server.



Hi, Thomas (one of the creators) here. This is actually the reason why we are still supporting JRE8 with open-pdf-sign instead of having a JRE11 (or later) baseline. We are offering an npm module as well (https://github.com/open-pdf-sign/open-pdf-sign-node). While that does not get rid of the JRE requirement, it makes integration in "modern" backends easier.


Signing is great until you have to rotate your keys.


Would you mind explaining? As I understand it, signing works better if you rotate your keys regularly.


The public key is publicly available so the signature can be verified. But when you rotate keys, what do you do? Post a list of formerly valid public keys? Are all public keys derived from one master/root key? And then you don't rotate the master? So then the rule is rotate "almost all" your keys. But then that rule goes out the window if the master/root key is compromised.


That's pretty much how it works, at least in GPG world. You generally never rotate the top-level certifying key, and you use that only for certifying.

All that said, "that's how GPG does it" is usually a strong argument against a proposal.


There needs to be a time-dependent set of trust anchors, such as the European Trusted Lists standard. It’s not completely trivial, but the general architecture exists.
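As a toy illustration of "time-dependent" trust: the verifier keeps the history of keys with their validity windows and asks whether the signing key was trusted at the time of signing. This Python sketch is not the Trusted Lists format; the class, the function, and the key IDs are invented for the example, and it assumes the signing date itself is trustworthy (for instance via a trusted timestamp), since otherwise a compromised old key could backdate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TrustedKey:
    key_id: str
    valid_from: date
    valid_until: date  # inclusive


def key_valid_at(keys: list[TrustedKey], key_id: str, signed_on: date) -> Optional[TrustedKey]:
    """Return the matching key if it was trusted on the signing date, else None."""
    for key in keys:
        if key.key_id == key_id and key.valid_from <= signed_on <= key.valid_until:
            return key
    return None


# Hypothetical rotation history: the old key stays listed after it is rotated out.
history = [
    TrustedKey("bank-2020", date(2020, 1, 1), date(2021, 12, 31)),
    TrustedKey("bank-2022", date(2022, 1, 1), date(2023, 12, 31)),
]

old_doc_ok = key_valid_at(history, "bank-2020", date(2021, 6, 1)) is not None
stale_key_ok = key_valid_at(history, "bank-2020", date(2022, 6, 1)) is not None
print(old_doc_ok, stale_key_ok)
```

Documents signed before a rotation keep verifying, while the retired key cannot vouch for anything dated after its window.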



