Also - how are PDFs exactly "discoverable"? I have petabytes of PDFs and making ...

relaxing · on July 19, 2021

The author is calling for new content to be authored as PDF, which can easily be made discoverable.

I’m guessing your data set is made of scans with poor or no OCR.

rexreed · on July 19, 2021

Not a single researcher or data analyst I know of would prefer "discoverable" content to be in PDF format, regardless of just how awesome the OCR is (which it often isn't, especially for tabular data). Even for all-text, non-tabular documents, OCR does not provide the metadata needed to make sense of the documents. Why PDF is claimed to have superior "discoverability" in the OP essay is a mystery to me. For the sake of "discoverability", PDF is definitely not the way to go.

relaxing · on July 19, 2021

The essay claimed

> PDFs are discoverable. Search engines index them as easily as any other format.

What you’re talking about has nothing to do with that.