Also, how exactly are PDFs "discoverable"? I have petabytes of PDFs, and making them easily "discoverable" for any kind of mass use, such as analytics, search, or data analysis, is a massive pain. I'd much rather have them in a non-PDF format.
Not a single researcher or data analyst I know would prefer "discoverable" content to be in PDF format, no matter how good the OCR is (and it often isn't, especially for tabular data). Even for all-text, non-tabular documents, OCR does not provide the metadata needed to make sense of the documents. Why the OP essay claims PDF has superior "discoverability" is a mystery to me. For the sake of "discoverability", PDF is definitely not the way to go.