Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Also - how are PDFs exactly "discoverable"? I have petabytes of PDFs and making them easily "discoverable" for any mass use, such as analytics, search, or data analysis is a massive pain. I'd rather have them in a non-PDF format.


The author calling for new content to be authored as PDF, which can easily be made discoverable.

I’m guessing your data set is made of scans with poor or no OCR.


Not a single researcher or data analyst I know of would prefer "discoverable" content to be in PDF format, regardless of just how awesome the OCR is (which it often isn't, especially for tabular data). Even for all-text, non-tabular documents, OCR does not provide the metadata needed to make sense of the documents. Why PDF is claimed to have superior "discoverability" in the OP essay is a mystery to me. For the sake of "discoverability", PDF is definitely not the way to go.


The essay claimed

> PDFs are discoverable. Search engines index them as easily as any other format.

What you’re taking about has nothing to do with that.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: