
The OCR is what keeps me in the Acrobat camp, too. Sadly so, since the company is outright customer-hostile and the software is absurdly expensive.

I've been trying tesseract but find it lacking: it needs better shape/text/picture region recognition (people are working on it these days), and something that puts the result back into tagged PDF form. I also want to try whatever Nuance/OmniSoft is selling these days, since I used to be an OmniPage customer.




I'm in the same boat. I keep a virtual machine with an up-to-date Acrobat Pro license just for the OCR. The OCR isn't state-of-the-art any more, and in my experience it's less accurate than ABBYY. But of the alternatives I've tried, it still generates the most predictable, consistent bounding boxes for text selection.


Not sure how ABBYY FineReader is priced these days, but it always gave me excellent results.



