
Almost garbage? This is the OCR result for the second paragraph. It is almost perfect, although the last word of each line gets joined to the first word of the next line:

"The fundamental problem of communication is that of reproducing atone point either exactly or approximately a message selected at anotherpoint. Frequently the messages have meamlng; that is they refer to or arecorrelated according to some system with certain physical or conceptualentities. These semantic aspects of communication are irrelevant to theengineering problem. The significant aspect is that the actual message isone selected from a set of possible messages. The system must be designedto operate for each possible selection, not just the one which will actuallybe chosen since this is unknown at the time of design."
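The joined words ("atone", "anotherpoint") are what you would get if the recognized lines were concatenated with no separator. A minimal sketch of the difference, assuming the OCR output is available as a list of per-line strings; `join_ocr_lines` is a hypothetical helper, not part of ocrad or tesseract:

```python
def join_ocr_lines(lines):
    """Join recognized lines with a single space so that the last word of
    one line is not fused with the first word of the next."""
    # Strip stray whitespace and skip empty lines before joining.
    cleaned = [line.strip() for line in lines if line.strip()]
    return " ".join(cleaned)


# Hypothetical per-line output for the start of the quoted paragraph.
lines = ["of reproducing at", "one point either exactly"]

fused = "".join(lines)          # concatenation without a separator
fixed = join_ocr_lines(lines)   # concatenation with a space
print(fused)
print(fixed)
```

Whether the tool actually concatenates lines this way is a guess; the sketch only shows that a missing separator reproduces the symptom.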




I tried it with both the ocrad and tesseract modes. Indeed, the ocrad mode produces garbage, while the tesseract mode produces a really good result but takes longer (mainly the time it takes to upload the entire thing and get the result back).

That makes sense to me, at least: use ocrad mode by default, and if it doesn't perform well, switch to tesseract and you'll hopefully get a better result.
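A rough sketch of that fallback, assuming the two modes can be wrapped as plain functions that take an image and return text. Everything here is hypothetical: `fast_ocr` and `slow_ocr` stand in for the ocrad and tesseract modes, and `looks_reasonable` is a crude stand-in for "performs well", not something either engine provides:

```python
import re


def looks_reasonable(text, min_ratio=0.7):
    """Crude quality check (an assumption): the share of tokens that look
    like ordinary words must reach min_ratio."""
    tokens = text.split()
    if not tokens:
        return False
    word = re.compile(r"[A-Za-z][A-Za-z'-]*[.,;:!?]?")
    wordlike = sum(1 for t in tokens if word.fullmatch(t))
    return wordlike / len(tokens) >= min_ratio


def ocr_with_fallback(image, fast_ocr, slow_ocr, good_enough=looks_reasonable):
    """Try the fast engine first; use the slow engine only if the fast
    result fails the quality check. Returns (text, engine_label)."""
    text = fast_ocr(image)
    if good_enough(text):
        return text, "fast"
    return slow_ocr(image), "slow"


# Stub engines standing in for the real modes.
def fake_ocrad(image):
    return "t#e fu~d@m 1|l pr0b"


def fake_tesseract(image):
    return "The fundamental problem of communication"


print(ocr_with_fallback(None, fake_ocrad, fake_tesseract))
```

The slow path is only paid for when the fast result looks bad, which matches the trade-off described above. The word-ratio threshold is arbitrary and would need tuning on real scans.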


When I ran my test, the result was garbage. Since your reply, I have repeated it and got results similar to yours.



