
It would be interesting to see how Tesseract holds up against this.



Pingtype is my program for learning Chinese. I tried to use Tesseract to recognise some Chinese text, taken directly from a PDF that wouldn't let me copy-paste for some reason. The results were awful.

I tried again with English text. I wanted a word list from a book that helps people learn English, so I took photos of the index. The format is word....page #, in two columns.

The results were just as bad.

I've given up on OCR, and decided I have to transcribe everything by hand. I only do it in my free time, and it's been taking months.

Is there any tool that can take a photo of a book where the pages curl towards the middle, and "flatten" it so that OCR will work better?


What's that? Thor? :P


Hehe, well, I was referring to the major open-source OCR lib. It's supposed to have LSTM stuff in the next major rev: https://github.com/tesseract-ocr/tesseract/blob/master/READM...


I have used Tesseract, and in my experience, unless you train it for the particular type of text you want to recognize (font, background color, etc.), it will do quite poorly (including the recent LSTM-based versions). It would be great to see how it stacks up against these APIs, though.


The supplied models are trained on document-like images, so I wouldn't expect it to do particularly well on things like street signs. My experience with the new LSTM-based versions is that they're very much competitive with closed-source solutions for document-like OCR.



