
I mean, I'd totally try Tesseract[1], a few samples, and a Python script. Shouldn't take more than 5 minutes to validate this.

Adobe also has the whole scan thing, and Apple can, in some cases, correctly transcribe characters from images.

[1] https://github.com/tesseract-ocr/tesseract
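
For anyone who wants to run that five-minute check, here is a rough sketch of what the script could look like. It assumes the pytesseract wrapper and Pillow are installed alongside the Tesseract binary, and that you have typed out the correct text for each sample by hand. The function names and the character-error-rate metric are my own choices for illustration, not anything Tesseract prescribes.

```python
from typing import Callable, Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def char_error_rate(expected: str, actual: str) -> float:
    """Edit distance divided by the length of the expected text.

    Whitespace runs are collapsed first so line-break differences
    do not dominate the score.
    """
    expected = " ".join(expected.split())
    actual = " ".join(actual.split())
    if not expected:
        return 0.0 if not actual else 1.0
    return edit_distance(expected, actual) / len(expected)


def tesseract_ocr(image_path: str) -> str:
    """Run Tesseract on one image (assumes pytesseract and Pillow are installed)."""
    # Imported lazily so the scoring helpers work without the OCR stack.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def evaluate(samples: Dict[str, str],
             ocr: Callable[[str], str] = tesseract_ocr) -> Dict[str, float]:
    """Map each sample path to its character error rate under the given OCR function."""
    return {path: char_error_rate(truth, ocr(path))
            for path, truth in samples.items()}
```

Call it with something like `evaluate({"note1.png": "text you typed out by hand"})` (the file name is hypothetical). Since `ocr` is just a callable, you can swap in another engine and compare the numbers on the same samples.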



Tesseract out of the box is terrible for anything non-standard. I tried using it on comic books. Unusable. Training it on your font is doable, but it's very time-intensive (though the tools are pretty good!).


I'd say any of the language models are far better than Tesseract. I did some work in this space and it was an absolute nightmare, even working with PDFs.


For OCR of handwriting, I did some comparative analysis a year back, and I found that Tesseract was... not good. However, TrOCR was okay, certainly the best of the FOSS solutions. But Textract from Amazon was the best one by far for handwriting, though your mileage will vary.


From my experience with Tesseract ~1 year ago, it was frequently fucking up even with crisp PNG screenshots.

I really doubt it can handle handwriting.


Handwritten notes, c'mon! Don't waste time on Tesseract for that.



