PaddleOCR seemed like a good library for locating and recognizing text. I've been puzzling over how to convert something like a simple letter form into a format an LLM can work with.
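One common approach is to keep the form's rough layout as plain text: group the recognized fragments into rows by vertical position, then pad each row with spaces according to horizontal position. Here is a minimal sketch of that idea. The `Box` format (text plus left-edge x and vertical-center y, in pixels), the `boxes_to_text` helper, and the `line_tol`/`char_width` defaults are all hypothetical; you would need to map whatever your OCR engine actually returns onto them. Note that this only spaces out separate fragments. It does not fix missing spaces inside a single recognized fragment.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One recognized text fragment (hypothetical format, not a PaddleOCR type)."""
    text: str
    x: float  # left edge, in pixels
    y: float  # vertical center, in pixels


def boxes_to_text(boxes, line_tol=10.0, char_width=10.0):
    """Rebuild an approximate plain-text layout from positioned text fragments."""
    # Group fragments into rows: a fragment joins the current row if its y is
    # within line_tol pixels of the row's first fragment.
    rows = []
    for box in sorted(boxes, key=lambda b: b.y):
        if rows and abs(rows[-1][0].y - box.y) <= line_tol:
            rows[-1].append(box)
        else:
            rows.append([box])

    lines = []
    for row in rows:
        line = ""
        # Within a row, order left to right and pad to each fragment's column.
        for box in sorted(row, key=lambda b: b.x):
            col = int(box.x // char_width)
            if line:
                # Always keep at least one space between adjacent fragments.
                line += " " * max(1, col - len(line))
            else:
                line += " " * col
            line += box.text
        lines.append(line)
    return "\n".join(lines)


boxes = [
    Box("Name:", 0, 12), Box("Jane Doe", 100, 14),
    Box("Date:", 0, 40), Box("2024-01-01", 100, 41),
]
print(boxes_to_text(boxes))
```

For the sample input this prints two aligned rows, "Name:" and "Date:" with their values starting in the same column, which is usually easier for an LLM to associate than a flat list of fragments.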
I think the serious problem is that most of these LLMs are already built on top of garbage, so your input is already the "garbage in" and you're just trying to match that as best you can.
I built a library around this problem [1]. I recently did some experimenting with PaddleOCR but found the results very underwhelming (no spaces between words); it seems heavily optimized for Chinese. There's a three-year-old GitHub issue about this, and the problem still seems to be there out of the box. I'd be curious to hear about other people's experience with it.