Can I ask how you parse PDFs? I'm curious both in terms of reading the PDF data ... | Hacker News


dataflow on May 19, 2020 | on: Simple Personal Finance Tracking with GnuCash

Can I ask how you parse PDFs? I'm curious both in terms of reading the PDF data (Python library?) and parsing it (regex?)... and do you have to deal with OCR as well?

haberman on May 21, 2020

I use "pdftotext -layout" and then parse that. Here is some more info from people who have tried this approach:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
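For anyone who wants to see the shape of this approach, here is a minimal sketch in Python. It is not haberman's actual script. The row layout (date, description, amount separated by runs of two or more spaces), the regex, and the function names `pdf_to_text` and `parse_rows` are all assumptions made for illustration, and a real statement will need its own pattern. Note that pdftotext only extracts an existing text layer, so a scanned PDF would need a separate OCR step first.

```python
import re
import subprocess
from decimal import Decimal

# Hypothetical row layout, e.g. "05/19/2020  COFFEE SHOP      -4.50".
# Columns are assumed to be separated by at least two spaces, which is
# what "-layout" tends to produce for tabular statements.
ROW = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{4})\s{2,}(.+?)\s{2,}(-?[\d,]+\.\d{2})\s*$"
)


def pdf_to_text(path):
    """Run `pdftotext -layout` and return the extracted text.

    The trailing "-" asks pdftotext to write to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_rows(text):
    """Return (date, description, amount) tuples for lines matching ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            date, description, amount = match.groups()
            # Strip thousands separators before converting to Decimal.
            rows.append((date, description.strip(), Decimal(amount.replace(",", ""))))
    return rows
```

Usage would be something like `parse_rows(pdf_to_text("statement.pdf"))`, where `statement.pdf` is a placeholder file name. Lines that do not match the pattern (headers, page footers, blank lines) are skipped silently, so it is worth checking the row count against the statement by hand.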

dataflow on May 21, 2020

Thanks!
