Hacker News

I have 85,000 PDF documents, collected over a few decades.

What I really want is a semantic interface to those PDF documents: "find me all PDF files which mention <subject>", "show me any PDF with Python example code", or "all PDFs from before 2011 on the subject of coding standards for SIL-4".
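For illustration, here is a minimal sketch of the kind of query described above: rank documents against a free-text query, then apply a metadata filter such as "before 2011". It assumes the text and a creation date have already been extracted from each PDF by some other tool; the `Doc` record and the `search` function are hypothetical. It also swaps in plain TF-IDF cosine similarity, which is lexical matching rather than true semantic search. A semantic tool would replace the TF-IDF vectors with embeddings from a language model while keeping the same filter-then-rank structure.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    path: str
    created: date  # hypothetical: taken from PDF metadata or file mtime
    text: str      # hypothetical: text already extracted from the PDF


def tokenize(text):
    # Lowercase alphanumeric tokens; "SIL-4" becomes ["sil", "4"].
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Document frequency of each term, used for smoothed IDF weights.
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d.text)))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    # One sparse TF-IDF vector (term -> weight) per document.
    vectors = [
        {t: f * idf[t] for t, f in Counter(tokenize(d.text)).items()}
        for d in docs
    ]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(docs, query, before=None, top_k=10):
    idf, vectors = build_index(docs)
    # Query terms absent from the corpus get weight 0.
    q = {t: f * idf.get(t, 0.0) for t, f in Counter(tokenize(query)).items()}
    hits = []
    for d, v in zip(docs, vectors):
        if before is not None and d.created >= before:
            continue  # metadata filter, e.g. "before 2011"
        score = cosine(q, v)
        if score > 0:
            hits.append((score, d.path))
    hits.sort(reverse=True)
    return [path for _, path in hits[:top_k]]
```

For example, `search(docs, "SIL-4 coding standards", before=date(2011, 1, 1))` returns only matching documents dated before 2011. The index is rebuilt on every call to keep the sketch short; for 85,000 files you would build it once and persist it.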

I keep thinking this is out there somewhere, but whenever something new comes along I get bogged down in the details of setting it up. Surely someone has come up with an AI that you can just 'give the folder to' and it figures things out automagically?




Have you tried Paperless NGX?


No, I haven't, so thanks for recommending it. It looks pretty detailed. I will try it out some time this week; maybe it's exactly what I'm looking for. Thanks again!


You can do this locally with your favourite LLM and Open WebUI: https://github.com/open-webui/open-webui


Looks like I've got a few days of hacking ahead of me. Thanks for the recommendation; I'll put it alongside the other suggestions and check it out when I do my "PDF sortout workbench" session.


This is what I use for that

https://github.com/simon987/sist2


Looks pretty functional, if not entirely polished. I will try this out (alongside Paperless NGX, also suggested here). I appreciate the recommendation, thank you!



