Hello HN, Talk2Arxiv is a small open-source RAG application I've been building for a few weeks. To use it, just add 'talk2' in front of 'arxiv' in any arxiv.org link to load the paper into a responsive RAG chat application (e.g. www.arxiv.org/abs/1706.03762 -> www.talk2arxiv.org/abs/1706.03762).
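If you want to script the link rewrite (say, for a bookmarklet or a batch of papers), here is a minimal sketch in Python. The function name to_talk2arxiv is hypothetical and not part of the project. It assumes only what the example above shows, namely that the same path is served under www.talk2arxiv.org, so it always emits that host.

```python
from urllib.parse import urlsplit, urlunsplit


def to_talk2arxiv(url: str) -> str:
    """Rewrite an arxiv.org link to the matching talk2arxiv.org link.

    Hypothetical helper for illustration; assumes the paper path is
    unchanged and only the host differs, as in the example above.
    """
    # Accept links written without a scheme, e.g. "www.arxiv.org/abs/...".
    has_scheme = "://" in url
    parts = urlsplit(url if has_scheme else "https://" + url)

    # Only rewrite genuine arxiv.org hosts; reject anything else.
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arxiv.org link: {url}")

    # Always target the www host, the only form shown in the post.
    rewritten = urlunsplit(parts._replace(netloc="www.talk2arxiv.org"))

    # Give back the same style (with or without scheme) that came in.
    return rewritten if has_scheme else rewritten[len("https://"):]


if __name__ == "__main__":
    print(to_talk2arxiv("www.arxiv.org/abs/1706.03762"))
```

Parsing the URL instead of doing a plain string replace keeps the rewrite limited to the host, so a path or query that happens to contain "arxiv" is left alone.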
All implementation details are in the GitHub repo. Because I've opted to extract text from the paper's PDF rather than read the LaTeX source (I wanted to build a more generic PDF RAG in the process), it currently struggles with symbolic text / mathematics and sometimes fails to retrieve the correct context. I appreciate any feedback, and hope people find it useful!
Also, the backend PDF processing server is single-threaded for now, so if embedding takes a while, please be patient!