PDF ChatBot – Upload, chat and interact with any PDF document (askyourpdf.com)
103 points by armcat on April 3, 2023 | 82 comments



Just install Edge, sidebar Bing Chat with your PDF opened, ask your questions. You're welcome. https://twitter.com/sergeykarayev/status/1640764492018765824


Sounds like a data protection nightmare.

Suddenly people don't seem to have any problems uploading their data to big companies.

What was Google criticised for? Not creating fancy responses from your data?


How is this loading in PDF page content? I think it's just ripping text off the document?

It looks like it works well on PDF with a digital text layer. I tried it on an image of the Declaration of Independence and it told me "I'm Sorry but the webpage is empty..."


Clunky, at least on Linux.


PDF chatbots are the new equivalent of todo apps... https://custombot.ai/. No doubt cool but loses the lustre after you've seen the umpteenth one that passes on the token cost to you and still hallucinates.


Has anyone ever had a problem they needed to solve by asking their PDF a question?


Yes - I write cyber security contracts. I feed it proposals I find from solicitation aggregators and it's surprisingly accurate at answering the questions below about 150-page PDFs.

*Granted, I don't use this service. I use llama index and AWS to store the vectorized prompt files.

What's the contracting officer's email address where the proposal needs to be submitted? What is the Proposal Due Date? What's the full Agency Name? What's a shorter way to say the agency name? What is the RFP number? What type of project are we responding to? Is it a proposal or just a request for information/Sources Sought Notice? If it is a proposal, simply say proposal. If it is a request for information or sources sought, say RFI. Are there any upcoming vendor meetings? What is the CALENDAR OF EVENTS? Project Timelines, or deliverables? Can the response be submitted via email? Or is it a sealed bid? What is a description of the scope of work, Scope Of Services or Performance Work Statement? Is the requirement for Penetration Testing, or Audit Software? How many Targets are there? Targets consist of Enterprise Systems (Databases, Cloud Environments, applications, servers, IP Addresses, Firewalls, network attached storage, and wireless networks) and Enterprise infrastructure (Switchgears, routers, Modems, HUBS). List the total number of targets by each category. Does it require Social Engineering, Physical Security, Wireless Scanning or Perimeter Security?


Is there something that will allow me to run this locally?! This is exactly what I want to do but no clue how to pipe my data into llama. Any pointers will be highly appreciated!


Locally? It's not really possible to run the model locally, but finetuning on directories is possible.

Here's the code:

It runs on three different directories to give me three different 'answers', using an Excel sheet to pull the questions from.

    import os

    import pandas as pd
    # 2023-era llama_index API, as used in the original post
    from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

    def excelGPT(self, dir, excel_file, sheet):
        # My GPT key (elided in the original post)
        os.environ['OPENAI_API_KEY'] = 'sk-...'

        # Three directories to index; the paths were left blank in the original post
        root_folder1 = ''
        documents1 = SimpleDirectoryReader(root_folder1).load_data()
        index1 = GPTSimpleVectorIndex(documents1)

        root_folder2 = ''
        documents2 = SimpleDirectoryReader(root_folder2).load_data()
        index2 = GPTSimpleVectorIndex(documents2)

        root_folder3 = ''
        documents3 = SimpleDirectoryReader(root_folder3).load_data()
        index3 = GPTSimpleVectorIndex(documents3)

        # The questions live in the first column of the given Excel sheet
        file_name = os.path.join(dir, excel_file)
        df = pd.read_excel(file_name, sheet_name=sheet)
        df_series = df.iloc[:, 0]

        GSA_answer_array = []
        basic_answer_array = []
        QA_answer_array = []

        # Ask every question against each of the three indexes
        for i, x in enumerate(df_series):
            print("This is the index ", i)
            print(x)
            GSA_response = index1.query(x)
            basic_response = index2.query(x)
            QA_response = index3.query(x)
            GSA_answer_array.append(str(GSA_response))
            basic_answer_array.append(str(basic_response))
            QA_answer_array.append(str(QA_response))

        # zip_to_docv2 is my own helper (not shown) that writes the answers to a .docx
        self.zip_to_docv2(dir, "Gippie_Response.docx", df_series, GSA_answer_array, basic_answer_array, QA_answer_array)


This is awesome! Thank you for the code!


This, so much this!

There are so many projects that claim to do this, but end up piping data to OpenAI.

Can someone who has managed to get this set up locally send some pointers our way?


Terms of service for various companies and other long and boring documents.

My real estate agent wanted me to sign a document that is 10 pages long. I would prefer to use the bot to answer my questions, and possibly verify with other legal things.

Tried the document with the service (after removing personal info), and it worked so-so. Could specify which paragraphs mention the commission, but couldn't extract info about how high the commission is.

Perhaps it's because the document is in Polish. But GPT-3.5 or 4 shouldn't have a problem with such queries.


> Terms of service for various companies and other long and boring documents.

OK, fine. Do you have a working example of this? e.g. here's a contract, and please find me the unfavorable and/or non-standard terms. People have tried this before with no success, and it would be great if someone finally made some headway here. Even more points if the GPT thing finds onerous terms, but says, "hey don't worry about this non-compete bit, it's not enforceable."


Would using a service like this for legal advice not be one of the more hazardous use cases?


It's a low-stakes scenario, and the alternative was reading it myself - since paying legal fees in this specific case would be prohibitively expensive.


Understandable, but my thinking is that with real estate you are paying for expertise in navigating what is typically a large transaction, and most people wouldn't want to rely on an LLM trained on a dataset that might not include the latest laws.


Could be useful for a PDF textbook, perhaps?


Yep. I've been using a similar service (chatpdf.com) and uploading amateur radio mathematics PDFs and portions of the ARRL handbook. I can't quite upload the entire thing due to size constraints, but I've been able to do quite a bit of Q&A from the book chapters.

This Ask your Pdf allows up to 20MB for free vs chatpdf's 10MB free, though chatpdf has a 32MB allowance on the paid plan. Not sure how ask your pdf plans to monetize this.

I'm personally looking into setting up my own self-hosted "chatpdf/askyourpdf" clone so I can put a whole bunch of my reference material in there. I can't actually open it up as a service because of the copyrighted works, but I would really love to have a ham Q&A site based on the ARRL Handbook and other resources. Even expand that out to an electronics Q&A.

There's a site called llamahub.ai that lets you load lots of your own resources into a LLM index so you can train GPT (or potentially the opensourceish gpt4all variant) on your own resources.


Also want to jump in with https://libraria.dev/ for PDFs too, I just recently updated this with large PDF uploads, you can share your chatbots, have it in different views, and more


Students and scholars alike love it.


I looked up the stats for crime in my city yesterday.

The police department had a big pdf basically filled with tables. Other formats were not immediately obvious or available.

Asking it to convert it to a spreadsheet would be neat.

Asking it to extract just the locations of interest would be better.


I could see it for instruction manuals.

Upload CheapThermostatUserGuide.PDF and ask, "How do I set the clock?"


PDFs sure are annoying when you want to quickly jump through the docs… but really I wonder how this will do once gpt4 api supports images … maybe then it can help me understand electronic data sheets… cause I’m still trying to figure out was pin 0 the sdl or sda pin… and was vcc 3.3 or 1.8 volts…


„What do I need to learn to understand you?”


You'll be surprised.


Almost all of them do the exact same thing and it is completely saturated with these websites looking very similar akin to a copy and paste job.

There is nothing new or unique about any of them other than a new AI snake-oil to push their new grift on to users uploading sensitive PDFs to 'chat' with their document as 'the future'.

Another race to the bottom until Microsoft Word or Google Docs releases the exact same thing for free and unlimited tokens.


Interesting, but how many people are going to upload things they really shouldn't?

...

You retain ownership of any PDF documents you upload to AskYourPdf. By uploading PDF documents to AskYourPdf, you grant AskYourPdf a non-exclusive, worldwide, royalty-free license to use, modify, reproduce, and distribute the PDF documents for the purpose of providing the AskYourPdf web application

...


Since you mention it, I have heard several of my friends and colleagues saying that they think ChatGPT could be their lawyer, doctor, and tax prep advisor if only they could send it documents for review.


> purpose of providing the AskYourPdf web application

For the purpose of funding it as a free service by selling upload content or derived metrics :)


Yikes.


That is the legal jargon required for them to ingest, index, and display the PDF you upload back to you.


It is legal jargon that gives them the right to do that, but it also gives them a lot of other rights. If they only wanted to display the PDF back to you, they could affect that meaning very easily.


> they could affect that meaning very easily.

Not using phrasing that has already been tested in court is easy, but fraught. If someone sues you because of a reasonable thing you did to display a document and you have this phrasing, it's open and shut, because someone else has already litigated it and so there's legal precedent. If you use different phrasing and someone sues you, there's a greater chance you'll have an actual drawn out court case to convince a judge that your phrasing means what you wanted it to mean. Remember, the meaning of words and phrases in a legal context can differ almost arbitrarily from what they mean in a conversational one.

As a business owner that just wants to get on and provide a service that displays a pdf you got sent, which do you go with, the one that lets your resources go to providing the service you intend to provide, or the one where there's a greater chance your resources will get tied up in a legal battle for the sake of making the terms almost no-one reads anyway a little nicer?


bad actors are not gonna be stopped by their own "legal jargon"... the terms look like copy pasta or AI generated themselves. can't imagine the operators spent much time reading them.

though maybe true bad actors would try harder to pretend being a company with some humans involved, rather than this openly anonymous site.


Everybody upload your tax documents and start asking questions! /s


Doesn't seem to work well for music scores :(

Tangentially, I haven't been able to find any software which has reliable OCR for music scores; they tend to be just bad enough as to be useless. Was curious if any recent AI developments could be applied to this, but don't have the expertise to look into this myself. If anyone has any thoughts or wants to look into this, please feel free to email me! (link to my website in profile, which has my email)


We just launched a beta music scanning feature at Soundslice a few months ago:

https://www.soundslice.com/sheet-music-scanner/

It’ll accept images and PDFs of music, extracting the notes, rhythms, etc., so you can play it back and edit it with our built-in editor.

It uses machine learning and works significantly better than the other products on the market. There’s a bunch it doesn’t do yet, but it’s useful enough already that we launched in public beta.


Have you tried Audiveris?

https://audiveris.github.io/audiveris/_pages/handbook/

I haven’t tried it but my first thought was to use something like tesseract OCR, and I found this optical music recognition (OMR) project from there.


I have tried Audiveris and unfortunately did not have any luck with it :/


Nice, I made one too:

https://docalysis.com/

My take on this space is that it'll eventually be built into the operating system or PDF viewers, so you're going to have to do more than just "chat with a PDF" -- but that chatting with PDFs is a great place to get started!


Getting ChatGPT (or any good-enough LLM) to generate/manipulate/edit/find discrepancies with PDFs would be great too. Probably best done with a plugin so it can execute code though. Dumping the PDF to HTML, telling ChatGPT to edit the HTML, then converting it back to a PDF is probably a non-starter although that does work on a basic level.

Anything ChatGPT<->PDF is probably a good business idea IMO. That stuff comes down to developers so often that it's almost a career specialization and PDF code can be unfun and tedious to write and maintain.


How does this work? Do you first scrape the PDF or do you have gpt4 multimodal access? The privacy policy link is broken at the moment so I can’t tell for sure


I can't answer for theirs but I made one too:

https://docalysis.com/

The way it works is you first parse the PDF to analyze its text, then use a LLM along with the relevant text when answering user questions.
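
A minimal sketch of that parse-then-retrieve flow, assuming the PDF text has already been extracted to a plain string. Everything here is illustrative: the function names are made up, and the bag-of-words cosine similarity is a crude stand-in for whatever embedding model a real service would use. The resulting prompt would then be sent to the LLM of your choice.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Split extracted PDF text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant_chunks(chunks, question, top_k=2):
    """Return the top_k chunks most similar to the question."""
    question_vector = bag_of_words(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(bag_of_words(chunk), question_vector),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(chunks, question):
    """Assemble the text that would be sent to the LLM."""
    context = "\n---\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Only the few chunks that score highest go into the prompt, which is what keeps a long document within the model's context window.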


Can you elaborate on how you parse the PDF? Are you simply converting it to text using a python library or something more robust like GROBID[1]?

1: https://github.com/kermitt2/grobid


Do you know of anything that can process engineering drawings and diagrams by looking how lines link text and other objects?


Not the OP, but that's what I do.


You just upload your pdf doc directly or via a url and you are good to go.


I think the GP was interested in technical details. Specifically, do you first scrape the pdf (using another tool presumably), or do you have gpt4 multimodal access?


I'm the founder of AskYourPdf. The released version is at its most basic level, we have some exciting new features coming!

AskYourPdf is significantly faster than other similar products around. Interestingly, we'll also be launching our API soon.

Take note, AskYourPdf is multilingual, completely free to use and you don't need to sign up.


This is a great application. My example use case was to upload a REIT quarterly report (in Spanish) and ask questions that I am interested in about it.

This is the report I uploaded: https://funo.mx/site_media/uploads/documentos/documento-4VK6...

I felt that at this point, it was more the "potential" than what it actually did. I asked some questions that it just couldn't answer. Also, at some point it started answering in English even though all my conversation with it had been in Spanish.

Excited to see how your service progresses!


I think this could be very useful! But I guess there are still some hallucinations to deal with: I uploaded an arxiv paper and asked who the authors were, and it hallucinated 4/4 (and stopped working when I asked it to please use the paper as a reference to answer my question).


I had an idea to do something like this combined with something like Zeal or DevDocs so you can have a kind of chatGPT localised just to your specific language or framework. But I guess this does just that job, but in a far more general way


Yea AskYourPdf is also multilingual.


Just wanted to share my project too. I made a tutorial on how you can build the same thing here:

https://docs.dopplerai.com/quick-start


I just had the misfortune of having to wade through pages and pages of byzantine building code where the relevant parts were scattered throughout the document. This would be very helpful for that kind of thing.


I'd be interested in how people parse text off a PDF. I'm making a TTS tool to convert documents (mainly HTML docs at the moment) to speech, and PDFs would be a great addition.


How I dealt with it was by first converting to HTML using PDF Box. Extracting text from there is then pretty simple.
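
For the second step, here is a rough sketch in Python of pulling the visible text out of HTML once a converter has produced it. It uses only the standard library's html.parser. The class and function names are made up for illustration, and real converter output will be messier than a hand-written snippet.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return "\n".join(self._parts)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This flattens everything to one text node per line, so headings, columns and reading order are only as good as the HTML the converter produced.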


Apache Tika can extract text (and metadata) from pretty much any file format ever invented.


Thanks! I'll give this a shot. I wonder if it properly parses into real HTML or it's just div's all the way down.


use pdftotext with the -bbox option to get the bounding box co-ords for each bit of text
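
A sketch of reading that output from Python, assuming you have already run something like `pdftotext -bbox input.pdf output.html`. With -bbox, pdftotext writes XHTML containing one word element per word with xMin/yMin/xMax/yMax attributes. The SAMPLE string below is a hand-written, simplified stand-in for that output, and the helper names and the line-grouping tolerance are my own assumptions.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def parse_bbox_words(xhtml):
    """Return every <word> element with its bounding box, in document order."""
    root = ET.fromstring(xhtml)
    words = []
    for element in root.iter():
        if local_name(element.tag) != "word":
            continue
        words.append(Word(
            text=element.text or "",
            x_min=float(element.get("xMin")),
            y_min=float(element.get("yMin")),
            x_max=float(element.get("xMax")),
            y_max=float(element.get("yMax")),
        ))
    return words


def group_into_lines(words, tolerance=2.0):
    """Group words whose top edges are within `tolerance` points into lines."""
    lines = []
    for word in sorted(words, key=lambda w: (w.y_min, w.x_min)):
        if lines and abs(lines[-1][0].y_min - word.y_min) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within a line, read left to right
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x_min)) for line in lines]


# Simplified, hand-written example of the -bbox output shape
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><doc>
<page width="612.0" height="792.0">
<word xMin="120.0" yMin="72.5" xMax="160.0" yMax="84.0">Due</word>
<word xMin="72.0" yMin="72.0" xMax="115.0" yMax="84.0">Proposal</word>
<word xMin="72.0" yMin="100.0" xMax="110.0" yMax="112.0">May</word>
</page></doc></body></html>"""
```

Having the coordinates lets you rebuild reading order, or pick out table columns, instead of trusting the order the words appear in the file.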


I feel like there is an arbitrage opportunity for someone willing to gather up all these sites and use them to facilitate a discounted GPT API...


I’ve tried two similar services and for some reason both of them cap at 200pages. What’s your limit?


Looks like 200 pages :(.


Try unlimited pages at chatbotkit.com - there is a 1,000,000 cap on the tokens though. Still more than all other services.


This one is not very good, it seems not anywhere near even GPT 3.5.


I'm guessing langchain / llamaindex + openai API?


lol - Yeah, I could write this in 20 lines.

I have it answering a list of questions in an Excel sheet from three different 'frames of mind' by passing three different directories, with different content in each, to get three different responses I can craft together.

    import os

    import pandas as pd
    # 2023-era llama_index API, as used in the original post
    from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

    def excelGPT(self, dir, excel_file, sheet):
        # My GPT key (elided in the original post)
        os.environ['OPENAI_API_KEY'] = 'sk-...'

        # Three directories to index; the paths were left blank in the original post
        root_folder1 = ''
        documents1 = SimpleDirectoryReader(root_folder1).load_data()
        index1 = GPTSimpleVectorIndex(documents1)

        root_folder2 = ''
        documents2 = SimpleDirectoryReader(root_folder2).load_data()
        index2 = GPTSimpleVectorIndex(documents2)

        root_folder3 = ''
        documents3 = SimpleDirectoryReader(root_folder3).load_data()
        index3 = GPTSimpleVectorIndex(documents3)

        # The questions live in the first column of the given Excel sheet
        file_name = os.path.join(dir, excel_file)
        df = pd.read_excel(file_name, sheet_name=sheet)
        df_series = df.iloc[:, 0]

        GSA_answer_array = []
        basic_answer_array = []
        QA_answer_array = []

        # Ask every question against each of the three indexes
        for i, x in enumerate(df_series):
            print("This is the index ", i)
            print(x)
            GSA_response = index1.query(x)
            basic_response = index2.query(x)
            QA_response = index3.query(x)
            GSA_answer_array.append(str(GSA_response))
            basic_answer_array.append(str(basic_response))
            QA_answer_array.append(str(QA_response))

        # zip_to_docv2 is my own helper (not shown) that writes the answers to a .docx
        self.zip_to_docv2(dir, "Gippie_Response.docx", df_series, GSA_answer_array, basic_answer_array, QA_answer_array)


It’s a great mystery why people rave about the Langchain library. I found it to be really bloated, and the verbose syntax makes it hard to understand what it’s doing unless you are digging into the code base. Poorly documented, too.


What alternatives exist today? Honestly, I think langchain just fills a void for having a more streamlined "api" for LLM-driven app workflows which is why it's hyped. Plus the fact that they have bindings for both JS and Python makes it easy to get up and running and building custom "agents" for different tasks.


Have you tried Deepset Haystack?


Does it fine tune the model or just learn like a plug-in?


Kinda like having the author at your fingertips


why would you want to chat and interact with a PDF document?


Think of anything ever written, your personal library, the tax code, every up to date doc for your project and its dependencies, your code base, your university texts...

Index them, store them in LLM format.

Then, when you ask a question, you first semantically search all the relevant sources you've indexed and get back a tight set of results that fits under the token limit, which you then pass on to your favorite LLM. Chat4all, ChatGPT, etc. then read those parts of your library and answer your question.
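
The "under the token limit" step can be sketched roughly like this, assuming the search has already returned passages ranked best-first. The four-characters-per-token estimate is only a rule of thumb, not a real tokenizer, and the function names are hypothetical.

```python
def estimate_tokens(text):
    """Rough estimate: about one token per four characters (not a real tokenizer)."""
    return max(1, len(text) // 4)


def pack_under_budget(ranked_passages, max_tokens):
    """Greedily keep best-ranked passages, in rank order, that fit the budget.

    Returns the selected passages and the estimated tokens they use.
    """
    selected = []
    used = 0
    for passage in ranked_passages:
        cost = estimate_tokens(passage)
        if used + cost > max_tokens:
            # Too big for the remaining budget; a later, shorter passage may still fit
            continue
        selected.append(passage)
        used += cost
    return selected, used
```

In practice you would also reserve part of the budget for the question itself and for the model's answer.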


Are you really going to leak your tax documents to ChatGPT? I had just joked about this in a comment above.


Maybe it's time to write a ChatGPT site that does fancy things with your credit card data if you upload your number and security code.

People seem to completely forget the potential of abuse of their data.

Maybe the Nigerian scammers should switch from being a prince to being a new ChatGPT based service.


No, I'm working on wiring it up to one of the local LLMs like ChatGPT4all or vicuna.


Please do share when you have something working locally!


this comment blows my mind...

My head burst with ideas when I first found out I could vectorize directories and answer questions on them. I can run entire logic loops using an LLM as the input... How does that not blow your mind?

Imagine scouring financial reports in real time?

Imagine being able to analyze thousand page regulations for self interest?

Imagine being able to interact with old newspapers, articles, and media lost to time.

Ask questions on an entire class of books, and any information you want to aggregate...


(((:::)))


Since I can't delete on Hacker News, this comment was one of several for an April Fool's joke related to AGI "taking over my account." I didn't realize there was a hidden meaning associated with triple parentheses. I apologize!


There was no joke, the only thing you did was spam



