I asked OpenAI’s ChatGPT some technical questions about Australian drug laws, like what schedule common ADHD medications were on - and it answered them correctly. Then I asked it the same question about LSD - and it told me that LSD was a completely legal drug in Australia - which is 100% wrong.
Sooner or later, someone’s going to try that as a defence - “but your honour, ChatGPT told me it was legal…”
Y’all are using this tool very wrong, and in a way that none of the AI-integrated search engines will. They assume the AI doesn’t know anything about the query, feed it the relevant knowledge from the search index, and ask it to synthesize an answer from that.
There’s still the risk that, if the search results it is given don’t contain the answer to the exact question you asked, it will hallucinate one.
10,000% true, which is why AI can't replace a search engine, only complement it. If you can't surface the documents that contain the answer, then you'll only get garbage.
A GAN-style approach to penalising a generator for generating something that is not supported by its available data would be interesting (and I'm sure some have tried it already; I'm not following the field closely), but for many subjects creating the training sets would be immensely hard (for some subjects you certainly could produce large synthetic training sets).
Look, I know that "user is holding it wrong" is a meme, but this is a case where it's true. The fact that LLMs contain any factual knowledge is a side effect. While it's fun to play with and see what it "knows" (and it can actually be useful as a weird kind of search engine if you keep in mind it will just make stuff up), you don't build an AI search engine by just letting users query the model directly and call it a day.
You shove the most relevant results from your search index into the model as context and then ask it to answer questions from only the provided context.
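A minimal sketch of that pattern, assuming the official openai Python client (1.x style); the search_top_k and answer_from_context names are made up for illustration, and the model name is an assumption:

    # Retrieval-augmented prompting sketch. Assumptions: the `openai` Python
    # client (>= 1.0) and a hypothetical `search_top_k` retrieval function
    # standing in for whatever search index you actually have.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    def search_top_k(query: str, k: int = 5) -> list[str]:
        # Placeholder: return the k most relevant documents for the query.
        raise NotImplementedError("plug in your own search index here")

    def answer_from_context(query: str) -> str:
        docs = search_top_k(query)
        context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
        prompt = (
            "Answer the question using ONLY the numbered passages below, and "
            "cite the passage numbers you relied on. If the passages do not "
            "contain the answer, reply exactly: I don't know.\n\n"
            f"Passages:\n{context}\n\nQuestion: {query}"
        )
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # model name is an assumption
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # lowers, but does not eliminate, made-up answers
        )
        return response.choices[0].message.content

Nothing here actually prevents the model from ignoring the instruction, which is why the validation step below matters.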
Can you actually guarantee the model won't make stuff up even with that? Hell no, but you'll do a lot better. And the game now becomes figuring out better context and validating that the response can be traced back to the source material.
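One cheap first pass at that validation is a lexical overlap check: flag any sentence in the answer whose vocabulary barely appears in the retrieved context. A rough sketch (the 0.6 threshold and the crude tokenization are arbitrary choices, not a standard, and this is nowhere near real entailment checking):

    import re

    def _tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def unsupported_sentences(answer: str, context: str,
                              threshold: float = 0.6) -> list[str]:
        # Return answer sentences whose words are mostly absent from the
        # context. Only catches the most blatant drift away from the sources.
        context_tokens = _tokens(context)
        flagged = []
        for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
            words = _tokens(sentence)
            if words and len(words & context_tokens) / len(words) < threshold:
                flagged.append(sentence)
        return flagged

Anything this flags can be stripped, re-queried, or shown to the user with a warning.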
The examples in the article seem to be making the point that even when the AI cites the correct context (i.e. financial reports), it still produces completely hallucinated information.
So even if you were to white-list the context to train the engine against, it would still make up information because that's just what LLMs do. They make stuff up to fit certain patterns.
That’s not correct. You don’t need to take my word for it. Go grab some complete baseball box scores and you can see that ChatGPT will reliably translate them into an entertaining, English paragraph-length outline of the game.
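If you want to try it, the prompt doesn't need to be clever. A hedged sketch, assuming the official openai Python client; the file name and model name are illustrative, and the box score itself is whatever you grabbed, not invented here:

    from openai import OpenAI  # assumption: official client, >= 1.0

    client = OpenAI()
    box_score = open("box_score.txt").read()  # any complete box score you saved

    recap = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model name is an assumption
        messages=[{
            "role": "user",
            "content": "Here is a complete baseball box score. Write an "
                       "entertaining, paragraph-length recap of the game, "
                       "using only facts that appear in the box score.\n\n"
                       + box_score,
        }],
    ).choices[0].message.content
    print(recap)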
This ability to translate has been shown experimentally to depend on the size of the LLM, but for lower-complexity analytic prompts like this it can reliably avoid synthesizing information that isn't in the input.
You don't build an AI search engine by just letting users query the model directly and call it a day.
Have you ever built an AI search engine? Neither have Google or MS. No one knows yet what the final search engine will look like.
However, we have every indication that all of the localization and extra training are fairly "thin": things like prompt engineering and maybe a script filtering the output.
And despite ChatGPT's great popularity, the application is still a monolithic text-prediction machine, so it's hard to see what else could be done.