Why would someone only ask an LLM questions when they were in the market to buy ...

anileated · on Feb 6, 2023

> 4) They ask the LLM for a list of recent books that go in depth on the topic or are in the genre etc. 5) Your name comes up in the list

My belief is that ChatGPT is actually not quite capable of that, after seeing examples of how it manufactures non-existing references. Besides, if it were capable of that, why would it not show your name as part of the answer already now?

The cynic in me thinks it’s not capable of that primarily because it is not a priority for OpenAI and training data strips attribution, with an explicit purpose: if the public knows that ChatGPT can trace back the source, OpenAI would be on the hook for paying all the countless non-consensual content providers on which work it makes money.

We should treat OpenAI as we treat Google and Microsoft. It has great talent and charismatic people working for it, but ultimately it’s a for-profit tech company and the name they chose ought to make us all the more suspicious (akin to Google’s “don’t be evil”).

> Why would someone only ask an LLM questions when they were in the market to buy a book?

Why would you be in a market for a book when you can learn the same and more by asking an LLM that already consumed said book? And therefore why would the author spend effort writing and publishing a book knowing it’d sell exactly one copy (to LLM operator)?

pharke · on Feb 6, 2023

It's very much in their interest, if the information their models provide is impossible to verify then it severely limits its uses. You essentially can't use it as a source for anything that requires any type of citation or reliability. That's a huge handicap for selling it to businesses and researchers. The general problem of determining what training data was used to produce an output is an open problem in ML and one that is being very actively worked on since it would greatly further the field.

You believe correctly that ChatGPT is not capable of showing sources, it's currently impossible to do but we were discussing Tomorrow so I included it as a possibility. You could potentially hack it in now using traditional search or nearest neighbours but it wouldn't be 100% accurate, probably not even 50%, it would just show a bag of similar texts so not really worth doing.

I'd still be in the market for a book even if we had a perfect LLM that could answer every question I had with impeccable accuracy. I read books because I want to find out about things I don't know that I don't know. It's pretty hard to find those things if you just do question response. It's like a graph, if you start at one node it may take you a very long time to traverse the graph to another node but if you have some outside source that gives you the address of a new node you can just jump straight to it.