If you mean the "If you don't know" part, oh no, they have a much bigger problem they're solving.
The LLM will absolutely lie if it doesn't know and you haven't made it perfectly clear that you'd rather it did not do that.
LLMs seem to be trying to give answers that make you happy. A good lie will make you happy. Unless it understands that you will not be happy with a lie.
Is this anthropomorphizing? Yep. But that's the best way I've found to reason about them.
A less anthropomorphic approach might be to say that LLMs can predict the correct “shape” of an answer even when they don’t have data that gives them a clear right answer for the correct content, and since their basic design is to provide the best response they can, they’ll provide an answer of the correct shape with fairly random content if all they have good information to predict is the shape and not the content.
I think parent has hit on the how and GP has hit on the why.
How LLMs are able to give convincing wrong answers: they “can predict the correct ‘shape’ of an answer” (parent).
Why LLMs are able to give convincing wrong answers is a little more complicated, but basically it’s because the model is tuned by human feedback. The reinforcement learning from human feedback (RLHF) that is used to tune LLM products like ChatGPT is a system based on human ranking. It’s a matter of getting exactly what you ask for.
If you tune a model by having humans rank the outputs, despite your best efforts to instruct the humans to be dispassionate and select which outputs are most convincing/best/most informative, I think what you’ll get is a bias towards answers humans like. Not every human will know every answer, so sometimes they’ll select one that’s wrong but likable. And that’s what’s used to tune the model.
You might be able to improve this with curated training data (maybe something a little more robust than having graders grade each other). I don’t know if it’s entirely fixable though.
The brilliant thing about the parent’s comment about the “shape” of the answer is that it reveals how much humans have (uh, historically, now, I guess) relied on the shape of information to convey its trustworthiness. Expand the notion of “shape” a bit to include the medium. If somebody bothered to take the time to correctly shape an answer, we take that as a sign of trustworthiness, like how you might trust something written in a carefully-typeset book more than this comment.
Surely no one would take the time to write a whole book on a topic they know nothing about. Implies books are trustworthy. Look at all the effort that went in. Proof of effort. When perfectly-shaped answers in exactly the form you expected are presented in a friendly way and commercial context, they certainly read as trustworthy as Campbell’s soup cans. But LLMs can generate books worth of nonsense in exactly the right shapes without effort, so we as readers can no longer use the shape of an answer to hint at its trustworthiness.
So maybe the answer is just to train on books only, because they are the highest quality source of training data. And carefully select and accredit the tuning data, so the model only knows the truth. It’s a data problem, not a model problem
> The brilliant thing about the parent’s comment about the “shape” of the answer is that it reveals how much humans have (uh, historically, now, I guess) relied on the shape of information to convey its trustworthiness.
This is the basis of Rumor. If you tell a story about someone that is entirely false but sounds like something they're already suspected of or known to do, people will generally believe it without verification since the "shape" of the story fits people's expectations of the subject.
To date I've decried the choice of "hallucination" instead of "lies" for false LLM output, but it now seems clear to me that LLMs are a literal rumor mill.
What's the point of the technology if it will provide an answer regardless of the accuracy? And what prevents this from being dangerous when the factual and ficticious answers are indistinguishable?
We have the same problem with people. Somehow, we've managed to build a civilization that can, occasionally, fly people to the Moon and get them back.
Even if LLMs never get any more reliable than your average human, they're still valuable because they know much more than any single human ever could, run faster, only eat electricity, and can be scaled up without all kinds of nasty social and political problems. That's huge on its own.
Or, put another way, LLMs are kind of an concentrated digital extract of human cognitive capacity, without consciousness or personhood.
"without all kinds of nasty social and political problems"
I assure you, those still exist in AI. AI follows whatever political dogma it is trained on, regardless of if you point out how logically flawed it is.
If it is trained to say 1+1=3, then no matter what proofs you provide, it will not budge.
Yes, it could be dangerous if you blindly rely on its reliability for something safety-related. But many creative processes are unreliable. For example, coming up with bad ideas while brainstorming is pretty harmless if nobody misunderstands it.
Generally, you want some external way of verifying that you have something useful. Sometimes that happens naturally. Ask a chatbot to recommend a paper to read and then search for it, and you’ll find out pretty quick if it doesn’t exist.
What happens when the tech isn't only being used to answer a human's questions during a shortlived conversation though?
The common case we see publicized today is people poking around with prompts, but isn't it more likely, or at least a risk, that mass adoption will look more like AI running as longlived processes talked with managing done system on their own?
> The common case we see publicized today is people poking around with prompts, but isn't it more likely, or at least a risk, that mass adoption will look more like AI running as longlived processes talked with managing done system on their own?
If by “AI” you mean “bare GPT-style LLMs”, no, they can’t do that.
If you mean “systems consisting of LLMs being called in a loop by software which uses a prompt structure carefully designed and tested for the operating domain, and which has other safeguards on behavior, sure, that’s more probable.
One way to think about it, though, is that many important processes have a non-zero error rate. Particular those involving people. If you can put bounds on the error rate and recover from most errors, maybe you can live with it?
An assumption that error rates will remain stable is often pretty dubious, though.
Not if they're bad at it. ChatGPT and friends is a tool that's useful for some things and that's where it'll see adoption. Misuses if the technology will likely be exposed as such pretty quickly.
These are the 1-million dollar questions when it comes to LLMs. How useful is it to talk to a human who likes to talk, and prefers to say something over admiting they dont know? And if you have a person with münchhausensyndrome in your circles, how dangerous is it to listen to them and accidentally picking up a lie? LLMs with temp > 0.5 are effectively like these people.
I have the same concerns, but am feeling more comfortable about Munchausen-by-LLM not undermining Truth as long as answers are non-deterministic.
Think about it: 100 people ask Jeeves who won the space race. They would all get the same results.
100 people ask Google who won the space race. They'll all get the same results, but in different orders.
100 people ask ChatGPT who won the space race. All 100 get a different result.
The LLM itself just emulates the collective opinions of everyone in a bar, so it's not a credible source (and cannot be cited anyway). Any two of these people arguing their respective GPT-sourced opinions at trivia night are going to be forced to go to a more authoritative source to settle the dispute. This is no different than the status quo...
> What's the point of the technology if it will provide an answer regardless of the accuracy?
The purpose is to serve as a component of a system which also includes features, such as the prompt structure upthread, that mitigates the undesired behavior while keeping the useful behaviors.
I suggest using 'form' instead of 'shape'; the latter is mainly concerned with external form. In context of LLMs, form would be the internal mapping, and shape the decoded text that is emitted.
I think of it more like a pachinko machine. You put your question in the top, it bounces around through a bunch of biased obstacles, but intevitably it will come out somewhere at the bottom.
By telling it not to lie to you, you're biasing it toward a particular output in the event that its confidence is low. Otherwise, low confidence results just fall out somewhere mostly random.
> By telling it not to lie to you, you're biasing it toward a particular output in the event that its confidence is low.
This is something I really don't understand about LLMs. I think I understand how the generative side of them work, but "asking" it to not lie baffles me. LLMs require a massive corpus of text to train the model, how much of that text contains tokens that translate to "don't lie to me", and scores well enough to make its way into the output?
> Is this anthropomorphizing? Yep. But that's the best way I've found to reason about them.
My take? It's like a high-schooler being asked a question by the teacher and having to answer on the spot. If they studied the material well, they'll give a good and correct answer. If they (like me, more often than I'd care to admit) only half-listened to the lectures and maaaaybe skimmed some cliff's notes before class, they will give an answer too - one strung together out of few remembered (or misremembered) facts, an overall feel for the problem space (e.g. writing style, historical period, how people behave), with lots and lots of interpolation in between. Delivered confidently, it has more chance of avoiding a bad mark (or even scoring a good one) than flat-out saying, "I don't know".
Add to that some usual mistakes out of carelessness and... whatever it is that makes you forget a minus sign and realize it half a page of equations later - and you get GPT-4. It's giving answers like a person who just blurts out whatever thoughts pop into their head, without making a conscious attempt at shaping or interrogating them.
> Is this anthropomorphizing? Yep. But that's the best way I've found to reason about them.
I think it might be more accurate to say, "LLMs are writing a novel in which a very smart AI answers everyone's questions." If you were writing a sci fi novel with a brilliant AI, and you knew the answer to some question or other, you'd put in the right answer. But if you didn't know, you'd just make up something that sounded plausible.
Alternately, you can think of the problem as the AI taking an exam. If you get an exam question you're a bit fuzzy on, you don't just write "I don't know". You come up with the best answer you can given the scraps of information you do know. Maybe you'll guess right, and in any case you'll get some partial credit.
The first one ("writing a novel") is useful I think in contextualizing emotions expressed by LLMs. If you're writing a novel where some character expresses an emotion, you aren't experiencing that emotion. Nor is the LLM when they express emotions: they're just trying to complete the text -- i.e., write a good novel.
the incidence of "i don't know" in response to questions in the training data is pretty low if present at all, and even if it were you'd still need to frame those I don't know answers such that they apply to the entire dataset accurately. This is obviously a gargantuan undertaking that would not scale well as data is added, and so right now the idea or concept of not knowing something is not taught. At best you'd build a model that handles human language really well then retrieves information from a database and uses in context learning to answer questions, where a failure to find info results in an i don't know.
What is taught indirectly though is level of certainty, so if you get LLM's to rationalise their answers you tend to get more reliable evidence based answers.
Bottom line, teaching a monolithic model what it means to not know something with certainty, is difficult and not currently done. You'll likely get a lot of false negatives.