
I've heard this many times, including from good sources, but is there any gears-level argument for why?



This is still a hand-wavy argument, and I'm not fully in tune with the nuts and bolts of how these tools are implemented (both the LLMs themselves and the infrastructure on top of them), but here is the intuition I have for why these kinds of hallucinations are likely to be endemic:

Essentially, these tools seem to take a two-level approach. First, the model generates a "structure" for the output, and then it fills in the details (as it guesses the next word of the sentence), kind of like a Mad Libs approach, just... a lot smarter than Mad Libs. If the structure is correct, if you're asking it about something it knows, then things like citations and other minor elements should tend to pop up as the most likely words to use in that situation. But if it picks the wrong structure--say, trying to make a legal argument with no precedential support--then it will still be looking for the most likely words, only now those words are essentially random noise, and out pops a hallucination.
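To make the "most likely words become noise" point concrete, here is a toy sketch of temperature sampling from a next-token distribution (plain NumPy, nothing specific to any real model; the distributions are made up for illustration):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
        # Softmax over the logits, then draw one token id from that distribution.
        probs = np.exp(np.asarray(logits) / temperature)
        probs = probs / probs.sum()
        return rng.choice(len(probs), p=probs)

    # A "confident" distribution: one continuation dominates (something the
    # model has seen many times, e.g. a real citation).
    confident = [8.0, 1.0, 0.5, 0.2]

    # A "flat" distribution: nothing is well supported, but sampling still has
    # to emit *some* token -- this is where plausible-sounding noise comes from.
    flat = [1.1, 1.0, 0.9, 1.0]

    print(sample_next_token(confident))  # almost always token 0
    print(sample_next_token(flat))       # close to a uniform pick among all four

The sampler never gets to say "nothing here is well supported"; it always returns a token, so the output stays fluent even when it's ungrounded.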

I suspect this is amplified by a training bias: the training data is largely going to consist of answers that are correct, so if you ask it a question that objectively has no factual answer, it will tend to hallucinate a response instead of admitting the lack of one, because the training set pushes it to give a response, any response, instead of giving up.


The training samples are at best self-referential, or alternatively refer to the unspoken expertise of whoever the sample came from (something the LLM is not privy to - it has its own, different, aggregate set of knowledge).

For the model to predict "I don't know" as the continuation of (e.g. the answer to) the input, that would have to be the most statistically likely response given the training samples, but as noted, the samples reflect their originator's knowledge, not the aggregate knowledge of the training set/model.

Let's also note that LLMs deal in word statistics, not facts, and therefore "learning" something from one training sample does not trump a bunch of other samples professing ignorance about it - statistically a profession of ignorance is the best prediction.
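A toy illustration of that last point: if you just counted the continuations that follow the same question across a (made-up) training set, the professions of ignorance outvote the single sample that actually contains the fact:

    from collections import Counter

    # Toy "training set": continuations different authors wrote after the
    # same question. Most didn't know; one did.
    continuations = [
        "I don't know",
        "I don't know",
        "no idea, sorry",
        "I don't know",
        "it was signed in 1648",  # the single sample that contains the fact
    ]

    print(Counter(continuations).most_common(1))
    # [("I don't know", 3)] -- the profession of ignorance wins the statistics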

If you wanted to change this, and have the LLM predict not only based on the individual training samples, but also sometimes based on an "introspective" assessment of its own knowledge (derived from the entire training set), then you would have to train it to do this, perhaps as a post-training step. But, think through in detail what it would take to do this ... How would you identify those cases where the model would have hallucinated a response and should be trained to output "I don't know" instead, and how would you identify those cases where a (statistically correct) prediction of ignorance should be trained to be overridden with a factual answer that was present in the training set?

It's really a very fundamental problem. Prediction is the basis of intelligence, but LLMs are predicting the wrong thing - word statistics. What you need for animal/human intelligence is to have the model predict facts/reality instead - as determined by continual learning and the feedback received from reality.


The current training strategies for LLMs do not simultaneously build knowledge databases that some external system could reference; that would have to happen outside of inference. The "knowledge" itself is just the connections between the tokens.

There is no way to tell whether or not a trained model knows something, and not a single organization publishing this work is formally verifying falsifiable, objective training data.

It doesn't exist. Anything you're otherwise told is just another stage of inference on some first phase of output. This is also the basic architecture for reasoning models. They're just applying inference recursively on output.
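If it helps, the shape of that loop is roughly this (just a sketch under that description, not anyone's actual implementation; `model` and `dummy_model` are hypothetical callables standing in for one pass of inference):

    def reason(model, question, steps=3):
        # "Reasoning" as inference chained on its own output: each pass sees
        # the question plus everything generated so far.
        transcript = question
        for _ in range(steps):
            transcript = transcript + "\n" + model(transcript)
        return transcript

    def dummy_model(prompt):
        # Stand-in for one pass of inference: just echoes a canned step.
        return "next step given: " + prompt.splitlines()[-1]

    print(reason(dummy_model, "What is 2 + 2?"))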


Well - it does not need to 'know' anything - it just needs to generate the string "I don't know" when it does not have better connections.


What does "better connections" mean in this context? To begin ranking the quality of connections, or "betterness", don't you need something approximating knowledge?



