RLHF suggests that human feedback used to tune a model after plain-text pretraining is quite potent per sample. There may be some optimal ratio of pretraining data and model size to RLHF data size that works favorably for us in driving hallucinations to a minimum. Furthermore, there might be some "there" there in the hallucinations, something that has yet to be identified as valuable in itself. Either way, it seems like our ability to wrangle these models is getting better.
How would you "self-validate" against hallucinated facts?
What makes self-validation possible is a set of hard external rules that can be evaluated independently and automatically, like the rules of Chess or Go.
We don't have anything like that for LLMs and what people want to use them for.
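To make that concrete, here is a minimal sketch of what "hard external rules" buy you, using chess as the example. The python-chess library and the UCI move format are just illustrative choices, not anything specific to this thread: the point is that any move a model proposes can be checked mechanically against the rules, with no human judgment involved.

    # Minimal sketch: validating a model-proposed chess move against hard external rules.
    # Assumes the python-chess library and a model that emits moves in UCI notation
    # (both are illustrative assumptions).
    import chess

    def is_valid_move(fen: str, proposed_uci: str) -> bool:
        """Return True if the proposed move is legal in the given position."""
        board = chess.Board(fen)
        try:
            move = chess.Move.from_uci(proposed_uci)
        except ValueError:  # not even syntactically a move
            return False
        return move in board.legal_moves  # the rules of chess decide, automatically

    # Example: from the starting position, "e2e4" is legal, "e2e5" is not.
    start = chess.STARTING_FEN
    print(is_valid_move(start, "e2e4"))  # True
    print(is_valid_move(start, "e2e5"))  # False

There is no analogous oracle for "is this cited paper real" or "did this event actually happen", which is exactly the gap the parent comment is pointing at.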