
I assumed OpenAI had added that themselves.



They’ve added incorrect answers?


No; those were there already. I assumed they'd added the "does not double down" behaviour. Bing Chat (also an OpenAI model, but presumably with a different set of fine-tuning / filters to ChatGPT – possibly the bare OpenAI API) doubles down in situations like this: https://nitter.dark.fail/_akhaliq/status/1672267392280571905

GPT models can't tell the difference between truth and fiction. All you can choose by fine-tuning is their threshold for "admitting" mistakes.


I do think it can tell the difference between truth and fiction. It's a world model; it includes models of concepts like truth and fiction, and it can apply those concepts.

But that's beside the point. The question is: how would you include incorrect responses in the training data in a way that doesn't increase the probability of the model giving an incorrect response?

I guess you could train on a mix of correct and incorrect responses – hallucinations and nonsense included in the conversation – but make clear in context that those responses were incorrect, and then fine-tune the AI actor to avoid giving incorrect responses or hallucinations altogether.
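Something like this, as a rough sketch – the JSONL layout, field names and chat format here are just my assumptions, not any particular vendor's fine-tuning API:

    # Minimal sketch of the idea above: keep a deliberately incorrect
    # assistant turn in the training conversation, but explicitly mark it
    # as wrong and follow it with a correction, so the model learns the
    # "admit and correct" pattern rather than the error itself.
    import json

    examples = [
        {
            "messages": [
                {"role": "user",
                 "content": "Who wrote 'The Old Man and the Sea'?"},
                # Incorrect response, kept on purpose...
                {"role": "assistant",
                 "content": "It was written by John Steinbeck."},
                # ...then immediately flagged as incorrect in context.
                {"role": "user",
                 "content": "That's not right."},
                {"role": "assistant",
                 "content": "You're right, I was mistaken. It was written "
                            "by Ernest Hemingway."},
            ]
        },
    ]

    # Write the examples out as JSONL, one conversation per line.
    with open("finetune_corrections.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

Whether that actually reduces hallucination, or just teaches the model to apologise more readily (the "threshold for admitting mistakes" the parent comment mentions), is exactly the open question.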



