I don't understand what theoretical basis can even exist for "I don't know" from an LLM, just based on how they work.
I don't mean the filters. Those aren't internal to the LLM; they're external, a programmatic right-think policeman that looks at the output and then censors the model. I mean that actual recognition of _anything_ is not part of the LLM's structure, so recognizing that it is wrong isn't really possible without a second system.
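To make that concrete, here is a rough sketch of what I mean by a second system, assuming some hypothetical way to read out the model's next-token probabilities (the stub below just returns a toy distribution so it runs). It doesn't give the model any self-knowledge; it's a separate heuristic bolted on from the outside, looking only at the output:

    import math

    def get_next_token_probs(prompt: str) -> dict[str, float]:
        # Hypothetical stand-in: a real setup would query the model for its
        # next-token probability distribution. Here, a fixed toy distribution.
        return {"Paris": 0.90, "Lyon": 0.05, "Berlin": 0.03, "Rome": 0.02}

    def entropy(probs: dict[str, float]) -> float:
        # Shannon entropy in bits; higher means the distribution is flatter,
        # i.e. the model is spreading probability across many continuations.
        return -sum(p * math.log2(p) for p in probs.values() if p > 0)

    def flag_uncertain(prompt: str, threshold_bits: float = 2.0) -> bool:
        # The "second system": the LLM never decides it doesn't know;
        # this external program decides for it, from the output distribution.
        return entropy(get_next_token_probs(prompt)) > threshold_bits

    if __name__ == "__main__":
        print(flag_uncertain("The capital of France is"))  # False: low entropy

Single-step entropy is a crude proxy at best, but it illustrates the architecture: the "I don't know" decision lives in a program sitting next to the LLM, not inside it.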
> I don't understand what theoretical basis can even exist for "I don't know" from an LLM, just based on how they work.
Neither do I. But until someone comes up with one, they can't be trusted to do anything important. This is the elephant in the room of the current AI industry.