As a simple example, if you ask a question and part of the answer is quoted verbatim from a book the model has memorized, that text is not computed or reasoned about by the AI and so doesn't have an "explanation".
But I also suspect that any AGI would necessarily produce answers it can't explain. That's called intuition.
It wouldn't be a reference; "explanation" for an LLM means it tells you which of its neurons were used to create the answer, i.e. what internal computations it did and which parts of the input it read. The architecture isn't capable of referencing things.
What you'd get is an explanation saying "it quoted this verbatim", or possibly "the top neuron is used to output the word 'State' after the word 'Empire'".
Of course the AI could incorporate web search, but then what if the explanation is just "it did a web search and that was the first result"? It seems pretty difficult to recursively make every external tool also explainable…
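To make concrete what that kind of "explanation" would look like, here is a minimal sketch (a made-up toy model, not any real LLM; the sizes, token IDs, and pooling are all assumptions for illustration): it reports which hidden neurons fired hardest for a given input, and which input positions most influenced the chosen output via input-gradient saliency.

```python
# Toy sketch of a mechanistic "explanation": which neurons fired, and which
# input tokens influenced the prediction. Everything here is illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model, d_hidden = 50, 16, 32      # made-up sizes

embed = nn.Embedding(vocab, d_model)
fc1 = nn.Linear(d_model, d_hidden)
fc2 = nn.Linear(d_hidden, vocab)

tokens = torch.tensor([3, 17, 42])          # a made-up prompt, e.g. "Empire" "State" ...
x = embed(tokens)                           # (seq, d_model)
x.retain_grad()                             # keep gradients w.r.t. the embeddings

hidden = torch.relu(fc1(x.mean(dim=0)))     # pooled hidden layer, (d_hidden,)
logits = fc2(hidden)                        # next-token scores, (vocab,)
predicted = logits.argmax()

# (a) "which neurons were used": the most active hidden units for this input
print("top hidden neurons:", hidden.topk(3).indices.tolist())

# (b) "which parts of the input it read": gradient of the chosen logit
# with respect to each input embedding, summarized per token position
logits[predicted].backward()
print("per-token influence:", x.grad.norm(dim=1).tolist())
```

Even when this works, the output is exactly the kind of thing described above: a list of neuron indices and influence scores, not a reference to a source.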
Then you should have a stronger notion of "explanation". Why were these specific neurons activated?
Simplest example: OCR. A network identifying digits can often be explained as recognizing lines, curves, numbers of segments, etc. That is an explanation, not "computer says it looks like an 8".
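As a hedged sketch of that idea: if a digit is reduced to which of the seven display segments are lit (the standard seven-segment convention; real pixel-level OCR is of course messier), the classifier's answer can carry the features that justify it.

```python
# Toy "explainable OCR": classify a digit from its lit segments and say why.
SEGMENTS = ("top", "top-left", "top-right", "middle",
            "bottom-left", "bottom-right", "bottom")

DIGITS = {
    0: set(SEGMENTS) - {"middle"},
    1: {"top-right", "bottom-right"},
    2: {"top", "top-right", "middle", "bottom-left", "bottom"},
    3: {"top", "top-right", "middle", "bottom-right", "bottom"},
    4: {"top-left", "top-right", "middle", "bottom-right"},
    5: {"top", "top-left", "middle", "bottom-right", "bottom"},
    6: {"top", "top-left", "middle", "bottom-left", "bottom-right", "bottom"},
    7: {"top", "top-right", "bottom-right"},
    8: set(SEGMENTS),
    9: {"top", "top-left", "top-right", "middle", "bottom-right", "bottom"},
}

def classify(lit: set[str]) -> tuple[int, str]:
    """Return the closest-matching digit plus a human-readable reason."""
    digit = min(DIGITS, key=lambda d: len(DIGITS[d] ^ lit))
    missing, extra = DIGITS[digit] - lit, lit - DIGITS[digit]
    reason = f"segments {sorted(lit)} match digit {digit}"
    if missing or extra:
        reason += f" (missing {sorted(missing)}, extra {sorted(extra)})"
    return digit, reason

print(classify(set(SEGMENTS)))                  # all seven segments lit -> 8
print(classify({"top-right", "bottom-right"}))  # two right segments -> 1
```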
But can humans do that? If you show someone a picture of a cat, can they "explain" why it is a cat and not a dog or a pumpkin?
And is that explanation actually how they arrived at the "cat-ness" of the picture, or do they just see that it is a cat immediately and obviously, and when you ask them for an explanation they come up with some explaining noises until you are satisfied?
Wild cat, house cat, lynx...? Sure, they can. They will tell you about proportions, the shape of the ears, size compared to other objects in the picture, etc.
For cat vs pumpkin they will think you are making fun of them, but it very much is explainable. Though now I am picturing a puzzle about finding orange cats in a picture of a pumpkin field.
Shown a picture of a cloud, why it looks like a cat does sometimes need an explanation until others can see the cat, and it's not just "explaining noises".
When people talk about explainability I immediately think of Prolog.
A Prolog query is explainable precisely because, by construction, it itself is the explanation. And you can go step by step and understand how you got a particular result, inspecting each variable binding and predicate call site in the process.
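As a rough illustration of that point (a toy Python analogue, not Prolog itself, with made-up family facts and a single rule), here is a tiny backward-chaining prover whose answers carry their own proof trees, so every variable binding can be traced back to the fact or rule that produced it.

```python
# Toy backward chainer: each answer comes with the proof tree that derived it.
import itertools

FACTS = [("parent", "alice", "bob"), ("parent", "bob", "carol")]
RULES = [  # head :- body
    (("grandparent", "?x", "?z"), [("parent", "?x", "?y"), ("parent", "?y", "?z")]),
]
_fresh = itertools.count()  # counter for renaming rule variables per use

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):  # follow variable bindings in substitution s
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def substitute(t, s):
    if isinstance(t, tuple):
        return tuple(substitute(x, s) for x in t)
    return walk(t, s)

def rename(t, n):  # fresh variable names for each rule application
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t

def prove(goal, s):
    """Yield (bindings, proof) pairs; the proof records how the goal was met."""
    for fact in FACTS:
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield s2, (substitute(goal, s2), "by fact", [])
    for head, body in RULES:
        n = next(_fresh)
        head, body = rename(head, n), [rename(g, n) for g in body]
        s2 = unify(goal, head, s)
        if s2 is not None:
            for s3, subproofs in prove_all(body, s2):
                yield s3, (substitute(goal, s3), "by rule", subproofs)

def prove_all(goals, s):
    if not goals:
        yield s, []
        return
    for s2, proof in prove(goals[0], s):
        for s3, rest in prove_all(goals[1:], s2):
            yield s3, [proof] + rest

for bindings, proof in prove(("grandparent", "alice", "?who"), {}):
    print("answer:", substitute("?who", bindings))
    print("proof :", proof)
```

The printed proof is the point: the result and its justification are the same object, which is roughly what real Prolog gives you via its resolution trace.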
Despite all the billions being thrown at modern ML, no one has managed to create a model that does something like what Prolog does with its simple recursive backtracking.
So the moral of the story is that you can 100% trust the result of a Prolog query, but you can't ever trust the output of an LLM. Given that, which technology would you rather use to build software on which lives depend?
And which of the two methods is more "artificially intelligent"?