The language switch issue could be solved in a whole host of ways without discarding the transformer architecture. The deeper question, whether transformers can "understand" truth and reason, is more nuanced. A transformer built around (token, accuracy) tuples instead of raw tokens, and provided with "fact-checked" training data, could be instructed to make statements that are likely to be true when possible and to add a disclaimer when making statements that may not be true. Reasoning is harder to address, but I suspect that if the language transference issue were solved, training a model on a wide variety of synthetic algebra problems (automated proof software could spit out an endless stream; see the sketch below) and other deductive chain-of-thought examples would do a lot to help transformers emulate logical processes.
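To make the synthetic-data half concrete, here's a minimal sketch in Python, using sympy as a stand-in for the "automated proof software"; the problem class, wording, and step format are my own illustrative choices:

```python
# Emit an endless stream of simple linear-equation problems with worked
# solution steps, formatted as chain-of-thought training text.
import random
import sympy as sp

def make_example(rng: random.Random) -> str:
    x = sp.Symbol("x")
    a, b, c = (rng.randint(1, 9) for _ in range(3))
    # Solve a*x + b = c symbolically so every emitted step is exact.
    solution = sp.solve(sp.Eq(a * x + b, c), x)[0]
    return (
        f"Problem: solve {a}*x + {b} = {c}.\n"
        f"Step 1: subtract {b} from both sides: {a}*x = {c - b}.\n"
        f"Step 2: divide both sides by {a}: x = {solution}.\n"
        f"Answer: x = {solution}\n"
    )

rng = random.Random(0)
for _ in range(3):
    print(make_example(rng))
```

Because the solver is symbolic, every step in every generated example is guaranteed correct, which is exactly what you'd want from deduction training data.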
I think it depends greatly on what you mean by solving the language switch issue. As a practical matter, you could make English the "base" language and automatically translate queries and responses to and from English (sketched below). This would likely work if you are only concerned with making GPT-4 a better commercial product for non-English users.
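A sketch of what that wrapper could look like; both helper functions here are hypothetical stand-ins (any MT system, any English-tuned model):

```python
# Route everything through English and translate only at the edges.
def translate(text: str, source: str, target: str) -> str:
    raise NotImplementedError("plug in any machine-translation system")

def complete_in_english(prompt: str) -> str:
    raise NotImplementedError("plug in the English-tuned model")

def answer(query: str, user_lang: str) -> str:
    english_query = translate(query, source=user_lang, target="en")
    english_answer = complete_in_english(english_query)
    return translate(english_answer, source="en", target=user_lang)
```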
But there's a much deeper issue at play here. When I read an English-language fact about cats, I don't update my understanding of the word "cat"; I update my understanding of the concept of cats. It's this abstract conception of cats that GPT-4 is missing. During training, if GPT reads a Swahili sentence about "paka" stating a true fact about cats, it needs to automatically update the weights around the word "cat" just as the English-language version of that sentence would. Otherwise the understanding will be superficial and brittle. Something like this seems necessary (but not sufficient) for GPT to truly understand that tokens carry actual semantic meaning.
> A transformer architecture built around (token, accuracy) tuples
Maybe I don't understand your idea, but I don't think this is meaningful. If I am wearing a red shirt and I say "This shirt is not red," what are the accuracies of the individual words? If the accuracy of "red" is 100% then the accuracy of "not" is 0%, and vice versa. Maybe you say they're both 50% accurate? It's not a coherent definition.
Once GPT-4+ has also sucked in all frames of (curated) video and audio, I imagine its 'concept' of a cat will be quite a bit better.
While that might not directly address your (wonderful!) example, I tend to assume it'll still manage to do quite a bit better. Maybe it'll make additional associations between cat pictures and Swahili subtitles/narration, making it more likely to at least produce a better translation?
Simple: token sequences are "fact-checked," and the training data is annotated to update the accuracy values of the tokens in context. Then the sequences "Donald Trump is the new Jesus" and "Donald Trump is the new Hitler" would have different accuracy scores (probably represented as a mean/deviation pair to encapsulate uncertainty/divisiveness).
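For illustration, the annotations might look something like this; the schema and every number here are invented, purely to show the mean/deviation idea:

```python
# Each token carries an accuracy distribution assigned in context:
# a mean (how true the checkers judged the claim at this point) and a
# deviation (how much the checkers/sources disagreed).
from dataclasses import dataclass

@dataclass
class ScoredToken:
    token: str
    accuracy_mean: float  # 0.0 = judged false, 1.0 = judged true
    accuracy_dev: float   # spread across fact-checkers/sources

# A divisive claim ends with a low mean and a high deviation.
example = [
    ScoredToken("Donald", 0.95, 0.02),
    ScoredToken("Trump", 0.95, 0.02),
    ScoredToken("is", 0.90, 0.05),
    ScoredToken("the", 0.90, 0.05),
    ScoredToken("new", 0.20, 0.35),
    ScoredToken("Hitler", 0.10, 0.40),
]
```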
When I say solving the language switch issue, I mean something akin to adding a translation layer to the transformer, so the model simultaneously learns a translation into a meta-language and the token transition probabilities within that meta-language.
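One hypothetical shape for that, sketched in PyTorch: per-language embeddings act as the learned "translation" into a shared meta-token space, and a single transformer models transitions in that space. Dimensions, layer counts, and the omitted causal mask are all simplifications of mine, not a claim about any real system:

```python
import torch
import torch.nn as nn

class MetaLanguageLM(nn.Module):
    def __init__(self, vocab_sizes: dict[str, int], d_meta: int = 512):
        super().__init__()
        # One learned "translation" per language into the shared
        # meta-language embedding space.
        self.encoders = nn.ModuleDict(
            {lang: nn.Embedding(v, d_meta) for lang, v in vocab_sizes.items()}
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_meta, nhead=8, batch_first=True
        )
        # A single core learns transitions in the meta-language,
        # shared across all input languages.
        self.core = nn.TransformerEncoder(layer, num_layers=6)
        # Decode meta-tokens back into each language's vocabulary.
        self.decoders = nn.ModuleDict(
            {lang: nn.Linear(d_meta, v) for lang, v in vocab_sizes.items()}
        )

    def forward(self, tokens: torch.Tensor, lang: str) -> torch.Tensor:
        meta = self.encoders[lang](tokens)  # translate into meta-language
        meta = self.core(meta)              # meta-language transitions
        return self.decoders[lang](meta)    # per-language token logits
```

The point of the shared core is that a fact learned from Swahili text updates the same meta-language weights an English sentence would, which is what the "paka"/"cat" example above is asking for.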
How can you be sure that RLHF doesn't induce a more global, translatable kind of discrimination in favor of higher-quality information? In theory this should be possible, in the same way that highly adept humans discriminate between sources.