I think it greatly depends on what you mean by solving the language switch issue...

mercer · on Jan 2, 2024

Once GPT-4+ has also sucked in all frames of (curated) video and audio, I imagine it's 'concept' of a cat will be quite a bit better.

While that might not directly impact your (wonderful!) example, I tend to assume it'll still manage to do quite a bit better. Maybe it'll make the additional associations between cat pictures and swahili subtitles/narration, making it more likely to do at least a better translation?

Or have I drank too much of the kool-aid already?

CuriouslyC · on Jan 1, 2024

Simple. Token sequences are "fact checked" and the training data is annotated to update the accuracy value of the tokens in context. Then the sequences "Donald Trump is the new Jesus" and "Donald Trump is the new Hitler" would have different accuracy scores (probably represented as mean/deviation to encapsulate uncertainty/divisiveness).

When I say solving the language switch issue, I mean something akin to adding a translation layer to transformers, so you're learning a translation to a meta-language and meta-language token transition probabilities simultaneously.