It's a good example of what LLMs aren't capable of. If they think the word Mississippi has the letter A in it (per the article), that's a strong indication that the transformer architecture may never be able to achieve AGI.
Here's another example of a simple question that a state-of-the-art model (Claude 3.5) gets wrong, as tested just now:
Prompt: How many words are in this sentence?
Response: This sentence contains 7 words.
Interestingly, it seemed to count the number of words in my sentence correctly (seven), but its answer is still wrong: the response sentence it produced contains only five words.
> It's a good example of what LLMs aren't capable of. If they think the word Mississippi has the letter A in it (per the article), that's a strong indication that the transformer architecture may never be able to achieve AGI.
It only indicates that tokenization is used. The LLM architecture can be used without tokenization; it would just mean that the available space in the context window is used less efficiently. For example, a 10-letter word that could be represented by one token would instead take 10 slots.
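A quick way to see this is to run the word through a BPE tokenizer and compare it with a character-level split. A minimal Python sketch, assuming the tiktoken library is installed and using the cl100k_base vocabulary purely for illustration:

    # Compare the context-window cost of a word under BPE tokenization
    # vs. raw characters. Assumes `pip install tiktoken`; the vocabulary
    # choice (cl100k_base) is illustrative.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    word = "Mississippi"
    token_ids = enc.encode(word)

    print("BPE tokens:", len(token_ids), [enc.decode([t]) for t in token_ids])
    print("Characters:", len(word), list(word))
    # A common word usually collapses into one or two tokens, so the model
    # never directly sees its individual letters; a character-level scheme
    # would spend one context slot per letter instead.

The exact number of tokens depends on the vocabulary, but the point is the same: the letters get folded into opaque token IDs unless you pay for them with extra context slots.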
Tokenization reorganizes information but doesn't remove it. It may be easier or harder to learn things like letter counting under different tokenization schemes, but the main reason it's hard is that there's not much text about letter counting in the training set. That is, you could easily train any of the ChatGPT models to count letters in words by generating a bunch of training samples explicitly for this task; it's just not worth the bother.
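For what it's worth, generating that kind of synthetic data is a few lines of code. An illustrative sketch (the prompt/completion format and the make_sample helper are made up for the example):

    # Generate synthetic letter-counting training samples.
    import random
    import string

    def make_sample(rng: random.Random) -> dict:
        # Random lowercase word plus a random letter to count.
        word = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 12)))
        letter = rng.choice(string.ascii_lowercase)
        count = word.count(letter)
        return {
            "prompt": f"How many times does the letter '{letter}' appear in '{word}'?",
            "completion": f"The letter '{letter}' appears {count} time(s) in '{word}'.",
        }

    rng = random.Random(0)
    samples = [make_sample(rng) for _ in range(100_000)]
    print(samples[0])

Fine-tune on enough of that and letter counting stops being a problem; it just doesn't show up often enough in ordinary web text for the base training run to pick it up.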