It's a good example of what LLMs aren't capable of. If they think the word Mississippi has the letter A in it (per the article), that's a strong indication that the transformer architecture may never be able to achieve AGI.
Here's another example of a simple question that a state-of-the-art model (Claude 3.5) gets wrong, as tested just now:
Prompt: How many words are in this sentence?
Response: This sentence contains 7 words.
Interestingly, it seemed to count the number of words in my sentence correctly (seven), but its answer is still wrong: the response sentence it produced contains only five words.
> It's a good example of what LLMs aren't capable of. If they think the word Mississippi has the letter A in it (per the article), that's a strong indication that the transformer architecture may never be able to achieve AGI.
It only indicates that tokenization is used. The LLM architecture can be used without tokenization; it would just mean that the available space in the context window is used less efficiently. For example, a 10-letter word that could be represented by one token would instead take 10 slots.
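A quick way to see this is to run the word through a BPE tokenizer and compare it with a character-level split. A minimal Python sketch, assuming the tiktoken library is installed and using the cl100k_base vocabulary purely for illustration:

    # Compare the context-window cost of a word under BPE tokenization
    # vs. raw characters. Assumes `pip install tiktoken`; the vocabulary
    # choice (cl100k_base) is illustrative.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    word = "Mississippi"
    token_ids = enc.encode(word)

    print("BPE tokens:", len(token_ids), [enc.decode([t]) for t in token_ids])
    print("Characters:", len(word), list(word))
    # A common word usually collapses into one or two tokens, so the model
    # never directly sees its individual letters; a character-level scheme
    # would spend one context slot per letter instead.

The exact number of tokens depends on the vocabulary, but the point is the same: the letters get folded into opaque token IDs unless you pay for them with extra context slots.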
Tokenization reorganizes information but doesn't remove it. It may be easier or harder to learn things like letter counting under different tokenization schemes, but the main reason it's hard is that there's not much text about letter counting in the training set. That is, you could easily train any of the ChatGPT models to count letters in words by generating a bunch of training samples explicitly for this task; it's just not worth the bother.
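For what it's worth, generating that kind of synthetic data is a few lines of code. An illustrative sketch (the prompt/completion format and the make_sample helper are made up for the example):

    # Generate synthetic letter-counting training samples.
    import random
    import string

    def make_sample(rng: random.Random) -> dict:
        # Random lowercase word plus a random letter to count.
        word = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 12)))
        letter = rng.choice(string.ascii_lowercase)
        count = word.count(letter)
        return {
            "prompt": f"How many times does the letter '{letter}' appear in '{word}'?",
            "completion": f"The letter '{letter}' appears {count} time(s) in '{word}'.",
        }

    rng = random.Random(0)
    samples = [make_sample(rng) for _ in range(100_000)]
    print(samples[0])

Fine-tune on enough of that and letter counting stops being a problem; it just doesn't show up often enough in ordinary web text for the base training run to pick it up.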