Sounds like you are thinking of language models in isolation, working in closed-book mode. That is just the default; it doesn't need to be how they are used in practice.

Did you know that language models can use external tools, such as a calculator? They just need to write <calc>23+34=</calc> and the result "57" is appended automatically. Similarly, they can run <search>keyword</search> and get up-to-date snippets of information. They could also write <work>def is_prime(x): ... print(is_prime(57))</work> and get the exact answer.
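To make the calculator case concrete, here is a rough sketch of what the wrapper around the model might do. The names (`expand_calc`, `_eval_arithmetic`) and the exact tag handling are my own assumptions for illustration, not how any particular system implements it: it just scans generated text for a <calc>...=</calc> span, evaluates the arithmetic, and appends the result.

```python
import ast
import operator
import re

# Hypothetical set of supported operators; a real system may allow more.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval_arithmetic(expr: str):
    """Evaluate plain arithmetic without calling eval() on model output."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

# Assumed tag format, copied from the example above: <calc>EXPR=</calc>
_CALC = re.compile(r"<calc>(.*?)=</calc>")

def expand_calc(text: str) -> str:
    """Append the computed result right after each <calc>...=</calc> span."""
    def fill(match):
        result = _eval_arithmetic(match.group(1).strip())
        return f"{match.group(0)}{result}"
    return _CALC.sub(fill, text)

print(expand_calc("The total is <calc>23+34=</calc>"))
```

The model only has to learn to emit the tag; the arithmetic itself is done by ordinary code, so the result is exact rather than predicted.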

I think the correlation patterns in language are enough to do real work, especially when fortified with external resources. Intelligence is most likely a property of language, culture and tools, not of humans and neural networks.