A lot of people cite these numbers out of cynicism or to dampen others' optimism, but if you actually work in the field and remember what things were like just 5 years ago, these numbers should be extremely worrying to anyone afraid of automation. Even that linked paper from 2021 is already outdated. We don't need another revolution; we just need maybe a dozen small-to-medium insights. The steps from GPT-3 to 3.5 alone were pretty straightforward, and yet they created a small revolution in LLM usefulness. Model scale slowed the pace of research for a while, but with so many big companies jumping on the train now, you can expect research to accelerate again.
The training data contains tons of false information, and the training objective is simply to reproduce that information. It's not at all surprising that these models fail to distinguish truth from falsehood, and no incremental change will fix that. The problem is paradigmatic. And calling people cynics for pointing out the obvious and serious shortcomings of these models is poor form, IMO.
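To make that point concrete, here's a minimal sketch (in PyTorch, with a toy vocabulary I made up) of the standard next-token objective: the loss only rewards reproducing whatever token actually followed in the corpus, and no term anywhere asks whether the text is true.

```python
import torch
import torch.nn.functional as F

# Toy setup: a "model" producing logits over a tiny vocabulary.
# The details don't matter; what matters is the loss function.
vocab_size = 100
batch, seq_len = 2, 8

logits = torch.randn(batch, seq_len, vocab_size)          # model predictions
targets = torch.randint(0, vocab_size, (batch, seq_len))  # observed next tokens from the corpus

# Standard language-modeling objective: cross-entropy against the
# observed next token. A false sentence in the training data yields
# exactly the same kind of gradient as a true one; "truth" never
# appears anywhere in this computation.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```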
The large corpus of text is only necessary to grasp the structure and nuance of language itself. Answering questions (1) in a friendly manner and (2) truthfully is a matter of fine-tuning, as the latest developments around GPT-3.5 clearly show. And with approaches like indexGPT, using external knowledge bases that can even be corrected after the fact is already a thing; we just need this at scale and with the right fine-tuning. The tech is way further along than those cynics realize.
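A rough sketch of the retrieval idea (the names and the lookup function here are mine, not from any particular system): fetch passages from a correctable external store and prepend them to the prompt, so fixing an entry in the knowledge base fixes future answers without retraining the model.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source: str
    text: str

# Stand-in for a real knowledge base (could be a search index or a
# vector store). Because it lives outside the model's weights,
# correcting an entry immediately corrects future answers.
KNOWLEDGE_BASE = [
    Passage("wiki/Python", "Python was created by Guido van Rossum and released in 1991."),
    Passage("wiki/Rust", "Rust is a systems language focused on memory safety."),
]

def retrieve(query: str, k: int = 2) -> list[Passage]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    scored = [(sum(w in p.text.lower() for w in query.lower().split()), p)
              for p in KNOWLEDGE_BASE]
    return [p for score, p in sorted(scored, key=lambda s: -s[0])[:k] if score > 0]

def build_prompt(question: str) -> str:
    passages = retrieve(question)
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (f"Answer using only the sources below; cite them.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:")

# The resulting prompt would then be sent to the language model.
print(build_prompt("Who created Python?"))
```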
I'm sure you can add constraints of some sort to build internally consistent world models. Or add stochastic outputs, as has been done in computer vision, to assign e.g. variances to the predicted probabilities and detect when the model is out of its depth (and then automatically query external databases to resolve the uncertainty / read up on the topic).
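Something like the Monte Carlo dropout trick from computer vision, sketched here with a placeholder model, a made-up threshold, and a hypothetical external-lookup fallback: sample the network several times with dropout left on, treat the variance across samples as an out-of-depth signal, and defer to an external source when it's too high.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Placeholder model; any network with dropout layers works."""
    def __init__(self, dim=16, classes=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(),
                                 nn.Dropout(0.3), nn.Linear(32, classes))

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, samples=50):
    """Run the model repeatedly with dropout active; the spread across
    samples approximates the model's uncertainty (MC dropout)."""
    model.train()  # keeps dropout on at inference time
    probs = torch.stack([model(x).softmax(-1) for _ in range(samples)])
    return probs.mean(0), probs.std(0)

model = TinyClassifier()
x = torch.randn(1, 16)
mean_probs, std_probs = mc_dropout_predict(model, x)

UNCERTAINTY_THRESHOLD = 0.15  # made-up value; would be tuned on validation data
if std_probs.max() > UNCERTAINTY_THRESHOLD:
    # Hypothetical fallback: defer to an external knowledge source.
    print("Model is out of its depth -> query external database instead.")
else:
    print("Confident prediction:", mean_probs.argmax().item())
```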
If you actually follow the literature, you'll find plenty of evidence that the seemingly "simple" transformer architecture might actually work quite similarly to the way the human brain is believed to work.