I don't disagree with the conclusion; I disagree with the reasoning.
There's no reason to assume that models trained to predict a plausible next sequence of tokens wouldn't eventually develop "understanding" if it was the most efficient way to predict them.
The evidence so far is a definite no. LLMs will happily produce plausible gibberish, and are often subtly or grossly wrong in ways that betray a complete lack of understanding.