Exactly right. These are statistical models that operate at a completely differe...

Exactly right. These are statistical models that operate at a completely different level of abstraction from the human brain: tokens/words/etc for text models, and individual (or perhaps small groups) of pixels for image models. Plausible output is obtained through sheer brute-force, and there is no mechanism to correct what we perceive as errors.