I don't see how this shows that models don't understand the concept of length. As you say, it's a vision test, and the author describes how he had to adversarially construct it to "move slightly outside the training patterns" before LLMs failed. Doesn't it just show that LLMs are more susceptible to optical illusions than humans? (Not terribly surprising that a language model would have subpar vision.)
But it is not an illusion, and the answers make no sense. In some cases the models pick exactly the opposite answer. No human would do this.
Yes, outside the training patterns is the point. I have no doubt that if you trained LLMs on this type of pattern with millions of examples, they could get the answers right reliably.
The whole point is that humans do not need that kind of training data. They understand such concepts from a single example.