We can't help but build real models of what we see - our retina/optic nerve are already doing this before our brain even receives the 'image'!
I can't help but believe some of the image recognition mentioned in your article, especially of icons, is built through previous experience with similar iconic images. Symbols for things become associated with the real things. It's a modern adaptation of a much older processing mechanism.
OK... but how is that pattern-matching different from what the computer is doing? Why is human pattern-matching "understanding" while computer pattern-matching is not?
It's the 2nd state of cognitive engagement that makes humans different. Of course a field of static isn't a panda. The computer has no capacity to recognize the context.
I think I get your point now. It's OK if a human momentarily mistakes a random blob for a panda, but they should be able to figure out from other visual cues and context that it's not a panda. And it's that second part that's missing from the computer models?