I'm a huge fan of Mumford, but I think he's stretching a bit when he says that natural images have grammar like language has grammar. The production processes for the two phenomena are enormously different. A natural image is formed when a collection of objects is illuminated by incoming light, and the resulting image is projected onto the retina. The human brain is not involved at all in this process (leaving aside nitpicking about how humans may have shaped the environment). In contrast a natural language sentence is produced when an idea occurs inside the brain, and then various linguistic production processes transform the idea into a serial form, as text, speech, sign language, etc. The latter process involves constraints, capabilities, and eccentricities of the human brain at every stage.
Maybe you could argue that human brains perceive images using grammar-like structures.
>> I'm a huge fan of Mumford, but I think he's stretching a bit when he says that natural images have grammar like language has grammar.
That seems to be what he's saying: that the process of vision in living things is actually a grammar.
He doesn't have to be entirely correct for his intuition to be useful, though. We don't have to assume a pre-existing grammar in order to fit a bunch of data to a grammar; that's what we do in grammar induction, where no such assumption is made when, say, someone models DNA sequences as a grammar.
After all, a grammar is just a representation. The question is how good that representation is, in theory as well as in practice. In theory, it's a good representation if it helps us answer questions about the process we're trying to model. In practice, it's good if it allows us to reproduce the process (especially automatically, with computers), predict the behaviour of agents that employ it, and so on.
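To make the "no assumed generator" point concrete, here's a toy sketch (not from the thread): a hand-written recognizer for the classic context-free language a^n b^n. The rule set is invented for illustration; the point is that we can ask whether data *fits* this representation without claiming the data was produced by a grammar.

```python
# A grammar as "just a representation": a recursive recognizer for the
# language a^n b^n (informally, S -> 'a' S 'b' | 'ab'). Nothing here
# assumes the strings were generated by a grammar; we only check fit.

def fits(s):
    """Return True if s is in a^n b^n for some n >= 1."""
    if s == "ab":
        return True
    return len(s) >= 4 and s[0] == "a" and s[-1] == "b" and fits(s[1:-1])

print(fits("aaabbb"))  # True: the representation captures this string
print(fits("aabbb"))   # False: it does not
```

Whether this is a *good* representation of the data is then an empirical question, exactly as the comment says.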
Human brains also perceive speech or writing or what have you using grammar-like structures. In the sense brought up by the article, a grammar is just a structured logical representation of some physical phenomenon, like sound or imagery.
In other words, grammar is how we perceive things, and it makes sense that it can be generalized instead of only being applicable to a specific sensory input.
The specific point about parsing visual input is much more obvious when looking at things like graphical charts, user interfaces, etc.; there's clearly a grammar of some form involved in, say, determining which button to press on my phone's on-screen keyboard to create the letter 'b', and further involved when determining what a "button" is or what a "keyboard" is or what a "letter" is. Hell, that seems like the same process that lets me figure out what a "screen" is and what a "phone" is. Eventually we go from "type the letter 'b'" to "move your right thumbtip to this position" (and even that can be broken down further).
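The decomposition described above ("type the letter 'b'" down to "move your right thumbtip") behaves like rewrite rules in a grammar. A minimal sketch, with rule names and primitives entirely invented for illustration:

```python
# Hypothetical "grammar of actions": non-terminal goals rewrite into
# sub-goals until only primitive motor actions (terminals) remain.

RULES = {
    "type 'b'": ["locate keyboard", "press key 'b'"],
    "locate keyboard": ["locate screen", "locate keyboard region"],
    "press key 'b'": ["locate key 'b'", "move right thumbtip", "tap"],
}

def expand(goal):
    """Recursively rewrite a goal into a flat sequence of primitives."""
    if goal not in RULES:  # terminal symbol: a motor primitive
        return [goal]
    steps = []
    for sub in RULES[goal]:
        steps += expand(sub)
    return steps

print(expand("type 'b'"))
# ['locate screen', 'locate keyboard region', "locate key 'b'", 'move right thumbtip', 'tap']
```

The nesting of "screen" inside "phone", "keyboard" inside "screen", and so on is just more levels in the same rule table.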
We receive speech coming from a source and parse it using a grammar. One could imagine a similar process for perceiving images captured by the retina.
For output, when a human paints an image they are painting from an image visualized on their mental canvas, just as we realize thoughts produced within our minds as speech.
> when a human paints an image they are painting from an image visualized on their mental canvas
Yes, but images produced by humans are a tiny fraction of images processed by the eye. But every written or spoken sentence was ultimately created by a human brain.
That's why it seems like a big stretch to claim there is a 'universal grammar' involved in visual processing, if you believe that grammar is primarily a way for brains to encode information for communication purposes...
> Yes, but images produced by humans are a tiny fraction of images processed by the eye.
Processed by the eye, yes, but that rises to 100% for images processed by the brain. The brain appropriates images by imposing its own processing on the lower-level visual cortex. Perception is an active process.
This is true. I did not mean to imply that just because I think it is wrong, it actually is. However, the claim seemed to be that the images experienced by the brain are fully synthesized by the brain, which seemed off.
Again, just because it seems off to me does not mean it is wrong. Not my field, and whatnot. I can even see something to be said for visual processing going in stages, such that the stage you are cognizant of effectively operates on images constructed by you. That seems to be a different claim, though.
The claim was "Processed by the eye, yes, but that rises to 100% for images processed by the brain." That is, that the images processed by the brain were 100% constructed by the brain.
The implication I got was that the images you perceive are entirely of your own devising. This seems off to me. Certainly anyone who is blind but still able to visualize a room is using constructed visualizations. But that is a different thing from someone who is able to see.
This is different from written words, which are 100% devised by another being. They may be assembled by a machine, but the words and their meanings are learned and come from taught meanings, not from raw processed experience.
I'm not entirely following what you mean, but that's OK. My hunch is our differences lie in this concept of "taught meaning". I don't think meanings are taught, in any traditional sense. I think they are absorbed, acquired, and synthesized by the incredible pattern matching of the brain, operating off of direct, perceptual experience. Of course, these experiences include things like reflection, reading a textbook, having a conversation, watching a movie, daydreaming, etc.
When one reads a piece of text, it's being interpreted through the complex mental models of the world and layers of meaning that have been built up in the individual's brain over the years.
>A natural image is formed when a collection of objects is illuminated by incoming light, and the resulting image is projected onto the retina. The human brain is not involved at all in this process
No, the retina is a complex processor, and so is the optic nerve. Brain scientists nowadays say the retina is an extension of the brain.
> A natural image is formed when a collection of objects is illuminated by incoming light, and the resulting image is projected onto the retina.
But that's sort of the point, no? The incoming information is not an unstructured white noise of photons striking our retina. There is a sort of structure to the information that can be modelled. One such model of this structured information is as a "grammar tree" (really just a tree, we're coders here.) The example in the article is that the arm occludes the teepee, which occludes the background trees. Any visual system needs to break this hierarchy down.
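Since it really is just a tree, the article's example can be written down directly. A minimal sketch (the node labels come from the example; the traversal is just the painter's algorithm, rendering occluded layers before their occluders):

```python
# The occlusion hierarchy from the article as a plain tree:
# the arm occludes the teepee, which occludes the background trees.
# Each node is (label, [occluded children]).

scene = ("arm", [("teepee", [("trees", [])])])

def back_to_front(node):
    """Yield occluded layers first, occluders last (painter's order)."""
    label, occluded = node
    for child in occluded:
        yield from back_to_front(child)
    yield label

print(list(back_to_front(scene)))  # ['trees', 'teepee', 'arm']
```

Parsing an image would be the inverse problem: recovering a tree like `scene` from the flat pixel array.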
Recursion was recently shown to enable generalization in neural programming architectures [1], but from a critical-inquiry point of view we note that recursion requires a mechanism for maintaining context, so we should look for the existence (or absence) of such a mechanism in animal brains.
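To spell out what "a mechanism for maintaining context" means here: recursion implicitly keeps its context on a call stack, and the same computation can be done with that stack made explicit. A small sketch (function names are illustrative, computing tree depth both ways):

```python
# Recursion needs somewhere to keep context. Implicitly, that's the
# call stack; made explicit, it's a stack of pending (node, depth)
# frames. Any system claiming recursion needs some such mechanism.

def depth_recursive(node):
    _, children = node
    return 1 + max((depth_recursive(c) for c in children), default=0)

def depth_explicit_stack(node):
    best, stack = 0, [(node, 1)]  # the explicit context mechanism
    while stack:
        (_, children), d = stack.pop()
        best = max(best, d)
        stack.extend((c, d + 1) for c in children)
    return best

tree = ("arm", [("teepee", [("trees", [])])])
print(depth_recursive(tree), depth_explicit_stack(tree))  # 3 3
```

So the empirical question becomes whether animal brains have anything playing the role of that stack.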