
Absolutely, modern NN architectures have been inspired by biological ones, despite their massive differences.

Even in cases like attention, the modern version (the one that actually works in GPT-3, AlphaFold2, etc.) has little in common with either the English word or what we think of as attention. It's a formula with two matmuls and a softmax: softmax(AB)C. In particular, it doesn't necessarily look anywhere at all; it's just a weighted sum of the inputs. Nothing like the hard attention used by the human visual cortex. It's not even that different from a convolution where you allow the weights to be a function of the input.
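
To make the softmax(AB)C formula concrete, here is a minimal NumPy sketch. The names A, B, C come from the formula above; the shapes, the random inputs, and the helper names are assumptions for illustration. Real implementations usually also scale the scores by 1/sqrt(d), which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(A, B, C):
    # First matmul: similarity scores. Softmax turns each row into weights.
    weights = softmax(A @ B, axis=-1)
    # Second matmul: each output row is a weighted sum of the rows of C.
    return weights @ C, weights

# Hypothetical sizes: n tokens, d features per token.
rng = np.random.default_rng(0)
n, d = 4, 8
A = rng.normal(size=(n, d))   # plays the role of the queries
B = rng.normal(size=(d, n))   # plays the role of the transposed keys
C = rng.normal(size=(n, d))   # plays the role of the values

out, weights = attention(A, B, C)
```

Every row of weights is strictly positive and sums to 1, so every input contributes a little to every output. That's the sense in which it never picks one place to look.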

So the inspiration might have come from humans, but the actual architectures have largely come from pure trial and error, with limited, difficult-to-explain intuition about what tends to work.




Actually, self-attention is a generalization of convolution:

https://openreview.net/pdf?id=HJlnC1rKPB

"This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis."
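
A toy 1D illustration of the idea, not the paper's construction: give each "head" a fixed one-hot attention pattern at one relative offset, and weight each head's output by one kernel tap. The one-hot matrices stand in for the limit of a very peaked softmax, and the function names are made up for this sketch.

```python
import numpy as np

def conv1d_same(x, kernel):
    # Zero-padded sliding dot product (cross-correlation), odd kernel length.
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + k] @ kernel for i in range(len(x))])

def attention_as_conv(x, kernel):
    # One head per kernel tap. Head h attends only to position i + offset.
    n, k = len(x), len(kernel)
    out = np.zeros(n)
    for h, w in enumerate(kernel):
        offset = h - k // 2
        attn = np.eye(n, k=offset)   # one-hot attention matrix for this head
        out += w * (attn @ x)        # head output scaled by its kernel tap
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.25, 0.5, 0.25])

conv_out = conv1d_same(x, kernel)
attn_out = attention_as_conv(x, kernel)
```

Both paths give the same numbers here. The difference is that in a learned attention layer the patterns are computed from the input and the positional encodings instead of being fixed by hand.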


How do we really know that brains use hard attention?


Eyes only have high resolution in a tiny spot.


Most definitely. What I meant, though, is: how do we know mental attention is not essentially a big softmax (over some context, e.g. short- or long-term memory)?

The eye physically focusing on one thing at a time seems like a special case (and this fact isn't even true of all animals, e.g. many prey species), and not a part of the brain's attention mechanism.
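
For what it's worth, the soft/hard distinction is easy to state in code. This is a sketch with made-up scores and values, and it says nothing about what brains do: soft attention blends all items by softmax weight, hard attention picks exactly one, and a softmax with a very low temperature approaches the hard case.

```python
import numpy as np

def soft_attend(scores, values, temperature=1.0):
    # Softmax over scores; lower temperature gives a more peaked distribution.
    z = scores / temperature
    z = z - z.max()
    w = np.exp(z)
    w = w / w.sum()
    return w @ values, w

def hard_attend(scores, values):
    # Pick the single highest-scoring item and ignore everything else.
    idx = int(np.argmax(scores))
    return values[idx], idx

# Hypothetical context of three items with two features each.
scores = np.array([2.0, 1.0, 0.5])
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

soft_out, soft_w = soft_attend(scores, values)
sharp_out, sharp_w = soft_attend(scores, values, temperature=0.01)
hard_out, hard_idx = hard_attend(scores, values)
```

So the two aren't disjoint mechanisms: a sharp enough softmax is practically indistinguishable from a hard selection.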


Oh, I don't know, it very well might! However, the brain's physical structure doesn't look conducive to 4096x4096 matmuls!


Actually, the retinal anisotropy is a feature distinct from attention. You can fixate on a point in a scene and then attend to any other point in the scene. This is indeed how both animal and human attention experiments in the field are set up, to control for eye movements.



