Absolutely, modern NN architectures have been inspired by biological ones, despite their massive differences.
Even in cases like attention, the modern version (the one that actually works in GPT-3, AlphaFold2, etc.) has little in common with either the English word or what we think of as attention. It's a formula with two matmuls and a softmax: softmax(AB)C. In particular, it doesn't necessarily look anywhere at all; it's just a weighted sum of the inputs. Nothing like the hard attention used by the human visual cortex.
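For concreteness, here is the whole mechanism in a few lines of NumPy. This is a minimal sketch following the A/B/C names above (papers usually write Q, K, V); scaling, masking, and multiple heads are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(A, B, C):
    weights = softmax(A @ B)  # matmul 1 + softmax: each row sums to 1
    return weights @ C        # matmul 2: a weighted sum of C's rows

rng = np.random.default_rng(0)
n, d = 4, 8                       # 4 tokens, dimension 8
A = rng.normal(size=(n, d))       # the "queries"
B = rng.normal(size=(d, n))       # the "keys", transposed
C = rng.normal(size=(n, d))       # the "values"
out = attention(A, B, C)          # each output row is a convex combination of inputs
print(out.shape)                  # (4, 8)
```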
It's not even that different from a convolution in which the weights are allowed to be a function of the input.
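A rough 1D sketch of that comparison (illustrative only; real layers add learned projections, padding, nonlinearities, etc.). Both compute weighted sums over the inputs; the difference is where the weights come from:

```python
import numpy as np

x = np.random.default_rng(1).normal(size=(10, 4))  # 10 positions, 4 channels

# Convolution: the weights are fixed and the same at every position.
kernel = np.array([0.25, 0.5, 0.25])               # fixed taps
conv_out = np.stack([
    sum(kernel[k] * x[i + k - 1] for k in range(3))
    for i in range(1, 9)
])

# Attention: the weights are recomputed from the input itself.
scores = x @ x.T                                   # pairwise similarity of positions
scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
attn_out = weights @ x                             # input-dependent weighted sum
```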
So the inspiration might have come from humans, but the actual architectures have largely come from pure trial and error, guided by limited, difficult-to-explain intuition about what tends to work.
"This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis."
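Here's a toy version of that idea (a sketch, not the paper's actual construction, which uses relative positional encodings): if each head's scores are sharp enough that the softmax is effectively one-hot on a fixed offset, each head just shifts the input, and a tap-weighted sum of heads reproduces a convolution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 12, 3
x = rng.normal(size=(n, d))
offsets = [-1, 0, 1]             # one head per kernel tap
taps = [0.25, 0.5, 0.25]         # the convolution kernel

big = 1e4                        # sharp scores -> (numerically) one-hot softmax
heads = []
for off in offsets:
    scores = np.full((n, n), -big)
    for i in range(n):
        j = min(max(i + off, 0), n - 1)   # clamp at the sequence edges
        scores[i, j] = big
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    heads.append(w @ x)          # this head outputs x shifted by `off`

attn_as_conv = sum(t * h for t, h in zip(taps, heads))

# Direct convolution with the same taps (clamped padding), for comparison:
idx = lambda i: min(max(i, 0), n - 1)
conv = np.stack([sum(t * x[idx(i + o)] for t, o in zip(taps, offsets))
                 for i in range(n)])
print(np.allclose(attn_as_conv, conv, atol=1e-6))  # True
```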
Most definitely. What I meant, though, is: how do we know mental attention is not essentially a big softmax over some context, e.g. short- or long-term memory?
The eye physically focusing on one thing at a time seems like a special case (and it isn't even true of all animals, e.g. many prey species), rather than a part of the brain's attention mechanism.
Actually, retinal anisotropy is a feature distinct from attention. You can fixate on one point in a scene and then attend to any other point in the scene. This is in fact how attention experiments in both animals and humans are set up in the field, precisely to control for eye movements.