Attention and Memory in Deep Learning and NLP (wildml.com)
97 points by dennybritz on Jan 3, 2016 | 7 comments



Early work on attention models was done by Itti, Koch, and Niebur [1,2]. They called it "saliency," and I think Denny would consider this closer to what the concept of "attention" should mean (considering his own words/reservations about using the term). Koch now studies the neural correlates of consciousness, Itti is still working on this topic, and Niebur focuses on the neuroscience side (he's a nematode expert).

There is a lot of neuroscientific work on attention, really a lot! Overt and covert attention. Microsaccades (very small eye movements) already have a bunch of proposed functional roles. Almost everything we know about the brains of young children comes from studying where they look and what they pay attention to.

Structure-wise, attention models can be quite simple. The structure you often see is a WTA (winner-take-all) network with subsequent serial inhibition: the first winner is inhibited, so the next winner can come on stage. This is the same system Baars has in his global workspace theory [3]. It is also the same method as in mundane RANSAC models [4], a workhorse of computer vision in which a consensus/voting scheme has data points voting for higher-level structures: when one structure is detected, its votes are removed, and the next most salient structure can be voted for. (A toy sketch of the winner-take-all loop follows the references below.)

[1] http://ilab.usc.edu/bu/

[2] http://cns-alumni.bu.edu/~yazdan/pdf/Itti_etal98pami.pdf

[3] https://en.wikipedia.org/wiki/Global_Workspace_Theory

[4] https://en.wikipedia.org/wiki/RANSAC
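To make the winner-take-all plus serial-inhibition loop concrete, here is a minimal NumPy sketch. It assumes a precomputed 2-D saliency map; the function name, the inhibition radius, and the square inhibition neighborhood are illustrative choices, not taken from Itti's iLab code:

    import numpy as np

    def wta_with_inhibition(saliency, n_winners=3, inhibit_radius=2):
        # Winner-take-all with serial inhibition: pick the most salient
        # location, suppress it (inhibition of return), repeat.
        s = saliency.astype(float).copy()
        winners = []
        for _ in range(n_winners):
            y, x = np.unravel_index(np.argmax(s), s.shape)
            winners.append((y, x))
            # Inhibit the winner's neighborhood so the runner-up wins next.
            s[max(0, y - inhibit_radius):y + inhibit_radius + 1,
              max(0, x - inhibit_radius):x + inhibit_radius + 1] = -np.inf
        return winners

    saliency_map = np.random.rand(10, 10)  # stand-in for a real saliency map
    print(wta_with_inhibition(saliency_map))

The same select-then-remove pattern is what RANSAC-style voting does, with structures in place of pixel locations.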


"I consider the approach of reversing a sentence a “hack”. It makes things work better in practice, but it’s not a principled solution."

I had the same feeling about the bounding box proposals used to speed up object detection with deep learning fairly recently. Just as with a sliding-window approach, it is intuitive and works, but it also seems quite inelegant, and a better approach should be possible. Visual attention seems like it should work much better in the long term, so it is exciting that the field has reached the point where it is being developed.
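For anyone curious what the article's attention mechanism boils down to, here is a toy NumPy sketch of soft attention: score each encoder state against the current decoder state, softmax the scores, and take the weighted sum as the context vector. The dot-product scoring is an illustrative simplification; the models in the article learn the scoring function:

    import numpy as np

    def soft_attention(decoder_state, encoder_states):
        # Score each source position against the decoder state.
        scores = encoder_states @ decoder_state
        # Softmax the scores into attention weights.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Context vector = attention-weighted sum of encoder states.
        return weights @ encoder_states, weights

    enc = np.random.randn(5, 8)  # 5 source positions, hidden size 8
    dec = np.random.randn(8)
    context, attn = soft_attention(dec, enc)
    print(attn.round(3))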


I'm curious: who on Hacker News is interested in this field?


Why wouldn't they be excited? It's an exciting field. I'm sure plenty of people are learning about ML in their free time.


This post had 75 upvotes and only one comment. Not a lot of bikeshedding -- that suggests many people are excited about this field and very few are actively involved. I was curious whether there are people here working with deep learning and NLP, either doing research professionally or in their spare time. To me that suggests there might be a lot of opportunity in this field.


I agree. People are really excited, but it takes some knowledge to make an intelligent comment on this. I, for one, read a lot of NIPS papers but don't understand most of them well enough.


Another reason could be the nature of the post. It's more of a summary, not a controversial topic that would lead to heated discussion.



