
Transformer networks have deep connections to dense associative memory (modern Hopfield networks). For example, the update rule that minimizes the energy functional of these Hopfield networks converges in a single iteration and coincides with the attention mechanism [1].

[1] https://arxiv.org/abs/1702.01929
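A minimal NumPy sketch of that correspondence (the names, sizes, and the beta scaling here are illustrative; in the transformer setting the stored patterns play the role of keys and values and the state vector plays the role of the query):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, n = 8, 5                  # pattern dimension, number of stored patterns
X = rng.normal(size=(n, d))  # stored patterns, one per row
xi = rng.normal(size=d)      # query / initial state
beta = 1.0                   # inverse temperature (attention scaling)

# One step of the modern Hopfield update rule:
#   xi_new = X^T softmax(beta * X xi)
xi_new = X.T @ softmax(beta * (X @ xi))

# The same computation written as single-query attention with K = V = X:
attn = softmax(beta * (X @ xi)) @ X

assert np.allclose(xi_new, attn)
```

With well-separated patterns and large enough beta, this single step already lands (up to exponentially small error) on the stored pattern nearest the query, which is why one iteration suffices.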




More accessible references:

https://mcbal.github.io/post/an-energy-based-perspective-on-... (Modern continuous Hopfield networks section)

https://arxiv.org/abs/2008.02217

Note that the connection to Hebbian learning hinges on the softmax function, in particular its exponential!
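To see why the exponential matters: with an exponential interaction function the energy contains a log-sum-exp term, and the one-step update above is exactly its (concave-convex) minimization step, so each update cannot increase the energy. A sketch based on the energy in the "Hopfield Networks is All You Need" paper linked above, with its state-independent constant terms omitted:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def energy(X, xi, beta):
    # E(xi) = -(1/beta) * logsumexp(beta * X xi) + 0.5 * ||xi||^2
    # (state-independent constants omitted)
    s = beta * (X @ xi)
    lse = (s.max() + np.log(np.exp(s - s.max()).sum())) / beta
    return -lse + 0.5 * (xi @ xi)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))   # stored patterns
xi = rng.normal(size=4)       # initial state
beta = 2.0

e0 = energy(X, xi, beta)
xi = X.T @ softmax(beta * (X @ xi))  # one softmax (CCCP) update step
e1 = energy(X, xi, beta)

assert e1 <= e0 + 1e-12       # the update never increases the energy
```

The softmax appears because the gradient of the log-sum-exp term is the softmax; swap the exponential for a polynomial interaction function and the update weights are no longer a softmax, breaking the link to attention.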



