
Transformer networks have deep connections to dense associative memory (modern Hopfield networks). For example, the update rule that minimizes the energy functional of these Hopfield networks converges in a single iteration and coincides with the attention mechanism [1].

[1] https://arxiv.org/abs/1702.01929
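A minimal NumPy sketch of that correspondence (the names, sizes, and the beta scaling here are illustrative; in the transformer setting the stored patterns play the role of keys and values and the state vector plays the role of the query):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, n = 8, 5                  # pattern dimension, number of stored patterns
X = rng.normal(size=(n, d))  # stored patterns, one per row
xi = rng.normal(size=d)      # query / initial state
beta = 1.0                   # inverse temperature (attention scaling)

# One step of the modern Hopfield update rule:
#   xi_new = X^T softmax(beta * X xi)
xi_new = X.T @ softmax(beta * (X @ xi))

# The same computation written as single-query attention with K = V = X:
attn = softmax(beta * (X @ xi)) @ X

assert np.allclose(xi_new, attn)
```

With well-separated patterns and large enough beta, this single step already lands (up to exponentially small error) on the stored pattern nearest the query, which is why one iteration suffices.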




More accessible references:

https://mcbal.github.io/post/an-energy-based-perspective-on-... (Modern continuous Hopfield networks section)

https://arxiv.org/abs/2008.02217

Note that the connection to Hebbian learning hinges on the softmax function, in particular its exponential!
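To see why the exponential matters: with an exponential interaction function the energy contains a log-sum-exp term, and the one-step update above is exactly its (concave-convex) minimization step, so each update cannot increase the energy. A sketch based on the energy in the "Hopfield Networks is All You Need" paper linked above, with its state-independent constant terms omitted:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def energy(X, xi, beta):
    # E(xi) = -(1/beta) * logsumexp(beta * X xi) + 0.5 * ||xi||^2
    # (state-independent constants omitted)
    s = beta * (X @ xi)
    lse = (s.max() + np.log(np.exp(s - s.max()).sum())) / beta
    return -lse + 0.5 * (xi @ xi)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))   # stored patterns
xi = rng.normal(size=4)       # initial state
beta = 2.0

e0 = energy(X, xi, beta)
xi = X.T @ softmax(beta * (X @ xi))  # one softmax (CCCP) update step
e1 = energy(X, xi, beta)

assert e1 <= e0 + 1e-12       # the update never increases the energy
```

The softmax appears because the gradient of the log-sum-exp term is the softmax; swap the exponential for a polynomial interaction function and the update weights are no longer a softmax, breaking the link to attention.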



