Hacker News

Just to add on: a good way to learn these terms is to look at the history of neural networks rather than studying the transformer architecture in a vacuum.

This [1] post from 2021 goes over attention mechanisms as applied to RNN / LSTM networks. It's visual and goes into a bit more detail, and I've personally found RNN / LSTM networks easier to understand intuitively.

[1] https://medium.com/swlh/a-simple-overview-of-rnn-lstm-and-at...
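Not from the linked post itself, but a minimal sketch of the idea it covers: dot-product attention over the hidden states of an already-run RNN, using toy numpy data (all names and shapes here are illustrative assumptions).

```python
import numpy as np

# Toy setup: pretend an RNN has already produced one hidden state
# per timestep; we attend over them with a single query vector.
rng = np.random.default_rng(0)

T, H = 5, 8                                # sequence length, hidden size
hidden_states = rng.normal(size=(T, H))    # one hidden state per timestep
query = rng.normal(size=(H,))              # e.g. a decoder state

# Alignment scores: how relevant each timestep is to the query.
scores = hidden_states @ query             # shape (T,)

# Softmax (shifted by the max for numerical stability) turns the
# scores into attention weights that sum to 1.
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Context vector: attention-weighted sum of the hidden states.
context = weights @ hidden_states          # shape (H,)

print(context.shape)  # (8,)
```

The same pattern (score, normalize, weighted sum) is what the transformer generalizes with separate query/key/value projections and multiple heads.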



