I found these resources to be helpful. https://jalammar.github.io/illustrated-tr...

pankajdoharey · on April 15, 2023

I agree that Karpathy's YouTube video is an excellent resource for understanding Transformers from scratch. It provides a hands-on experience that can be particularly helpful for those who want to implement the models themselves. Here's the link to the video titled "Let's build GPT: from scratch, in code, spelled out": https://youtu.be/kCc8FmEb1nY
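If you want to try the hands-on route alongside the video, here is a minimal sketch of a single causal self-attention head, the building block the video works up to. It is plain Python with lists, so it runs without dependencies. It is not code from the video, which uses PyTorch. The function names (`matmul`, `softmax`, `causal_self_attention`), the weight names `w_q`, `w_k`, `w_v`, and the random toy sizes are illustrative assumptions.

```python
import math
import random

# Hypothetical helper names for illustration; not taken from the video.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    # Subtract the max for numerical stability; masked -inf entries become exp(-inf) = 0.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, w_q, w_k, w_v):
    """One attention head. x is (T x C); w_q, w_k, w_v are (C x H).
    Returns the (T x H) output and the (T x T) attention weights."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_size = len(w_q[0])
    scores = matmul(q, transpose(k))  # (T x T) query-key affinities
    weights = []
    for t, row in enumerate(scores):
        # Scale by 1/sqrt(head_size) and mask future positions (j > t).
        masked = [s / math.sqrt(head_size) if j <= t else float("-inf")
                  for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v), weights

# Toy usage with arbitrary sizes: sequence length, embedding size, head size.
random.seed(0)
T, C, H = 4, 8, 4
x = [[random.gauss(0, 1) for _ in range(C)] for _ in range(T)]
w_q, w_k, w_v = ([[random.gauss(0, 1) for _ in range(H)] for _ in range(C)] for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(len(out), len(out[0]))              # 4 4
print([round(w, 3) for w in weights[0]])  # [1.0, 0.0, 0.0, 0.0]
```

The first token can only attend to itself, so its weight row is all on position 0. Later rows spread their weight over the current and earlier positions only. That lower-triangular mask is what makes the head usable for next-token prediction.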

Additionally, for more comprehensive coverage of Transformers, you may find these useful:

* The Illustrated Transformer by Jay Alammar: http://jalammar.github.io/illustrated-transformer/

* MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention: https://www.youtube.com/watch?v=ySEx_Bqxvvo

* Karpathy's course, Deep Learning and Generative Models (Lecture 6 covers Transformers): https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs......

These resources cover different aspects of Transformers and can help you better grasp the underlying concepts and mechanisms.

jaidhyani · on April 16, 2023

I endorse all of this, and I'd further recommend (probably as a follow-up once you have a basic grasp) "A Mathematical Framework for Transformer Circuits". It develops a lot of really useful ideas for understanding how and why transformers work, and for starting to treat them as something other than magical black boxes.

https://transformer-circuits.pub/2021/framework/index.html