Hacker News new | past | comments | ask | show | jobs | submit login

I found these resources to be helpful.

https://jalammar.github.io/illustrated-transformer/ This is a good illustration of the transformer and how the math works.

https://karpathy.ai/zero-to-hero.html If you want a deeper understanding of transform and how they fit in the whole picture of deep learning, this series is far and away the best resource I found. Karpathy goes into transformers by the sixth lecture, the previous lectures give a lot more context how deep learning works.




I agree that Karpathy's YouTube video is an excellent resource for understanding Transformers from scratch. It provides a hands-on experience that can be particularly helpful for those who want to implement the models themselves. Here's the link to the video titled "Let's build GPT: from scratch, in code, spelled out": https://youtu.be/kCc8FmEb1nY

Additionally, for more comprehensive resources on Transformers, you may find these resources useful:

* The Illustrated Transformer by Jay Alammar: http://jalammar.github.io/illustrated-transformer/

* MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention: https://www.youtube.com/watch?v=ySEx_Bqxvvo

* Karpathy's course, Deep Learning and Generative Models (Lecture 6 covers Transformers): https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs......

These resources cover different aspects of Transformers and can help you grasp the underlying concepts and mechanisms better.


I endorse all of this and will further endorse (probably as a follow-up once one has a basic grasp) "A Mathematical Framework for Transformer Circuits" which builds a lot of really useful ideas for understanding how and why transformers work and how to start getting a grasp on treating them as something other than magical black boxes.

https://transformer-circuits.pub/2021/framework/index.html




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: