This particular video is exceptionally good.
And so are his other neural net videos.
He keeps bringing you back to first principles, again and again, never lets a single doubt remain, and the code is very accessible and practical.
The intersection between those who can teach and those who truly understand the deepest essence of a subject is much smaller than we would hope. People like Feynman, Tanenbaum, Sussman, Susskind, and now Karpathy are exceedingly rare. Each of them is a gift for generations to come. So, when you find one whose style resonates with the way you think, I suggest watching their videos multiple times. :)
That's cool, but not exactly the same as reading text that was written to be consumed as text. I think a big part of the reason technical text is easier to digest is the way it's written, and not so much the medium.
Because the value isn't just in having raw reference material, like complex code you paste into your program. The value is in communicating ideas and understanding.
You don't need to skim through the 4-hour video for long to see it's full of whys and hows and explanations and demonstrations that massively dwarf the blog post. It's basically a mini course.
Programming is an interactive process, so transcribing the video to text also removes a lot of information. The video isn't just an audio form of what could've been a long text tutorial. You're watching someone do something.
I did a similar project a couple of years ago for a university course, only I also added style transfer, and it turned out pretty cool. I scraped a bunch of news data together with its news section and trained a self-attention language model from scratch; the results were pretty hilarious. The data was in Hebrew, which is a challenge to tokenize because of the morphology. I posted it on arXiv if anyone's interested in the style transfer and tokenization process: https://arxiv.org/abs/2212.03019
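If anyone wants a feel for the tokenization step without reading the paper, here's roughly the kind of thing you'd do with the Hugging Face tokenizers library. This is just a sketch, not the paper's actual code; the corpus file name, vocab size, and the section-tag tokens are made-up placeholders.

    # Train a BPE tokenizer on a scraped Hebrew news corpus and prepend a
    # section tag as a crude way to condition generation on the news section.
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.trainers import BpeTrainer
    from tokenizers.pre_tokenizers import Whitespace

    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    # Splits on whitespace/punctuation; Hebrew morphology is left to the BPE merges.
    tokenizer.pre_tokenizer = Whitespace()

    trainer = BpeTrainer(
        vocab_size=32000,  # placeholder size
        special_tokens=["[UNK]", "[PAD]", "<sport>", "<politics>", "<economy>"],
    )
    tokenizer.train(files=["news_corpus.txt"], trainer=trainer)  # placeholder file
    tokenizer.save("hebrew_bpe.json")

    # At training time each article would be fed as "<sport> ..." etc., so the
    # model learns to generate text conditioned on the section tag.
    print(tokenizer.encode("<sport> ...").tokens)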
That's cool as a learning experience, but if you're gonna build a language transformer, why not skip ClosedAI's outdated nonsense and learn a more established open architecture like Llama instead, so whatever you end up training is plug-and-play compatible with every LLM tool in the universe once converted to a GGUF?
Otherwise it's like learning to build a website and stopping short of actually doing the final bit where you put it on a webserver and run it live.
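For what it's worth, getting something llama.cpp-compatible doesn't take much. A minimal sketch with Hugging Face transformers is below; all the sizes and directory names are placeholders, and the exact converter script name can vary between llama.cpp versions.

    # Define a tiny model that uses the Llama architecture, so the checkpoint
    # can later be converted to GGUF and loaded by llama.cpp-based tools.
    from transformers import LlamaConfig, LlamaForCausalLM

    config = LlamaConfig(
        vocab_size=32000,            # must match your tokenizer
        hidden_size=512,             # placeholder sizes; scale to your data/GPU
        intermediate_size=1376,
        num_hidden_layers=8,
        num_attention_heads=8,
        num_key_value_heads=8,
        max_position_embeddings=1024,
    )
    model = LlamaForCausalLM(config)
    print(f"{model.num_parameters() / 1e6:.1f}M parameters")

    # ...train with your own loop or the HF Trainer, save the tokenizer too, then:
    model.save_pretrained("tiny-llama-news")
    # and convert with llama.cpp's converter, something along the lines of:
    #   python convert_hf_to_gguf.py tiny-llama-news --outfile tiny-llama-news.gguf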
https://youtu.be/l8pRSuU81PU