Hacker News
Build and train GPT-2 from scratch using PyTorch (differ.blog)
138 points by thunderbong 6 months ago | 17 comments



Andrej Karpathy's video is probably much better than this:

https://youtu.be/l8pRSuU81PU


It's gotta be an exceptionally awesome video to beat written text though. I'll take mediocre text over good video any day.


This particular video is an exceptionally awesome video.

And so are his other neural net videos.

He keeps bringing you back to first principles, again and again, does not let a single doubt remain, and the code is very accessible and practical.

The intersection between those who can teach and those who truly understand the deepest essence of a subject is much smaller than we would hope. People like Feynman, Tanenbaum, Sussman, Susskind, and now Karpathy are exceedingly rare. Each of them is a gift for generations to come. So, when you find one whose style resonates with the way you think, I suggest watching their videos multiple times. :)


Convert video to text with this one simple secret trick:

* yt-dlp

* whisper

(whisper is surprisingly good for a lot of educational videos)

Even better is to connect text to images, but that's less of a simple trick (although I do have the code).
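
For anyone who wants to try the yt-dlp + whisper combo, here is a rough sketch of how the two steps might be chained from Python. It assumes the `yt-dlp` and `whisper` (openai-whisper) command-line tools and ffmpeg are installed and on your PATH. The helper names, the `lecture` file stem, the `base` model choice, and the `transcript` output directory are placeholders picked for illustration, not anything from the comment above.

```python
import subprocess
from pathlib import Path


def ytdlp_command(url: str, audio_stem: str = "lecture") -> list[str]:
    """Build the yt-dlp invocation that downloads only the audio track."""
    # -x extracts audio; --audio-format mp3 converts it (needs ffmpeg).
    # -o sets the output template; %(ext)s is filled in by yt-dlp.
    return ["yt-dlp", "-x", "--audio-format", "mp3",
            "-o", f"{audio_stem}.%(ext)s", url]


def whisper_command(audio_path: str, model: str = "base",
                    out_dir: str = "transcript") -> list[str]:
    """Build the whisper CLI invocation that writes a plain-text transcript."""
    return ["whisper", audio_path, "--model", model,
            "--output_format", "txt", "--output_dir", out_dir]


def transcribe(url: str, audio_stem: str = "lecture",
               model: str = "base", out_dir: str = "transcript") -> Path:
    """Download the audio, transcribe it, and return the expected text path."""
    subprocess.run(ytdlp_command(url, audio_stem), check=True)
    subprocess.run(whisper_command(f"{audio_stem}.mp3", model, out_dir),
                   check=True)
    # Assumption: whisper names the transcript after the audio file's stem.
    return Path(out_dir) / f"{audio_stem}.txt"


# Example call (downloads and transcribes, so it can take a while):
# transcribe("https://youtu.be/l8pRSuU81PU")
```

Larger whisper models are slower but tend to handle technical vocabulary better, so the model name is worth experimenting with.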


That's cool, but not exactly the same as reading text that was written to be consumed as text. I think a big part of the reason technical text is easier to digest is the way it's written, and not so much the medium.


How can a video be better than text based info when we're talking about a programming task, which is all text?


Because the value isn't in just having raw reference material like complex code you just paste into your program. The value is in communicating ideas and understanding.

You don't need to skim through the 4-hour video for long to see that it's full of whys and hows and explanations and demonstrations that massively dwarf the blog post. It's basically a mini course.

Programming is an interactive process, so transcribing the video to text also removes a lot of information. The video isn't just an audio form of what could've been a long text tutorial. You're watching someone do something.


> The value is in communicating ideas and understanding.

And those ideas absolutely can't be communicated in concise writing? I have to waste my life listening to talking heads?


This is an option for those who want to learn from a more interactive medium instead of from a textbook.

Different people learn in different ways. I wouldn’t call any of it a waste.


Yes. Skim the video. You're watching someone explain things interactively in a way that cannot be done with text.

Whether you have the preference for it or not is not up for debate here.


> Skim the video.

So not all 4 hours are worth watching? :)


You can listen to someone's thought process, which is extremely valuable.


Also check out Andrej's new llm.c library which includes a script to do this from scratch with fineweb.


Cool blog, thanks!

I did a similar project a couple of years ago for a university course, except I also added style transfer. I scraped a bunch of news data together with its news section and trained a self-attention language model from scratch, and the results turned out pretty hilarious. The data was in Hebrew, which is a challenge to tokenize because of the morphology. I posted it on arXiv if anyone's interested in the style transfer and tokenization process: https://arxiv.org/abs/2212.03019


That's cool as a learning experience, but if you're gonna build a language transformer, why not skip ClosedAI's outdated nonsense and learn something with a more established open architecture like llama? Then whatever you end up training is plug-and-play compatible with every LLM tool in the universe once it's converted to a GGUF.

Otherwise it's like learning to build a website and stopping short of actually doing the final bit where you put it on a webserver and run it live.


The link to the repo looks broken

https://github.com/ajeetkharel/gpt2-from-scratch/


Reminds me of TinyStories. I wonder if this architecture is better or worse than the ones it tested.



