This particular video is exceptionally good.
And so are his other neural net videos.
He keeps bringing you back to first principles, again and again, never lets a single doubt remain, and the code is very accessible and practical.
The intersection between those who can teach and those who truly understand the deepest essence of a subject is much smaller than we would hope. People like Feynman, Tanenbaum, Sussman, Susskind, and now Karpathy are exceedingly rare. Each of them is a gift for generations to come. So, when you find one whose style resonates with the way you think, I suggest watching their videos multiple times. :)
That's cool, but not exactly the same as reading text that was written to be consumed as text. I think a big part of the reason technical text is easier to digest is the way it's written, and not so much the medium.
Because the value isn't just in having raw reference material, like complex code you paste into your program. The value is in communicating ideas and understanding.
You don't need to skim through the 4-hour video for long to see it's full of whys and hows and explanations and demonstrations that massively dwarf the blog post. It's basically a mini course.
Programming is an interactive process, so transcribing the video to text also removes a lot of information. The video isn't just an audio form of what could've been a long text tutorial. You're watching someone do something.
I did a similar project a couple of years ago for a university course, only I also added style transfer, and it turned out pretty cool. I scraped a bunch of news data together with its news section and trained a self-attention language model from scratch; the results were pretty hilarious. The data was in Hebrew, which is a challenge to tokenize because of the morphology. I posted it on arXiv if anyone's interested in the style transfer and tokenization process: https://arxiv.org/abs/2212.03019
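If anyone wants a feel for the tokenization step without reading the paper, here's roughly the kind of thing you'd do with the Hugging Face tokenizers library. This is just a sketch, not the paper's actual code; the corpus file name, vocab size, and the section-tag tokens are made-up placeholders.

    # Train a BPE tokenizer on a scraped Hebrew news corpus and prepend a
    # section tag as a crude way to condition generation on the news section.
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.trainers import BpeTrainer
    from tokenizers.pre_tokenizers import Whitespace

    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    # Splits on whitespace/punctuation; Hebrew morphology is left to the BPE merges.
    tokenizer.pre_tokenizer = Whitespace()

    trainer = BpeTrainer(
        vocab_size=32000,  # placeholder size
        special_tokens=["[UNK]", "[PAD]", "<sport>", "<politics>", "<economy>"],
    )
    tokenizer.train(files=["news_corpus.txt"], trainer=trainer)  # placeholder file
    tokenizer.save("hebrew_bpe.json")

    # At training time each article would be fed as "<sport> ..." etc., so the
    # model learns to generate text conditioned on the section tag.
    print(tokenizer.encode("<sport> ...").tokens)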
That's cool as a learning experience, but if you're gonna build a language transformer, why not skip ClosedAI's outdated nonsense and learn a more established open architecture like Llama instead, so whatever you end up training is plug-and-play compatible with every LLM tool in the universe once converted to a GGUF?
Otherwise it's like learning to build a website and stopping short of actually doing the final bit where you put it on a webserver and run it live.
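For what it's worth, getting something llama.cpp-compatible doesn't take much. A minimal sketch with Hugging Face transformers is below; all the sizes and directory names are placeholders, and the exact converter script name can vary between llama.cpp versions.

    # Define a tiny model that uses the Llama architecture, so the checkpoint
    # can later be converted to GGUF and loaded by llama.cpp-based tools.
    from transformers import LlamaConfig, LlamaForCausalLM

    config = LlamaConfig(
        vocab_size=32000,            # must match your tokenizer
        hidden_size=512,             # placeholder sizes; scale to your data/GPU
        intermediate_size=1376,
        num_hidden_layers=8,
        num_attention_heads=8,
        num_key_value_heads=8,
        max_position_embeddings=1024,
    )
    model = LlamaForCausalLM(config)
    print(f"{model.num_parameters() / 1e6:.1f}M parameters")

    # ...train with your own loop or the HF Trainer, save the tokenizer too, then:
    model.save_pretrained("tiny-llama-news")
    # and convert with llama.cpp's converter, something along the lines of:
    #   python convert_hf_to_gguf.py tiny-llama-news --outfile tiny-llama-news.gguf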
https://youtu.be/l8pRSuU81PU