I'm doing an ML apprenticeship [1] these weeks and Karpathy's videos are part of it. We've gone deep into them, and I've found them excellent. Every concept he illustrates is crystal clear in his mind (even though the concepts themselves are complicated), and that shows in his explanations.
Also, the way he builds everything up is magnificent: starting from basic Python classes, to derivatives and gradient descent, to micrograd [2], and then from a bigram counting model [3] to makemore [4] and nanoGPT [5].
The website doesn't say what is, for me, the best thing about it. The course is peer-led, and it works like this: once you join, you're part of a team whose single objective is to get the best score with your ML recommendation system.
There is a simulated environment in which all teams in the cohort receive millions of requests per day (across hundreds of thousands of users and items), and you have to build out your infrastructure on an EC2 instance, build a basic model, and then iteratively improve on it. Imagine a simulated Facebook/YouTube/TikTok-style system where you aim for the best uptime and the best recommendations!
Something I discovered not long ago, and wish I had discovered years earlier, is to watch the video first and then code along on a second pass. So simple, but it makes a world of difference: you can skip the errors and the fluff, and you can see what's coming next. It sounds like it amounts to watching two 2-hour videos, but it pays off in how much of the content you actually absorb and retain.
I've found that actually running the code is very beneficial for understanding, along with reasoning through each line of code, spending a lot of time with the video paused, and discussing and explaining to each other what we understood.
This was the first time I actually grokked backpropagation; the first video alone is more lucid and valuable than any other resource about machine learning I had seen before. In fact, it's so well explained that I managed to implement the library almost completely from memory after watching it. I cannot recommend it highly enough, especially for programmers without a math background!
The only aspect I could see being non-ideal for some is that it uses some Python-specific cleverness and advanced syntax and semantics (__call__(), list comprehensions with two for's, **kwargs, __add__, __repr__, subclasses, (nested) functions as variables, etc.), but if you are familiar with these, it might well read as compact and elegant instead.
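For a taste, here's a hypothetical sketch (my own toy example, not the course's actual code) that packs most of those idioms into one micrograd-style layer:

```python
# Hypothetical micrograd-style sketch of the idioms mentioned above.
class Module:                                   # base class for subclassing
    def parameters(self):
        return []

class Neuron(Module):
    def __init__(self, n_inputs, **kwargs):     # **kwargs passes options through
        self.nonlin = kwargs.get("nonlin", True)
        self.w = [0.0] * n_inputs
        self.b = 0.0

    def __call__(self, x):                      # lets you call the object: n(x)
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return max(0.0, act) if self.nonlin else act

    def __repr__(self):                         # controls how the object prints
        return f"Neuron({len(self.w)})"

    def parameters(self):
        return self.w + [self.b]

class Layer(Module):
    def __init__(self, n_in, n_out, **kwargs):
        self.neurons = [Neuron(n_in, **kwargs) for _ in range(n_out)]

    def __call__(self, x):
        return [n(x) for n in self.neurons]

    def parameters(self):
        # a list comprehension with two for's, flattening the nested structure
        return [p for n in self.neurons for p in n.parameters()]

layer = Layer(3, 2, nonlin=False)
print(layer([1.0, 2.0, 3.0]))                   # [0.0, 0.0] with zero weights
print(layer.neurons)                            # __repr__: [Neuron(3), Neuron(3)]
```

Dense, but once the dunder methods click, the whole library reads like math rather than plumbing.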
I find it genuinely stunning how ugly Python gets in these projects. Almost immediately, even in a toy project (and I'm not talking about tinygrad, which is deliberately super dense).
My hive mind connection must be good because I literally finished this course yesterday.
It was very satisfying to learn how transformers work, to finally be able to turn the obscure glyphs of the research papers into real code, but I think transformers are too big for what I can do on my own computer. The author mentioned that the toy transformer he was building in the final video took 15 minutes to train on his A100 (a $10,000 GPU), and the results weren't even that good; the transformer was spelling words correctly using character-level tokens. I guess that's something, but it's not GPT-4.
Even so, there were a lot of good tips to pick up along the way. This is a great series that I'm thankful to have. The "Backprop Ninja" video was hard work: you manually calculate the gradients and then compare your calculations against PyTorch's. It's great to have instant feedback telling you whether your gradients are correct or not.
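For anyone curious what that exercise feels like, here's a toy version (my own example, not the lecture's): derive the gradients by hand with the chain rule, then check them against autograd.

```python
import torch

# Forward pass through a tiny expression, with autograd enabled.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(-3.0, requires_grad=True)
c = torch.tensor(1.5, requires_grad=True)

y = torch.tanh(a * b + c)       # y = tanh(u), u = a*b + c
y.backward()                    # autograd fills in a.grad, b.grad, c.grad

# Manual backprop via the chain rule: dy/dx = (1 - tanh^2(u)) * du/dx
u = a * b + c
dy_du = 1 - torch.tanh(u) ** 2
da = dy_du * b                  # du/da = b
db = dy_du * a                  # du/db = a
dc = dy_du * 1.0                # du/dc = 1

for manual, auto in [(da, a.grad), (db, b.grad), (dc, c.grad)]:
    print(torch.allclose(manual, auto))   # instant feedback: should print True
```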
I made a smaller GPT model that started from Andrej's code that converges to a decent loss in a short amount of time on an A100 -- just under 2.5 minutes or so: https://github.com/tysam-code/hlb-gpt
With the original hyperparameters it took 30-60 minutes; with a pruned-down network and adjusted hyperparameters, about 6 minutes; and a variety of optimizations beyond that brought it down further.
If you want the basically feature-identical (but pruned-down) nanoGPT version, the 0.0.0 release at ~6 minutes is your best bet.
You can get A100s cheaply and securely through Colab or LambdaLabs.
True, as much as I enjoy owning and controlling my own hardware, buying an A100 and then letting it sit idle while I procrastinate and play video games probably isn't the best use of resources. He did say "my GPU" (or similar) at one point, and I thought maybe he does enough ML stuff that he bought his own.
What I appreciate about Karpathy's videos is that they don't make things any more complicated than they need to be. Simple engineering language is used. No gatekeeping! It's reassuring, and it lets everyone know that anyone can do it.
He defines it pretty clearly. Logits are the inputs to a softmax layer/calculation, which turns the logits into normalized percentages (percentages that sum to 1.0).
Before going through the softmax layer, the logits will probably be smallish numbers around 0, something like [2.89, -4.53, 0.24, -1.556, 0.57]. Logits like this are the natural output of a neural network, because they can be any real number and everything still works.
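To make that concrete, here's what softmax does to those example logits (a quick PyTorch sketch):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.89, -4.53, 0.24, -1.556, 0.57])  # the example above
probs = F.softmax(logits, dim=0)   # exp(logits) / exp(logits).sum()
print(probs)                       # every entry in [0, 1] ...
print(probs.sum())                 # ... and the entries sum to 1.0
```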
When people mention logits, they're usually referring to the raw output of the model before it gets transformed/normalised into a probability distribution (i.e., values in [0, 1] that sum to 1). Logits can take any value. The naming might not be mathematically strict, since it assumes you're going to apply softmax (which interprets the output of the model as logits), but that's how the term is used.
For example, in many classification problems you get a 1D vector of logits from the final layer; you apply softmax to normalise, then argmax to extract the predicted class. This extends to other tasks like semantic segmentation (predicting a class per pixel), where the "logit" output has the same spatial size as the image with a channel for each class, and you apply the same process to get a single-channel image with a class per pixel.
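As a rough sketch of both cases (the shapes and class counts here are made up for illustration):

```python
import torch

# Classification: one logit per class.
logits = torch.randn(10)                      # hypothetical 10-class problem
pred_class = logits.softmax(dim=0).argmax()   # softmax is monotonic, so
                                              # argmax(logits) picks the same class

# Semantic segmentation: per-pixel logits, one channel per class.
seg_logits = torch.randn(10, 64, 64)          # (classes, height, width)
pred_mask = seg_logits.argmax(dim=0)          # (64, 64) map of class-per-pixel
print(pred_class.item(), pred_mask.shape)
```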
Honestly, what cracked logits for me was a conversation with ChatGPT in which I gave it my professional background, areas of strength and weakness, and problem context, and had it explain the concept to me. I then went elsewhere to make sure I hadn’t been lied to. I’ve found ChatGPT such an invaluable learning tool when used in this way.
I'm confused by the comments here, does "logit" here not mean "log odds" like it does in virtually every other context related to machine learning?
Generally I'm a huge fan of not getting too caught up in theory before diving into practice, but I'm seeing multiple responses to this comment without a single mention of "log odds".
The logit function transforms probabilities into the log of the odds, ln(P(X) / (1 - P(X))), which is important because it maps probabilities onto an unbounded linear scale, which they do not have in their standard [0, 1] form. It's the foundation of logistic regression, which is, despite much misinformation, quite literally linear regression with a transformed target.
The logistic function is the inverse of the logit: it turns log-odds values back into probabilities. Logistic regression actually transforms the model, not the target (most of the time), because the labels 0 and 1 map to negative and positive infinity under the logit, which linear regression can't handle (so we transform the model using the inverse instead).
I don't think I can stress enough how important it is to really understand logistic regression (which is also the basic perceptron) before diving into neural networks (which are really just an extension of logistic regression).
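To make that connection concrete, here's a small sketch (my own toy example in numpy, not anything from the course) of logistic regression trained with gradient descent, i.e., a single sigmoid "neuron":

```python
import numpy as np

# Toy, linearly separable data: label is 1 when x1 + 2*x2 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid(linear model) = P(y=1|x)
    grad_w = X.T @ (p - y) / len(y)      # gradient of the log loss w.r.t. w
    grad_b = np.mean(p - y)              # ... and w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

p = 1 / (1 + np.exp(-(X @ w + b)))
print(w, b)                              # roughly points in the (1, 2) direction
print(np.mean((p > 0.5) == y))           # training accuracy near 1.0
```

A neural network is this same unit, stacked and composed, with nonlinearities in between.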
Logit is not a "probability function"; quite the opposite. You can see this in the image in the link you posted (the x-axis runs from 0 to 1, the y-axis from -inf to inf). It transforms probabilities into log odds, which live on a linear scale and make combining probabilities much nicer.
The inverse logit or logistic function takes log odds and transforms them back into probabilities.
Most machine learning relies heavily on manipulating probabilities, but since probabilities are not linear, the logit/logistic transformations become essential to correctly modeling complex problems involving probabilities.
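A tiny numpy sketch of the round trip, plus the "combining is just addition" point (the combination rule at the end assumes independent evidence, the naive-Bayes-style reading):

```python
import numpy as np

def logit(p):                 # probability -> log odds
    return np.log(p / (1 - p))

def logistic(x):              # log odds -> probability (the inverse)
    return 1 / (1 + np.exp(-x))

p = np.array([0.1, 0.5, 0.9])
print(logit(p))               # [-2.197, 0.0, 2.197]: unbounded, symmetric about 0
print(logistic(logit(p)))     # round-trips back to [0.1, 0.5, 0.9]

# Combining two independent pieces of evidence is addition in log-odds space:
print(logistic(logit(0.8) + logit(0.6)))   # ~0.857, stronger than either alone
```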
I tried this course a few weeks ago but quickly got stuck after finetuning the first model, the one that detects cats (the first example). The finetuning part works, but I was never able to get it to infer (Python kept crashing, complaining that an object didn't have a read function).
Hopefully, I'll manage to get further with this course.
And he presents all this with humility. Many presenters are just showing off and are pretty much full of themselves; I suppose they need the ego boost, who knows. But Andrej could be the nice guy next door in the dorm who is studying the same course as you, just a lecture or two ahead. (Until you figure out he is the former Director of AI at Tesla, or whatever his title ended up being before he left.)
I can also recommend his interview with Lex Fridman.
I've only finished the first video, but he even made two minor blunders in his code and kept the footage. It really helps your confidence to see a pro make a mistake, rather than a perfectly polished but unattainable ideal standard.
He is a master educator. While at Stanford he helped create and taught CS231n, the deep learning course on convolutional neural networks for visual recognition, which immediately became legendary. It's somewhat out of date on some details, but it's still well worth watching, especially as delivered by Andrej. You can find the lectures on YouTube.
This is really cool, and I am so glad my math teacher played hardball and I still remember some calculus.
edit: Python really was/is made for this numbers/calculation/visualization stuff. Kinda kicking myself now for not investing more in it and for sticking with PHP instead; PHP has its merits when building certain things, but Python is a beast with numbers.
Getting started with it takes exactly zero experience. Being productive in it does, but that's unrelated to the starting point, and shouldn't discourage you, if you really want to do it.
Yes. The journey will teach you everything you need to know, or it'll kill you; and the likelihood of that happening here, or on any path in IT, is negligible. So just show up every day and collect more fuck-ups than the next guy (or your former self).
I've been reading The Little Learner, which builds machine learning knowledge on top of Scheme/Racket. After that book, you could watch this series, which immediately begins explaining how automatic differentiation works.
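To give a flavor of what that means, here's a minimal micrograd-style sketch in Python (my own toy version, not the book's or the series' code) of reverse-mode automatic differentiation:

```python
# Each Value remembers how it was made, so gradients flow back via the chain rule.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None            # how to push my gradient to my children

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topological order: visit each node after everything built from it.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn:
                v._grad_fn()

a, b = Value(2.0), Value(-3.0)
y = a * b + a                # y = ab + a, so dy/da = b + 1, dy/db = a
y.backward()
print(a.grad, b.grad)        # -2.0 2.0
```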
I’m almost done with both of the Andrew Ng Coursera specializations, and they are not at all disconnected or unclear. In fact, I don’t think I’ve ever learned so much so quickly.
It should be completely fine; you don't really need a GPU for this course. Maybe in the exercises right at the end you won't get good performance, but the exact performance isn't all that important.
[1]: https://www.foundersandcoders.com/ml
[2]: https://github.com/karpathy/micrograd
[3]: https://github.com/karpathy/randomfun/blob/master/lectures/m...
[4]: https://github.com/karpathy/makemore
[5]: https://github.com/karpathy/nanoGPT