It's a great resource and all, but I find it insane that hyperparameter tuning isn't more automated at this point. This artisanal approach doesn't seem scalable.
Random question from someone who's never touched ML professionally:
Can we optimize hyper-params the same way we optimize weights and biases (e.g. gradient descent)? Or would that be too expensive (since you have to optimize the weights and biases for each hyper-param configuration)?
Of course that might lead to hyper-hyper-params...
In the general case, no, because the final metric we care about (e.g. accuracy or AUC) is not differentiable (i.e. we cannot compute its gradient) with respect to the hyperparameters, especially the discrete ones.
However, recent work at this year's NeurIPS [1] did use an outer gradient descent to tune an inner gradient descent, so in some cases your idea does work.
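To make the idea concrete, here's a rough sketch (in JAX, not that paper's actual method, and only for a continuous hyperparameter like an L2 penalty): unroll the inner training loop and let autodiff push the gradient of a validation loss back to the hyperparameter. Everything here (the ridge objective, step counts, toy data) is made up for illustration.

    # Hypergradient sketch: outer gradient descent on a hyperparameter
    # by differentiating through an unrolled inner gradient descent.
    import jax
    import jax.numpy as jnp

    def train_loss(w, lam, X, y):
        # inner (training) objective; lam is the L2 hyperparameter
        return jnp.mean((X @ w - y) ** 2) + lam * jnp.sum(w ** 2)

    def val_loss(w, Xv, yv):
        # outer (validation) metric we actually care about
        return jnp.mean((Xv @ w - yv) ** 2)

    def inner_train(lam, X, y, steps=50, lr=0.1):
        # inner gradient descent on the weights, unrolled so it stays differentiable
        w = jnp.zeros(X.shape[1])
        g = jax.grad(train_loss)
        for _ in range(steps):
            w = w - lr * g(w, lam, X, y)
        return w

    def outer_objective(lam, X, y, Xv, yv):
        # validation loss as a function of the hyperparameter alone
        return val_loss(inner_train(lam, X, y), Xv, yv)

    # toy data, just so the sketch runs end to end
    w_true = jnp.arange(1.0, 6.0)
    X = jax.random.normal(jax.random.PRNGKey(0), (64, 5))
    y = X @ w_true + 0.5 * jax.random.normal(jax.random.PRNGKey(1), (64,))
    Xv = jax.random.normal(jax.random.PRNGKey(2), (32, 5))
    yv = Xv @ w_true + 0.5 * jax.random.normal(jax.random.PRNGKey(3), (32,))

    # outer gradient descent on the hyperparameter itself
    lam = 0.5
    hypergrad = jax.grad(outer_objective)
    for _ in range(20):
        lam = lam - 0.05 * hypergrad(lam, X, y, Xv, yv)
    print("tuned lambda:", lam)

This only works because the hyperparameter is continuous and the validation loss is differentiable; for discrete choices (layer counts, optimizer type, etc.) you're back to black-box search.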
There's also the (older) "Devil in the Details" papers, focused on computer vision. I'd love to read something like this on modern methods like transformers. https://arxiv.org/abs/1405.3531
Does anyone know of an updated version for the age of Transformers?