Deep learning tuning playbook (github.com/google-research)
175 points by tehnub on Jan 20, 2023 | 8 comments



This is great. Another good source is Karpathy's recipe: https://karpathy.github.io/2019/04/25/recipe/

Does anyone know an updated version in the age of Transformers?


No, but you can join his Discord here: https://discord.gg/WmsKzRKC (Link expires in a day to prevent bots)

He's very active, as is the community, and I'm sure they would either know of a video or be able to answer any questions you have.


It's a great resource and all, but I find it insane that hyperparameter tuning isn't more automated at this point. This artisanal approach doesn't seem scalable.


Optuna is pretty automated, I'd say.
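
To make "automated" concrete, here is a minimal sketch of the loop that tools in this space run for you. It is plain random search in standard-library Python, not Optuna's API, and the `validation_loss` function, the two hyperparameters, and their ranges are hypothetical stand-ins for "train a model and report a validation metric".

```python
import math
import random


def validation_loss(lr, dropout):
    # Hypothetical stand-in for an expensive training run.
    # By construction the minimum is at lr = 1e-3, dropout = 0.2.
    return (math.log10(lr) + 3.0) ** 2 + (dropout - 0.2) ** 2


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            # Learning rate sampled log-uniformly in [1e-5, 1e-1].
            "lr": 10 ** rng.uniform(-5, -1),
            # Dropout sampled uniformly in [0, 0.5].
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = validation_loss(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


best_score, best_params = random_search(n_trials=200)
print(best_score, best_params)
```

A dedicated library replaces the random sampling with smarter strategies, but the surrounding structure (define an objective, define a search space, run N trials, keep the best) stays about the same.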


Random question from someone who's never touched ML professionally:

Can we optimize hyper-params the same way we optimize weights and biases (e.g., with gradient descent)? Or would that be too expensive (since you have to optimize the weights and biases for each hyper-param configuration)?

Of course that might lead to hyper-hyper-params...


In the general case, no, because the final metric we care about (e.g., accuracy or AUC) is not differentiable (i.e., we cannot compute its gradient) with respect to the hyperparameters, especially the discrete ones.

However, a recent work at this year's NeurIPS [1] did use an outer gradient descent to tune an inner gradient descent, so in some cases your idea is a good one that works.

[1] https://arxiv.org/abs/1909.13371
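
For the curious, the "outer descent tunes inner descent" idea can be sketched on a toy problem. This is a simplified illustration, not the code from [1]: it tunes only the learning rate, uses a hand-derived gradient rather than automatic differentiation, and the quadratic loss, function names, and constants are all made up for the example.

```python
def loss(w):
    # Toy differentiable loss with its minimum at w = 3.
    return (w - 3.0) ** 2


def grad(w):
    # Analytic gradient of the toy loss.
    return 2.0 * (w - 3.0)


def hypergradient_descent(w0, alpha0, beta, steps):
    """Gradient descent on w whose learning rate alpha is itself
    updated by gradient descent (with hyper-learning-rate beta)."""
    w, alpha = w0, alpha0
    prev_g = 0.0  # no previous gradient on the first step
    for _ in range(steps):
        g = grad(w)
        # The current w came from w_prev - alpha * prev_g, so
        # d loss(w) / d alpha = -g * prev_g. Descending on alpha:
        alpha = alpha + beta * g * prev_g
        # Ordinary descent step on w with the updated alpha.
        w = w - alpha * g
        prev_g = g
    return w, alpha


w, alpha = hypergradient_descent(w0=0.0, alpha0=0.01, beta=0.001, steps=100)
print(w, alpha)
```

Starting from a deliberately small learning rate, alpha grows while successive gradients point the same way and w still converges to the minimum. Note that this trades one hyperparameter (alpha) for another (beta), which is exactly the "hyper-hyper-params" worry raised above.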


This looks like a great and really comprehensive resource!

Karpathy’s recipe for training neural nets is one of the only similar documents I’ve come across: http://karpathy.github.io/2019/04/25/recipe/

There’s also the (older) “devil in the details” papers focused on computer vision: https://arxiv.org/abs/1405.3531. I’d love to read something like this on modern methods like transformers.


I recommend this video on the subject.

"Let's build GPT: from scratch, in code, spelled out." https://youtu.be/kCc8FmEb1nY



