
It’s really hard to compare optimizers. Common architectures and default hyperparameters were discovered alongside Adam, so they are implicitly tuned for it, and you’d have to redo a bunch of sweeps for each alternative if you wanted a “fair” comparison. In practice this doesn’t really matter and everyone just uses Adam. If you had infinite compute, you’d try every combination of optimizer, architecture, and hyperparameters and select the one with the best results.
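To make the "try every combo" idea concrete, here is a toy sketch of that kind of sweep. Everything in it is an assumption for illustration: the two-dimensional quadratic objective, the hand-rolled `sgd` and `adam` loops, the learning-rate grid, and the step count. It is nowhere near a real benchmark, it only shows the shape of the loop.

```python
import itertools
import math

# Hypothetical toy objective: an ill-conditioned quadratic,
# f(w) = 0.5 * sum(c_i * w_i^2), with curvatures chosen for illustration.
CURVATURES = [1.0, 100.0]

def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURES, w))

def grad(w):
    return [c * x for c, x in zip(CURVATURES, w)]

def sgd(lr, steps):
    """Plain gradient descent from a fixed start; returns the final loss."""
    w = [1.0, 1.0]
    for _ in range(steps):
        w = [x - lr * g for x, g in zip(w, grad(w))]
    return loss(w)

def adam(lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam with bias correction from the same start; returns the final loss."""
    w = [1.0, 1.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        w = [x - lr * mh / (math.sqrt(vh) + eps)
             for x, mh, vh in zip(w, m_hat, v_hat)]
    return loss(w)

def sweep(optimizers, learning_rates, steps=200):
    """Run every (optimizer, learning rate) pair and record the final loss."""
    results = {}
    for (name, run), lr in itertools.product(optimizers.items(), learning_rates):
        final = run(lr, steps)
        # Treat diverged runs (inf or nan) as infinitely bad.
        results[(name, lr)] = final if math.isfinite(final) else math.inf
    return results

results = sweep({"sgd": sgd, "adam": adam}, [1e-3, 1e-2, 1e-1])
best = min(results, key=results.get)
print(best, results[best])
```

Even here the winner depends on which learning rates are in the grid and how many steps you allow, which is the fairness problem in miniature. A real comparison would add architectures, schedules, batch sizes, and seeds to the product, and that is where the compute bill explodes.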


