
We showed this in our original scaling law paper:

https://arxiv.org/pdf/1712.00409.pdf

The slope of the power law is determined by the problem and dataset. Compute, parameter count, and data move you along the curve. A change in architecture or inductive bias is a constant offset on a log-log plot: it shifts the curve up or down without changing its slope.

So architecture can give an advantage, but that advantage can be overcome by scale.
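
To make the constant-offset point concrete, here is a minimal sketch in Python. It assumes the error follows a pure power law, error(n) = a * n^(-b), where n stands for whatever you are scaling (data, parameters, or compute). The function names and the values of a and b are made up for illustration and are not taken from the paper.

```python
import math


def power_law_error(n, a, b):
    """Error under an assumed pure power law: error(n) = a * n**(-b)."""
    return a * n ** (-b)


def scale_to_match(a_worse, a_better, b):
    """Factor by which the worse architecture must grow n to reach the
    error the better architecture gets at the original n.

    Solve a_worse * (k * n)**(-b) == a_better * n**(-b) for k.
    """
    return (a_worse / a_better) ** (1.0 / b)


# Hypothetical values, chosen only for illustration.
b = 0.3          # slope: set by the problem and dataset, shared by both models
a_base = 10.0    # offset for a baseline architecture
a_better = 8.0   # offset for an architecture with a better inductive bias

# The gap between the two curves in log space is the same at every scale.
for n in (1e4, 1e6, 1e8):
    gap = math.log(power_law_error(n, a_base, b)) - math.log(
        power_law_error(n, a_better, b)
    )
    print(f"n={n:.0e}  log-error gap={gap:.4f}")

# How much extra scale the baseline needs to erase the advantage.
factor = scale_to_match(a_base, a_better, b)
print(f"baseline needs {factor:.2f}x the scale to match")
```

With these made-up numbers the baseline needs roughly 2.1 times the scale to catch up. The smaller the slope b, the more expensive it gets to buy back a given offset with scale.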



