
For example, look at OpenAI's latest paper on scaling Transformers, "Scaling Laws for Neural Language Models" (Kaplan et al 2020): https://arxiv.org/abs/2001.08361

larger models are better across the entire range they test, up to billion-parameter models, in pretty much every way: they need hardly any additional data, achieve lower losses, train faster, parallelize better, and are even more compute-efficient and sample-efficient (!). A minimal sketch of the parameter-count scaling law is below.
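
To make that concrete, here is a minimal sketch (not the paper's own code) of the power-law form the paper fits for loss vs. non-embedding parameter count, L(N) = (N_c / N)^alpha_N, using the approximate fitted constants reported there; treat the numbers as illustrative:

    # Sketch of the parameter-count scaling law from Kaplan et al 2020.
    # Constants are the paper's approximate fits for non-embedding parameters
    # (N_c ~ 8.8e13, alpha_N ~ 0.076); they are assumptions here, not exact.

    def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
        """Predicted test loss (nats/token) as a power law in model size N."""
        return (n_c / n_params) ** alpha_n

    # Each 10x increase in parameters shaves a roughly constant fraction off the loss:
    for n in (1e8, 1e9, 1e10):
        print(f"N={n:.0e}: predicted loss ~= {loss_from_params(n):.3f}")

The point of the power-law form is that there is no plateau in the tested range: every multiplicative increase in N buys a predictable multiplicative reduction in loss.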



