
This is a very cool effort that I hadn't heard about, thanks for sharing it!

It's still a large amount of training data compared to what children get (3GB of pure text is many more words than a person could say in a lifetime), but it's a tiny sliver of what GPT-3 was trained on, so it's a very interesting step in the direction I was thinking of.
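
For a rough sense of scale, here's a back-of-envelope sketch; the bytes-per-word and words-per-day figures are my own assumptions, not numbers from the comment or the paper:

    # Rough comparison: 3GB of text vs. words spoken in a lifetime.
    # Assumptions (mine): ~5 bytes per English word including whitespace,
    # ~16,000 words spoken per day, over ~80 years.
    BYTES_PER_WORD = 5
    WORDS_PER_DAY = 16_000
    YEARS = 80

    corpus_words = 3e9 / BYTES_PER_WORD            # ~600 million words in 3GB of text
    lifetime_words = WORDS_PER_DAY * 365 * YEARS   # ~470 million words spoken in a lifetime

    print(f"3GB of text      ~ {corpus_words / 1e6:.0f}M words")
    print(f"lifetime speech  ~ {lifetime_words / 1e6:.0f}M words")

Under those assumptions the two are at least in the same ballpark, with the 3GB corpus somewhat larger.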



You could get away with less data by making the model larger, though I don't know how far you can push that before it overfits badly. It could make a good TinyStories 2: how small can the dataset be while language models still learn coherent English?

Still, the paper has me wondering whether we could train a physicist model as brilliant as Einstein with much less compute if we applied a curriculum to the data and restricted it to a physics and physics-adjacent dataset.



