> Of course, we are very far away from any experiment of this kind, given that current models actually require many more examples of language than any human is exposed to in an entire lifetime.
Doesn't take much to imbue grammar, coherent completions and basic reasoning: https://arxiv.org/abs/2305.07759
This is a very cool effort that I hadn't heard about, thanks for sharing it!
It's still a large amount of data in the training set compared to what children get (3GB of pure text is many more words than can be said in a lifetime), but it's also only a tiny sliver of what GPT-3 was trained on, so it's a very interesting step in the direction I was thinking of.
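For a rough sense of scale, here's a back-of-envelope comparison; the bytes-per-word and words-per-day figures are just assumptions I'm plugging in, not numbers from the paper or from anyone in this thread:

    # Rough word counts: 3 GB of raw text vs. a lifetime of speech.
    # Both constants below are assumptions, not measured figures.
    corpus_bytes = 3 * 1_000_000_000
    avg_bytes_per_word = 6            # ~5 letters plus a space, assumed
    corpus_words = corpus_bytes / avg_bytes_per_word

    spoken_words_per_day = 16_000     # estimates vary widely, roughly 7k-16k
    years = 80
    lifetime_spoken_words = spoken_words_per_day * 365 * years

    print(f"~{corpus_words / 1e6:.0f}M words in 3 GB of text")
    print(f"~{lifetime_spoken_words / 1e6:.0f}M words spoken over {years} years")

Those two constants dominate the answer, so the comparison is very sensitive to which figures you plug in, but it gives a feel for the orders of magnitude involved.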
You could probably get away with less data by making the model larger, though I don't know how far you can push that before global overfitting sets in. It could make a good TinyStories 2: how small can the dataset be and still have a language model learn coherent English?
Still, the paper has me wondering whether we could train a physicist model as brilliant as Einstein with much less compute if we arranged the data into a curriculum and restricted it to a physics and physics-adjacent dataset.
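Concretely, "curriculum-ing" the data could be as simple as ordering the physics text from introductory to advanced before it ever reaches the trainer. A minimal sketch of that idea; the snippets, the hand-assigned difficulty levels and the train_on placeholder are all made up for illustration:

    # Order a (toy) physics corpus from easy to hard before training.
    # Texts, difficulty labels and train_on() are illustrative stand-ins.
    corpus = [
        ("General relativity relates spacetime curvature to stress-energy.", 3),
        ("A force is a push or a pull acting on an object.", 1),
        ("F = ma relates force, mass and acceleration.", 1),
        ("The Lagrangian L = T - V encodes a system's dynamics.", 2),
        ("Noether's theorem links symmetries to conserved quantities.", 3),
    ]

    def train_on(batch):
        # placeholder for a real training step on this batch of text
        print("training on:", batch)

    # Curriculum: feed the easiest material first, the hardest last.
    for level in sorted({difficulty for _, difficulty in corpus}):
        train_on([text for text, difficulty in corpus if difficulty == level])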