
This is a very cool effort that I hadn't heard about, thanks for sharing it!

It's still a large amount of training data compared to what children get (3GB of pure text is many more words than a person could say in a lifetime), but it's a tiny sliver of what GPT-3 was trained on, so it's a very interesting step in the direction I was thinking of.
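
For a rough sense of scale, here's a back-of-envelope sketch; the bytes-per-word and words-per-day figures are my own assumptions, not numbers from the comment or the paper:

    # Rough comparison: 3GB of text vs. words spoken in a lifetime.
    # Assumptions (mine): ~5 bytes per English word including whitespace,
    # ~16,000 words spoken per day, over ~80 years.
    BYTES_PER_WORD = 5
    WORDS_PER_DAY = 16_000
    YEARS = 80

    corpus_words = 3e9 / BYTES_PER_WORD            # ~600 million words in 3GB of text
    lifetime_words = WORDS_PER_DAY * 365 * YEARS   # ~470 million words spoken in a lifetime

    print(f"3GB of text      ~ {corpus_words / 1e6:.0f}M words")
    print(f"lifetime speech  ~ {lifetime_words / 1e6:.0f}M words")

Under those assumptions the two are at least in the same ballpark, with the 3GB corpus somewhat larger.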



You could get away with less data by making the model larger, though I don't know how far you can push that before it overfits badly. It could make a good TinyStories 2: how small can the dataset be while language models still learn coherent English?

Still, the paper has me wondering whether we could train a physicist model as brilliant as Einstein with much less compute if we applied a curriculum to the data and restricted it to a physics and physics-adjacent dataset.



