I also trained nanoGPT on TinyStories, producing a roughly 32M-parameter model. The results are amazing, especially considering I opted for a character-level model similar to the toy dataset in the repo. I'm writing about the experience, along with a deep dive into the code, on Medium (username oaguy1). Smaller LLMs are definitely worth considering with the right quality training data. Once I finish playing with TinyStories, I want to try the Standardized Project Gutenberg Corpus (~11GB), which I recently tweaked to be more modern, and see what I can do with it in nanoGPT and then maybe Hugging Face's libraries.
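For anyone wondering what "character-level" looks like in practice, the data prep is roughly this shape, modeled on nanoGPT's data/shakespeare_char/prepare.py. This is just a sketch; the input path and output filenames are placeholders, not my exact setup:

import pickle
import numpy as np

# assumed local path to the raw TinyStories text (placeholder name)
with open("tinystories.txt", "r", encoding="utf-8") as f:
    data = f.read()

# build the character-level vocabulary from the raw text
chars = sorted(set(data))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}

def encode(s):
    return [stoi[c] for c in s]

# simple 90/10 train/val split
n = len(data)
train_ids = np.array(encode(data[: int(n * 0.9)]), dtype=np.uint16)
val_ids = np.array(encode(data[int(n * 0.9):]), dtype=np.uint16)

# write the binary token files plus the vocab metadata nanoGPT expects
train_ids.tofile("train.bin")
val_ids.tofile("val.bin")
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "itos": itos, "stoi": stoi}, f)

From there it's the usual train.py run pointed at that data directory; the vocab stays tiny (just the distinct characters in the corpus), which is part of why such a small model can still produce coherent stories.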