That's awesome! Are people thinking about training it for more than just 1 epoch? I believe Galactica showed that training for even 4 epochs is ok. Also, how amazing would it be if the next gen of open-source LLMs increased the context window, say by another 8k tokens? That's probably expensive, but totally doable.
Once this barrier is broken down we'll see a lot of cool things. 32k on GPT-4 is already pretty cool, but once we get into hundreds of thousands or millions of tokens of context we'll be able to easily do things that are currently only achievable with fine-tuning and "memory" tricks. Assistants that remember everything you've ever told them, detailed questions over large datasets, even complex systems bootstrapped entirely from the context.
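To make the "no memory tricks" point concrete, here's a rough Python sketch of what an assistant that remembers everything could look like once the context is big enough. `llm_complete` is just a hypothetical stand-in for whatever long-context model API you'd actually call:

```python
# Sketch: with a huge context window, "memory" is just keeping the whole
# conversation in the prompt. No retrieval, no summarization, no vector DB.

history: list[str] = []  # every exchange, kept verbatim

def llm_complete(prompt: str, max_tokens: int) -> str:
    # Placeholder: swap in whatever long-context model API you have access to.
    raise NotImplementedError

def ask(user_message: str) -> str:
    history.append(f"User: {user_message}")
    # The entire history goes straight into the prompt. This only works
    # if the context window is large enough to hold all of it.
    prompt = "\n".join(history) + "\nAssistant:"
    reply = llm_complete(prompt, max_tokens=512)
    history.append(f"Assistant: {reply}")
    return reply
```

Today you'd have to truncate, summarize, or retrieve from that history; with million-token contexts the naive version just works.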