
That's awesome! Are people thinking about training it for more than just 1 epoch? I believe Galactica showed that training for even 4 epochs is ok. Also, how amazing would it be if the next gen of open-source LLMs increased the context window, like adding 8k more tokens? That's probably expensive, but totally doable.


The issue with longer contexts is that they drive up inference memory usage.
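
For a rough sense of scale, here's a back-of-the-envelope sketch of how the attention KV cache grows with context length. It assumes a LLaMA-7B-like layout (32 layers, 32 heads, head dim 128) stored in fp16 with no cache quantization or attention tricks; those numbers are illustrative, not tied to any specific deployment.

    # Rough KV-cache size per sequence as a function of context length.
    # Assumed dims: 32 layers, 32 heads, head dim 128, fp16 (2 bytes/elem).
    def kv_cache_bytes(context_len: int,
                       n_layers: int = 32,
                       n_heads: int = 32,
                       head_dim: int = 128,
                       bytes_per_elem: int = 2) -> int:
        # 2x for the key and value tensors kept at every layer.
        return 2 * n_layers * n_heads * head_dim * bytes_per_elem * context_len

    for ctx in (2_048, 8_192, 32_768, 1_000_000):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>9} tokens -> ~{gib:.1f} GiB of KV cache")

Under those assumptions you get roughly 1 GiB at 2k tokens, 16 GiB at 32k, and hundreds of GiB at a million tokens, which is why naive long contexts blow up memory so quickly.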


Once this barrier is broken down we'll see a lot of cool things. 32k on GPT-4 is already pretty cool, but once we get into hundreds of thousands or millions of tokens of context we'll be able to easily do things that are currently achievable only with fine-tuning and "memory" tricks: assistants that remember everything you've ever told them, detailed questions over large datasets, even complex systems bootstrapped entirely from the context.


It includes Common Crawl data 4 or 5 times; does that count?



