Hacker News

Memory. I don't know the exact equation, but it's very easy to see when you load a 128k-context model at 8K vs 80K. With the quant I'm running, loading at 80K would double the VRAM requirements.


This was my understanding too. Would love more people to chime in on the limits and costs of larger contexts.
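The growth the parent describes is mostly the KV cache, which scales linearly with context length. A rough sketch of the standard size formula follows; the model geometry used here (a Llama-3-8B-style layout: 32 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache) is an assumption, not taken from the thread — substitute your own model's numbers.

```python
# Rough estimate of KV-cache VRAM vs. context length.
# Geometry below is an ASSUMED Llama-3-8B-like layout (32 layers,
# 8 KV heads via GQA, head dim 128, fp16 cache entries = 2 bytes).

def kv_cache_bytes(seq_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Factor of 2 covers keys and values; one entry per layer/head/position."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

for ctx in (8_192, 80_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>6} tokens -> {gib:.1f} GiB KV cache")
```

Under these assumptions the cache goes from about 1 GiB at 8K to nearly 10 GiB at 80K, so for a quantized model whose weights fit in a few GiB, raising the context alone can plausibly double total VRAM use, matching the parent's observation.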



