
I’d love to run this on a single 24 GB 3090 - how much DRAM / SSD space do I need for a decent LLM when it’s quantised to 4 bits?



I've been trying this, and with 4-bit compression on you can fit the entire 30B model on the 3090.
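A rough back-of-envelope sketch of why that works (my own numbers, assuming ~30B parameters and ignoring the KV cache, activations, and framework overhead - weights dominate):

    # Approximate weight storage for a ~30B-parameter model at various bit widths.
    # Sketch only: ignores KV cache, activations, and runtime overhead.

    def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
        """Weight storage in GB (1 GB = 2**30 bytes)."""
        return n_params * bits_per_param / 8 / 2**30

    n_params = 30e9  # ~30B parameters

    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gb(n_params, bits):.1f} GB")

    # Approximate output:
    # 16-bit weights: ~55.9 GB  (well over a 24 GB 3090)
    #  8-bit weights: ~27.9 GB  (still over 24 GB)
    #  4-bit weights: ~14.0 GB  (fits, with headroom for the KV cache)

So at 4 bits the weights alone come to roughly 14 GB, which is why the whole model fits on one card.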


OK, so no offloading is needed at all for the quantised model - nice.

In practice, how good is the 30B model vs 175B?


I don't have access to 175B for comparison. In a vacuum, 30B isn't very good. In the neighborhood of GPT-NeoX-20B, I think, but not good. It repeats itself easily and has a tenuous relationship with the topic. It's still much better than anything I could run locally before now.



