
In general you can swap B (billions of parameters) for GB of VRAM if you use q8 quantization, so an 8B model can probably just about work on 8GB of VRAM.


If you don't want to quantize at all, you need to double that for fp16, i.e. 16GB.


Yes, but I think it's standard to do inference at q8, not fp16.
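A rough back-of-envelope sketch of the rule of thumb above, assuming a dense model and counting only the weights (KV cache, activations, and framework overhead are ignored, so real usage will be somewhat higher):

    # Rough VRAM estimate for model weights only.
    # q8  ~= 1 byte per parameter, fp16 = 2 bytes per parameter.
    def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
        return params_billions * 1e9 * bytes_per_param / 1024**3

    for name, bpp in [("q8", 1.0), ("fp16", 2.0)]:
        print(f"8B @ {name}: ~{weight_vram_gb(8, bpp):.1f} GB")

    # 8B @ q8:   ~7.5 GB  -> just about fits in 8GB VRAM
    # 8B @ fp16: ~14.9 GB -> needs roughly 16GB

This is why the "B -> GB" shorthand works at q8: one byte per parameter makes the parameter count and the weight footprint line up almost exactly.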



