Try different quantization variants. I got vastly different speeds depending on which quantization I chose. I believe q4_0 worked very well for me, although for a 7B model q8_0 also runs fine and gives better quality.
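One likely reason the quantization choice matters so much for speed is simply how many bytes of weights have to be read per token. Below is a rough back-of-the-envelope sketch, assuming the GGML-style block layouts where q4_0 and q8_0 each store blocks of 32 weights plus one fp16 scale per block. The 7e9 parameter count is a nominal "7B", and the estimate covers weights only (no KV cache, and it ignores any tensors kept at higher precision). Under these assumptions q4_0 comes out around 3.9 GB and q8_0 around 7.4 GB.

```python
# Rough weight-storage estimate for GGML-style block quantization.
# Assumption: q4_0 and q8_0 store blocks of 32 weights plus one fp16 scale.
BLOCK_SIZE = 32
SCALE_BITS = 16  # one fp16 scale per block (assumed layout)


def bits_per_weight(quant_bits: int) -> float:
    """Effective bits per weight, including the per-block scale overhead."""
    return (BLOCK_SIZE * quant_bits + SCALE_BITS) / BLOCK_SIZE


def weight_gb(n_params: float, quant_bits: int) -> float:
    """Approximate weight size in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight(quant_bits) / 8 / 1e9


N_PARAMS = 7e9  # nominal "7B"; real 7B models differ slightly

if __name__ == "__main__":
    for name, bits in [("q4_0", 4), ("q8_0", 8)]:
        print(f"{name}: {bits_per_weight(bits)} bits/weight, "
              f"~{weight_gb(N_PARAMS, bits):.2f} GB of weights")
```

Roughly halving the bytes per weight is why q4_0 can be noticeably faster when generation is memory-bandwidth bound, while q8_0 trades that for quality.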
