
Yes, AutoGPTQ supports this (8-, 4-, 3-, and 2-bit quantization/"compression" of weights, plus inference).
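
A rough sketch with AutoGPTQ (the model id and the one-line calibration text are placeholders; a real calibration set should be many samples):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # placeholder model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # bits can be 8, 4, 3, or 2
    quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

    # GPTQ calibrates against example activations; one sample here just for shape
    examples = [tokenizer("auto-gptq is an easy-to-use quantization library.", return_tensors="pt")]
    model.quantize(examples)
    model.save_quantized("opt-125m-4bit")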

GPTQ has also recently been merged into the Transformers library ( https://huggingface.co/blog/gptq-integration ).
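
Per that blog post, quantizing through Transformers looks roughly like this (model id is just an example; bits and calibration dataset are configurable):

    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "facebook/opt-125m"  # example model
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # quantize on load, calibrating against the built-in c4 dataset
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", quantization_config=gptq_config
    )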

The GGML quantization format used by llama.cpp also supports 8-, 6-, 5-, 4-, 3-, and 2-bit quantization.
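
Once you have a pre-quantized GGML file, inference from Python can go through the llama-cpp-python bindings, something like this (the model path is a placeholder):

    from llama_cpp import Llama

    # load a pre-quantized GGML model file (e.g. a q4_0, q5_K, or q2_K variant)
    llm = Llama(model_path="./llama-2-7b.ggmlv3.q4_0.bin", n_ctx=2048)

    output = llm("Q: What does GPTQ quantize? A:", max_tokens=64, stop=["Q:"])
    print(output["choices"][0]["text"])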
