GPTQ support has also recently been merged into the Transformers library ( https://huggingface.co/blog/gptq-integration ).
The GGML quantization format used by llama.cpp also supports 8-, 6-, 5-, 4-, 3-, and 2-bit quantization.