
Yes, AutoGPTQ supports this (8-, 4-, 3-, and 2-bit quantization/"compression" of weights, plus inference).
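
A rough sketch with AutoGPTQ (the model id and the one-line calibration text are placeholders; a real calibration set should be many samples):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # placeholder model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # bits can be 8, 4, 3, or 2
    quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

    # GPTQ calibrates against example activations; one sample here just for shape
    examples = [tokenizer("auto-gptq is an easy-to-use quantization library.", return_tensors="pt")]
    model.quantize(examples)
    model.save_quantized("opt-125m-4bit")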

GPTQ has also recently been merged into the Transformers library ( https://huggingface.co/blog/gptq-integration ).
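
Per that blog post, quantizing through Transformers looks roughly like this (model id is just an example; bits and calibration dataset are configurable):

    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "facebook/opt-125m"  # example model
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # quantize on load, calibrating against the built-in c4 dataset
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", quantization_config=gptq_config
    )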

The GGML quantization format used by llama.cpp also supports 8-, 6-, 5-, 4-, 3-, and 2-bit quantization.
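
Once you have a pre-quantized GGML file, inference from Python can go through the llama-cpp-python bindings, something like this (the model path is a placeholder):

    from llama_cpp import Llama

    # load a pre-quantized GGML model file (e.g. a q4_0, q5_K, or q2_K variant)
    llm = Llama(model_path="./llama-2-7b.ggmlv3.q4_0.bin", n_ctx=2048)

    output = llm("Q: What does GPTQ quantize? A:", max_tokens=64, stop=["Q:"])
    print(output["choices"][0]["text"])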
