
How are you running inference? GPU or CPU? I'm trying to use GPT4All (ggml-based) on 32 cores of Xeon E5 v3 hardware, and even the ~4 GB quantized models are depressingly slow as far as I'm concerned (i.e. slower than the GPT-4 API, which is itself barely usable for interactive work). I'd be much obliged if you could point me at a specific quantized model on HF that you think is "fast" and I'll download it and try it out.
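For context, here's roughly the kind of setup I mean -- a minimal sketch using llama-cpp-python (which wraps the same ggml backend) rather than the GPT4All bindings; the model filename is just a placeholder, not a specific recommendation:

    # Minimal CPU inference sketch (pip install llama-cpp-python).
    # The model path below is a placeholder -- substitute whatever
    # quantized ggml file you pull down from HF.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/ggml-model-q4_0.bin",  # placeholder path
        n_ctx=2048,     # context window
        n_threads=32,   # match the physical core count
    )
    out = llm("Explain quantization in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

FWIW, in my experience setting n_threads to the physical core count rather than the logical (hyperthreaded) count tends to help on these Xeons, but it still doesn't get me anywhere near interactive speeds.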


