
How are you running inference? GPU or CPU? I'm trying to use GPT4All (ggml-based) on 32 cores of Xeon E5 v3 hardware, and even the ~4 GB quantized models are depressingly slow as far as I'm concerned (i.e. slower than the GPT-4 API, which is itself barely usable for interactive work). I'd be much obliged if you could point me at a specific quantized model on HF that you think is "fast" and I'll download it and try it out.
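For context, here's roughly the kind of setup I mean -- a minimal sketch using llama-cpp-python (which wraps the same ggml backend) rather than the GPT4All bindings; the model filename is just a placeholder, not a specific recommendation:

    # Minimal CPU inference sketch (pip install llama-cpp-python).
    # The model path below is a placeholder -- substitute whatever
    # quantized ggml file you pull down from HF.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/ggml-model-q4_0.bin",  # placeholder path
        n_ctx=2048,     # context window
        n_threads=32,   # match the physical core count
    )
    out = llm("Explain quantization in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

FWIW, in my experience setting n_threads to the physical core count rather than the logical (hyperthreaded) count tends to help on these Xeons, but it still doesn't get me anywhere near interactive speeds.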


