
The smallest quantized version of the large MoE model on ollama is 143GB:

https://ollama.com/library/qwen3:235b-a22b-q4_K_M

Is there a smaller one?



Running the 3-bit quant of https://huggingface.co/unsloth/Qwen3-235B-A22B-GGUF now on a 128 GB MacBook.


Smaller quantizations are possible [1], but I think you're right in that you wouldn't want to run anything substantially smaller than 128 GB. Single-GPU on 1x H200 (141 GB) might be feasible though (if you have some of those lying around...)
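The sizes roughly follow from simple arithmetic: the k-quant schemes average a bit more than their nominal width per weight, so Q4_K_M at roughly 4.85 bits/weight puts 235B parameters near 142 GB, while a ~3.44-bit Q3_K_M lands around 101 GB, leaving headroom for KV cache on a 128 GB machine. A quick sketch (the bits-per-weight figures are approximate averages, not exact GGUF file sizes, which also include metadata and mixed-precision layers):

```python
# Back-of-envelope estimate of quantized weight storage.
# Bits-per-weight values are approximate averages for llama.cpp
# k-quants, not exact file sizes.

def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB at a given quantization width."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

q4 = weight_size_gb(235, 4.85)  # Q4_K_M: ~4.85 bits/weight on average
q3 = weight_size_gb(235, 3.44)  # Q3_K_M: ~3.44 bits/weight on average

print(f"~{q4:.0f} GB at Q4_K_M, ~{q3:.0f} GB at Q3_K_M")
```

The ~142 GB figure matches the 143 GB ollama listing above, and the ~101 GB Q3 estimate explains why the 3-bit quant fits on a 128 GB Mac (or a single 141 GB H200) with room left for context.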

[1] - https://huggingface.co/unsloth/Qwen3-235B-A22B-GGUF/tree/mai...



