> and it only uses ~22GB (via Ollama) or ~15GB (MLX)

Why is the memory use different? Are you using a different context size in the two setups?



No idea. MLX is its own thing, optimized for Apple Silicon. Ollama uses GGUFs.

https://ollama.com/library/gemma3:27b-it-qat says it's Q4_0. https://huggingface.co/mlx-community/gemma-3-27b-it-qat-4bit says it's 4bit. I think those are the same quantization?
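For a rough sanity check, here's a back-of-the-envelope sketch (Python) of the effective bits per weight for the two formats. It assumes Q4_0's layout of one fp16 scale per 32-weight block and MLX's default 4-bit grouping of 64 weights with an fp16 scale and bias per group; the parameter count and layouts are approximations, not measured values.

    # Back-of-the-envelope: effective bits per weight and resulting weight size
    # for a ~27B-parameter model. Block/group layouts are assumed, not measured.

    PARAMS = 27e9  # rough parameter count for Gemma 3 27B

    def gguf_q4_0_bits_per_weight(block_size=32):
        # Q4_0: 4-bit quants plus one fp16 scale per block of 32 weights (assumed layout)
        return 4 + 16 / block_size

    def mlx_4bit_bits_per_weight(group_size=64):
        # MLX 4-bit: 4-bit quants plus fp16 scale and bias per group of 64 weights (assumed layout)
        return 4 + (16 + 16) / group_size

    for name, bpw in [("Q4_0 GGUF", gguf_q4_0_bits_per_weight()),
                      ("MLX 4-bit", mlx_4bit_bits_per_weight())]:
        size_gb = PARAMS * bpw / 8 / 1e9
        print(f"{name}: ~{bpw:.2f} bits/weight, ~{size_gb:.1f} GB of weights")

Both come out to ~4.5 bits/weight, i.e. roughly 15GB of weights for a 27B model, which lines up with the MLX number but not the Ollama one.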


Those are the same quant, but this is a good example of why you shouldn't use ollama. Either directly use llama.cpp, or use something like LM Studio if you want something with a GUI/easier user experience.

The Gemma 3 27B QAT GGUF should be taking up ~15GB, not 22GB.
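If you want to double-check the numbers outside Ollama without building llama.cpp yourself, here's a minimal sketch using the llama-cpp-python bindings to load the QAT GGUF directly; the model filename, context size, and prompt are illustrative, not actual paths.

    # Minimal sketch: load the Gemma 3 27B QAT GGUF directly via llama-cpp-python
    # and watch resident memory. Filename, context size, and prompt are illustrative.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./gemma-3-27b-it-qat-q4_0.gguf",  # hypothetical local filename
        n_ctx=8192,        # smaller context -> smaller KV cache
        n_gpu_layers=-1,   # offload all layers (Metal on Apple Silicon)
    )

    out = llm("Explain quantization-aware training in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

A smaller n_ctx shrinks the KV cache, so keep the context size identical in both setups when comparing memory.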


The vision tower is 7GB, so I was wondering whether you were loading it without vision.
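If that 7GB figure is right, the arithmetic roughly accounts for the gap; a quick sketch using only numbers quoted in this thread (all approximate):

    # Rough accounting for the observed gap, using figures from this thread
    # (the 7 GB vision-tower number is from the comment above, not measured here).
    weights_4bit_gb = 27e9 * 4.5 / 8 / 1e9   # ~15.2 GB of quantized text weights
    vision_tower_gb = 7.0                    # figure quoted above
    print(f"text weights only: ~{weights_4bit_gb:.1f} GB")                     # close to the MLX number
    print(f"weights + vision:  ~{weights_4bit_gb + vision_tower_gb:.1f} GB")   # close to the Ollama number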



