The original gemma3:27b also took only 22GB using Ollama on my 64GB MacBook. I'm...

zorgmonkey · 2025-04-20T23:44:39 1745192679

Both versions are quantized and should use the same amount of RAM. The difference with QAT is that the quantization happens during training, which should result in slightly better output (closer to the bf16 weights).

kgwgk · 2025-04-20T20:39:50 1745181590

Look up 27b in https://ollama.com/library/gemma3/tags

You'll find the ID a418f5838eaf, which also corresponds to 27b-it-q4_K_M.

carbocation · 2025-04-29T22:46:55 1745966815

Following up on this comment as a note to self: as kgwgk noted, the default gemma3:27b model has ID a418f5838eaf, which corresponds to 27b-it-q4_K_M. The new quantization-aware training (QAT) model being discussed is gemma3:27b-it-qat, with ID 29eb0b9aeda3.

superkuh · 2025-04-20T20:50:34 1745182234

Quantization-aware training just means having the model deal with quantized values a bit during training, so it handles the quantization better when it is quantized afterwards. It doesn't change the model size itself.
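
To make "deal with quantized values during training" concrete, here is a minimal sketch of the usual "fake quantization" round trip: weights are snapped to a low-bit grid and mapped straight back to floats, so the rest of the forward pass sees the rounding error. This is an illustration under my own assumptions (symmetric signed 4-bit grid, one scale for the whole list, a hypothetical `fake_quantize` helper), not the actual recipe used for the Gemma 3 QAT checkpoints.

```python
def fake_quantize(weights, bits=4):
    """Round each weight to a symmetric signed grid, then map it back to float.

    Assumption: one shared scale for the whole list. Real schemes typically
    use per-block or per-channel scales.
    """
    qmax = 2 ** (bits - 1) - 1       # 7 for 4 bits
    qmin = -(2 ** (bits - 1))        # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0

    simulated = []
    for w in weights:
        q = round(w / scale)             # nearest integer level
        q = max(qmin, min(qmax, q))      # clamp to the representable range
        simulated.append(q * scale)      # back to float: same storage, less precision
    return simulated


weights = [0.12, -0.53, 0.91, -0.07, 0.33]   # made-up example values
simulated = fake_quantize(weights)
for original, approx in zip(weights, simulated):
    print(f"{original:+.3f} -> {approx:+.3f}")
```

During training the optimizer would still update the full-precision weights; only the forward pass uses the rounded copies. That is why the final quantized file is no smaller than one produced by ordinary post-training quantization at the same bit width.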

nolist_policy · 2025-04-20T19:05:40 1745175940

I suspect your "original gemma3:27b" was a quantized model, since the non-quantized (16-bit) version needs around 54 GB.
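
The 54 GB figure is just parameter count times bytes per weight. A rough back-of-the-envelope sketch, assuming a nominal 27 billion parameters (taken from the "27b" tag) and counting only the weights; `weight_memory_gb` is a hypothetical helper, not part of any tool:

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_27B = 27e9  # assumption: nominal parameter count implied by the "27b" tag

for label, bits in [("16-bit (bf16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS_27B, bits):.1f} GB")
```

Real 4-bit formats store scales alongside the weights, so they average somewhat more than 4 bits per weight, and the runtime adds context cache and other overhead on top. That is consistent with a 4-bit model showing noticeably more memory use than the bare 4-bit estimate while staying far below 54 GB.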