The original gemma3:27b also took only 22GB using Ollama on my 64GB MacBook. I'm confused that the QAT version takes the same amount. Do you know why?
Which model is better? `gemma3:27b`, or `gemma3:27b-qat`?
Both versions are quantized and should use the same amount of RAM. The difference with QAT is that the quantization is accounted for during training, which should result in slightly better output (closer to that of the bf16 weights).
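To make the RAM point concrete, here is a rough back-of-the-envelope sketch. It assumes a nominal 27 billion parameters (taken from the "27b" in the tag) and counts only the weights. Real 4-bit formats store per-block scales, so they use somewhat more than 4 bits per weight, and the runtime also needs memory for the KV cache and buffers, so the number Ollama reports will be higher than this estimate.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumption: nominal parameter count inferred from the "27b" tag.
N_PARAMS = 27e9

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized bf16 weights
q4_gb = weight_memory_gb(N_PARAMS, 4)     # idealized 4-bit weights, no scale overhead

print(f"bf16: {bf16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

The estimate depends only on the parameter count and the bits per weight, not on how the 4-bit values were obtained, which is why a post-training quantized model and a QAT model at the same bit width land at about the same size.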
Just following this comment up as a note-to-self: as kgwgk noted, the default `gemma3:27b` model has ID a418f5838eaf, which corresponds to `27b-it-q4_K_M`. But the new quantization-aware training (QAT) model being discussed is `gemma3:27b-it-qat`, with ID 29eb0b9aeda3.
Quantization-aware training just means the model is exposed to quantized values during training, so it copes better when it is actually quantized afterwards. It doesn't change the model size itself.
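As a minimal sketch of what "dealing with quantized values" can look like, the snippet below rounds a few weights to a symmetric 4-bit grid and maps them back to floats ("fake quantization"). This is a generic illustration, not Gemma's actual training recipe: the function name, the per-list scale, and the sample weights are all made up for the example. In a typical QAT setup, the forward pass would use the rounded values while the optimizer keeps updating the full-precision weights.

```python
def fake_quantize(weights, bits=4):
    """Round weights to a symmetric integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # largest integer level, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole list
    # Quantize each weight to an integer in [-qmax, qmax], then dequantize.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

# Hypothetical weights, chosen only for illustration.
weights = [0.7, -0.35, 0.12, 0.0]
quantized = fake_quantize(weights)

for w, q in zip(weights, quantized):
    print(f"{w:+.2f} -> {q:+.2f}")
```

The number of weights is unchanged by this step, so the final quantized file is the same size either way; what training with it changes is how much accuracy is lost to the rounding.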