We recommend using bfloat16 (not fp16); also note that quantization can really hurt the performance of small models!
https://huggingface.co/models?other=base_model:quantized:age...
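As a minimal sketch of why bfloat16 is the safer choice: bf16 keeps fp32's 8-bit exponent (so it shares fp32's dynamic range) and only truncates the mantissa, while fp16's 5-bit exponent overflows to infinity past roughly 65504 — a real risk for activations and gradients. The helpers below are purely illustrative bit manipulations, not how a framework would do it (in practice you'd just pass e.g. `torch_dtype=torch.bfloat16` when loading a model):

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate bfloat16 by truncating an fp32 value's mantissa to 7 bits
    (bf16 keeps fp32's full 8-bit exponent, so no range is lost)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision (5-bit exponent, max ~65504).
    struct raises OverflowError for out-of-range values, which IEEE
    round-to-nearest would map to infinity."""
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except OverflowError:
        return float("inf")

big = 70000.0
print(to_fp16(big))  # inf -- fp16 cannot represent values this large
print(to_bf16(big))  # 69632.0 -- finite, just with a coarser mantissa
```

The trade-off is precision: bf16 keeps only 7 mantissa bits versus fp16's 10, but for neural-network workloads the extra dynamic range usually matters far more than the extra mantissa bits.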