
Hi, I'm one of the lead authors of this work.

We recommend using bfloat16 (not fp16); quantization of small models can really hurt performance!
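For example, with Hugging Face Transformers, loading in bfloat16 looks roughly like this (the model id below is a placeholder, not our actual checkpoint):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model id -- substitute the checkpoint you actually use.
    model_id = "your-org/your-1.5b-model"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bf16 keeps fp32's exponent range; fp16 does not
    )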



Have you compared it to the 1.58-bit dynamic quant based on the original R1 (i.e., not a distillation)? Whatever Unsloth did, it doesn't seem to give up much reasoning performance over the full Q8 version.


It's simply because the model is small (1.5B), which makes it sensitive to weight perturbations.
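To see why, here is a toy sketch (not our actual quantization scheme) of symmetric round-to-nearest quantization viewed as a weight perturbation: coarser grids perturb the weights more, and a small model has less redundancy to absorb that noise.

    import torch

    # Toy round-to-nearest symmetric quantizer: snap weights to a uniform grid.
    def quantize(w, bits):
        levels = 2 ** (bits - 1) - 1   # e.g. 127 positive levels at 8 bits
        scale = w.abs().max() / levels
        return torch.round(w / scale) * scale

    w = torch.randn(1024, 1024)        # stand-in for a weight matrix
    for bits in (8, 4, 2):
        err = (quantize(w, bits) - w).norm() / w.norm()
        print(f"{bits}-bit relative weight error: {err:.3f}")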


Is there a GGUF version of your model anywhere that you recommend? I'm on a Mac.


I think some people have made GGUF quantizations derived from our model; try them out!

https://huggingface.co/models?other=base_model:quantized:age...
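If you want to run one of those GGUFs locally on a Mac, a minimal sketch with llama-cpp-python looks like this (the file name is a placeholder for whichever quant you download):

    from llama_cpp import Llama

    # Path is a placeholder -- point it at the GGUF file you downloaded.
    llm = Llama(model_path="./model-Q8_0.gguf", n_ctx=4096)

    out = llm("What is 12 * 7?", max_tokens=64)
    print(out["choices"][0]["text"])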


Is there an MLX version that could be added to the fullmoon iOS app?



