
Hi, I'm one of the lead authors of this work.

We recommend using bfloat16 (not fp16); quantization of small models can really hurt performance!
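For example, with Hugging Face Transformers, loading in bfloat16 looks roughly like this (the model id below is a placeholder, not our actual checkpoint):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model id -- substitute the checkpoint you actually use.
    model_id = "your-org/your-1.5b-model"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bf16 keeps fp32's exponent range; fp16 does not
    )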



Have you compared it to the 1.58-bit dynamic quant based on the original R1 (i.e., not a distillation)? Whatever Unsloth did, it doesn't seem to give up much reasoning performance over the full Q8 version.


It's simply because the model is small (1.5B), which makes it sensitive to weight perturbations.
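To see why, here is a toy sketch (not our actual quantization scheme) of symmetric round-to-nearest quantization viewed as a weight perturbation: coarser grids perturb the weights more, and a small model has less redundancy to absorb that noise.

    import torch

    # Toy round-to-nearest symmetric quantizer: snap weights to a uniform grid.
    def quantize(w, bits):
        levels = 2 ** (bits - 1) - 1   # e.g. 127 positive levels at 8 bits
        scale = w.abs().max() / levels
        return torch.round(w / scale) * scale

    w = torch.randn(1024, 1024)        # stand-in for a weight matrix
    for bits in (8, 4, 2):
        err = (quantize(w, bits) - w).norm() / w.norm()
        print(f"{bits}-bit relative weight error: {err:.3f}")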


Is there a GGUF version of your model anywhere that you recommend? I'm on a Mac.


I think some people have made GGUF quantizations derived from our model; try them out!

https://huggingface.co/models?other=base_model:quantized:age...
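If you want to run one of those GGUFs locally on a Mac, a minimal sketch with llama-cpp-python looks like this (the file name is a placeholder for whichever quant you download):

    from llama_cpp import Llama

    # Path is a placeholder -- point it at the GGUF file you downloaded.
    llm = Llama(model_path="./model-Q8_0.gguf", n_ctx=4096)

    out = llm("What is 12 * 7?", max_tokens=64)
    print(out["choices"][0]["text"])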


Is there an MLX version that could be added to the fullmoon iOS app?



