Hacker News
Run DeepSeek R1 Dynamic 1.58-bit (unsloth.ai)
19 points by amrrs 9 months ago | 3 comments


Oh, thanks for sharing this! The llama.cpp fork that shows how to do the dynamic quant is here: https://github.com/unslothai/llama.cpp. I also found that min_p = 0.05 helps reduce the chance of bad tokens coming up with the 1.58-bit quant (I saw it happen roughly once every 8,000 tokens).
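
For anyone wondering what min_p does: as commonly described, min-p sampling drops every token whose probability is below min_p times the probability of the most likely token, then renormalizes what is left. Here is a rough pure-Python sketch of that idea. The function name `min_p_filter` and the toy logits are made up for illustration, and this is not taken from llama.cpp's code, which may differ in its details.

```python
import math

def min_p_filter(logits, min_p=0.05):
    """Illustrative min-p filter: returns renormalized probabilities."""
    # Numerically stable softmax over the raw logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at least min_p times as likely as the top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

# Toy distribution: one dominant token and a low-probability tail.
toy_logits = [math.log(0.90), math.log(0.07), math.log(0.02), math.log(0.01)]
filtered = min_p_filter(toy_logits, min_p=0.05)
```

With these toy numbers the cutoff is 0.05 * 0.90 = 0.045, so the two tail tokens (0.02 and 0.01) are zeroed out and can no longer be sampled. That is the kind of rare low-probability token the setting is meant to suppress.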



"The 1.58bit quantization should fit in 160GB of VRAM for fast inference"

Instructions for llama.cpp: https://huggingface.co/unsloth/DeepSeek-R1-GGUF#instructions...



