Run DeepSeek R1 Dynamic 1.58-bit

danielhanchen · 2025-01-27T22:39:33 1738017573

Oh thanks for sharing this! The llama.cpp fork that shows how the dynamic quant is done is here: https://github.com/unslothai/llama.cpp. I also found that min_p = 0.05 helps reduce the chance of bad tokens coming up with the 1.58-bit quant (I saw it happen for roughly 1 in 8,000 tokens).
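For readers unfamiliar with min_p, here is a minimal sketch of the idea in plain Python. It is not the llama.cpp code; the function names `min_p_filter` and `sample` are made up for illustration. It assumes the usual definition of min-p: drop every token whose probability is below `min_p` times the probability of the most likely token, then renormalize.

```python
import math
import random

def min_p_filter(logits, min_p=0.05):
    """Return renormalized probabilities after min-p filtering (illustrative helper)."""
    # Numerically stable softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at least min_p times as likely as the top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

def sample(logits, min_p=0.05, rng=random):
    """Draw one token index from the min-p filtered distribution."""
    probs = min_p_filter(logits, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a small cutoff like 0.05, only the very unlikely tail is removed, which is where the occasional bad token would come from. In llama.cpp the same setting is passed on the command line as `--min-p 0.05`.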

homarp · 2025-01-28T09:50:08 1738057808

discussed here https://news.ycombinator.com/item?id=42850222

homarp · 2025-01-27T16:36:05 1737995765

"The 1.58bit quantization should fit in 160GB of VRAM for fast inference"

Instructions for llama.cpp: https://huggingface.co/unsloth/DeepSeek-R1-GGUF#instructions...
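A back-of-envelope check on the 160GB figure, as a rough sketch only. It assumes DeepSeek R1's 671B total parameters and pretends every weight is stored at exactly 1.58 bits, which the dynamic quant does not do (some layers are kept at higher precision), so the real file size will differ. The helper name `uniform_quant_size_gb` is made up for illustration.

```python
def uniform_quant_size_gb(n_params, bits_per_weight):
    """Approximate weight size in GB (10^9 bytes) for a uniform quantization."""
    # bits -> bytes -> gigabytes
    return n_params * bits_per_weight / 8 / 1e9

# Assumption: 671e9 parameters, all stored at 1.58 bits per weight.
weights_gb = uniform_quant_size_gb(671e9, 1.58)
print(f"weights: ~{weights_gb:.1f} GB")
print(f"left over in 160 GB: ~{160 - weights_gb:.1f} GB")
```

Under those assumptions the weights come to roughly 132.5 GB, which leaves some room in 160GB for the KV cache and runtime buffers.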