
You can't run DeepSeek-V3/R1 on the RTX Pro 6000, not to mention the upcoming 1-million-context Qwen models, or the current Qwen3-235B.


I can run full DeepSeek R1 on an M1 Max with 64 GB of RAM. Around 0.5 t/s with a small quant. A Q4 quant of Maverick (253 GB) runs at 2.3 t/s on it (no GPU offload).

Practically, a last-gen or even ES/QS EPYC or Xeon (with AMX), enough RAM to fill all 8 or 12 channels, plus fast storage (4 Gen5 NVMe drives are almost 60 GB/s) looks, on paper at least, like the cheapest way to run these huge MoE models at hobbyist speeds.
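To see why memory channels and NVMe bandwidth are the numbers that matter here, a back-of-envelope sketch. The figures are assumptions, not from the thread: DeepSeek-R1 activates roughly 37B parameters per token, and a ~Q4 quant averages around 0.56 bytes per parameter, so each generated token reads on the order of 20 GB of weights.

```python
# Back-of-envelope: token generation on a bandwidth-bound system.
# Assumed numbers (not from the thread): ~37B active params per token
# for DeepSeek-R1, ~0.56 bytes/param at a ~Q4 quant.

ACTIVE_PARAMS = 37e9        # active (routed) params per token, assumption
BYTES_PER_PARAM = 0.56      # ~Q4 quantization, assumption

def tokens_per_sec(bandwidth_gbs: float) -> float:
    """Upper bound: every active weight is read once per token."""
    bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~20.7 GB
    return bandwidth_gbs * 1e9 / bytes_per_token

# 12 channels of DDR5-4800: 12 * 4800 MT/s * 8 B ~= 460.8 GB/s
print(f"12ch DDR5-4800: {tokens_per_sec(460.8):.1f} t/s ceiling")
# 4 Gen5 NVMe drives at ~15 GB/s each ~= 60 GB/s
print(f"4x Gen5 NVMe:   {tokens_per_sec(60.0):.1f} t/s ceiling")
```

That gives roughly a 22 t/s ceiling from 12-channel RAM and roughly 3 t/s from NVMe, which is why "hobbyist speeds" is about right: the model is so sparse per token that bandwidth, not compute, sets the limit.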


If you're talking about DeepSeek R1 with llama.cpp and mmap, then at this point you can run DeepSeek R1 on a Raspberry Pi Zero with a 256 GB microSD card and a phone charger. The only metric left to know is one's patience.
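The patience can be put in rough numbers. A sketch under assumed figures (neither is from the thread): ~20.7 GB of active weights read per token (37B active params at ~Q4), and ~40 MB/s sequential read from a microSD card, since with mmap the weights stream from storage once the working set exceeds RAM.

```python
# How patient? With llama.cpp's mmap'd weights, each token re-reads the
# active experts from storage when they don't fit in RAM.
# Assumed numbers: ~20.7 GB read per token, ~40 MB/s microSD throughput.

BYTES_PER_TOKEN = 37e9 * 0.56   # ~20.7 GB per token, assumption
SD_READ_BPS = 40e6              # ~40 MB/s microSD sequential read, assumption

seconds_per_token = BYTES_PER_TOKEN / SD_READ_BPS
print(f"~{seconds_per_token / 60:.0f} minutes per token")
```

So very roughly on the order of minutes per token: it runs, in the same sense that a glacier moves.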



