
VRAM is what takes a model from "can not run at all" to "can run" (even if slowly), hence the emphasis.


No, with limited VRAM you can offload part of the model or split it across CPU and GPU. And since the CPU has swap, you can run the absolute largest model. It's just really, really slow.
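
For concreteness, here's a minimal sketch of partial offload using llama-cpp-python (my assumption for the tooling; the model filename and layer count are hypothetical, tune n_gpu_layers to your VRAM):

    from llama_cpp import Llama

    # Offload only as many transformer layers as fit in VRAM;
    # the remaining layers run on the CPU out of system RAM.
    llm = Llama(
        model_path="deepseek-r1-32b-q4_k_m.gguf",  # hypothetical filename
        n_gpu_layers=20,  # e.g. what fits on a 10 GB card; -1 offloads everything
        n_ctx=4096,
    )

    out = llm("Why is the sky blue?", max_tokens=64)
    print(out["choices"][0]["text"])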


Really, really, really, really, really, REALLY REALLY slow.


The difference between Deepseek-r1:70b (edit: actually 32b) running on an M4 Pro (48 GB unified RAM, 14 CPU cores, 20 GPU cores) and on an AMD box (64 GB DDR4, 16-core 5950X, RTX 3080 with 10 GB of VRAM) is more than a factor of 2.

The M4 Pro was able to answer the test prompt twice (once on battery and once on mains power) before the AMD box finished processing.

The M4's prompt parsing took significantly longer, but token generation was significantly faster.

Having the memory attached to the cores that matter makes a big difference.
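
Back-of-the-envelope: token generation is memory-bandwidth-bound, since every new token streams essentially all the weights past the compute. A rough sketch (the bandwidth figures are approximate published specs, and I'm assuming a ~4-bit quant):

    # Ceiling estimate: tokens/s <= memory bandwidth / bytes read per token
    weights_gb = 40  # ~70b params at a 4-bit quant, roughly

    for name, bw_gb_s in [
        ("M4 Pro unified memory", 273),  # Apple's published figure
        ("dual-channel DDR4-3200", 51),  # ~2 x 25.6 GB/s
    ]:
        print(f"{name}: ~{bw_gb_s / weights_gb:.1f} tokens/s ceiling")

    # ~6.8 vs ~1.3 tokens/s -- in the same ballpark as the 4.8 and
    # 1 token/s measured later in this thread.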


You're adding detail that's not relevant to anything I said. I was saying this statement:

> VRAM is what takes a model from "can not run at all" to "can run" (even if slowly), hence the emphasis.

is false. Regardless of how much VRAM you have, if the criterion is "can run even if slowly", all machines can run all models because you have swap. It's unusably slow, but that's not the difference OP was claiming.
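
To make the literal claim concrete, a toy tiering sketch (all sizes hypothetical):

    def where_it_runs(model_gb, vram_gb, ram_gb):
        """Weights spill from VRAM to RAM to swap; it always 'runs'."""
        if model_gb <= vram_gb:
            return "fits in VRAM: fast"
        if model_gb <= vram_gb + ram_gb:
            return "spills to RAM: slow"
        return "spills to swap: runs, but unusably slow"

    print(where_it_runs(model_gb=40, vram_gb=10, ram_gb=64))   # spills to RAM
    print(where_it_runs(model_gb=400, vram_gb=10, ram_gb=64))  # spills to swap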


The purchase criterion for anybody actually trying to use it is "runs slowly but acceptably" vs. "runs so slowly as to be unusable".

My memory was wrong: it was the 32b. I'm running the 70b against a similar prompt, and the 5950X is probably going to take over an hour for what the M4 managed in about 7 minutes.

edit: an hour later and the 5950X isn't even done thinking yet. Token generation is generously around 1 token/s.

edit edit: final statistics. The M4 Pro managed 4 tokens/s prompt eval and 4.8 tokens/s token generation; the 5950X managed 150 tokens/s prompt eval and 1 token/s generation.
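
Those rates make the wall-clock gap unsurprising. A quick sanity check (the token counts are assumptions; reasoning models can easily emit a couple thousand "thinking" tokens):

    def wall_clock_min(prompt_toks, out_toks, eval_tps, gen_tps):
        # prompt eval time plus generation time, in minutes
        return (prompt_toks / eval_tps + out_toks / gen_tps) / 60

    # hypothetical 200-token prompt, ~2000 generated tokens
    print(f"M4 Pro: ~{wall_clock_min(200, 2000, 4, 4.8):.0f} min")  # ~8 min
    print(f"5950X:  ~{wall_clock_min(200, 2000, 150, 1):.0f} min")  # ~33 min, worse once generation dips below 1 token/s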

Perceptually, I can live with the M4's performance. It's a set-the-prompt, do-something-else, come-back sort of thing. The 5950X/RTX 3080 is too slow to be even remotely usable with the 70b-parameter model.


I don't disagree. I'm just taking OP at the literal statement they made.


Sure, this is technically correct, but somewhere there's a line of practicality. Running off a CPU (especially with swap) will be past that line.

Otherwise, you don't even need a computer. Pen and paper is plenty.

For all practical purposes, VRAM is a limiting factor.


You can say the same about GPU clock speed…



