> I can't wait to see someone testing the full DeepSeek model on this at 819 GB ...

coder543 · 2025-03-05T15:01:53 1741186913

DeepSeek-R1 only has 37B active parameters.

A back-of-the-napkin calculation, assuming 8-bit weights (so about 37GB read per token): 819GB/s / 37GB/tok ≈ 22 tokens/sec.

Realistically, you’ll have to run quantized to fit inside of the 512GB limit, so it could be more like 22GB of data transfer per token, which would yield 37 tokens per second as the theoretical limit.
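To make the arithmetic above easy to rerun with other numbers, here is a minimal sketch. It assumes decoding is purely memory-bandwidth bound, so each token requires streaming all the active weights once. The function name is made up for illustration, and the 37GB and 22GB figures are simply the ones used in this comment.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed if each token streams gb_read_per_token from memory."""
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 819  # memory bandwidth figure quoted in the thread

# 37B active parameters at an assumed 1 byte per parameter -> 37GB per token
eight_bit_estimate = max_tokens_per_sec(BANDWIDTH_GB_S, 37)

# Quantized case from the comment: roughly 22GB read per token
quantized_estimate = max_tokens_per_sec(BANDWIDTH_GB_S, 22)

print(f"8-bit:     {eight_bit_estimate:.1f} tok/s")
print(f"quantized: {quantized_estimate:.1f} tok/s")
```

These are ceilings, not predictions: compute, KV cache reads, and other overhead would only push real numbers lower.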

It is likely going to be very usable. As other people have pointed out, the Mac Studio is also not the only option at this price point… but it is neat that it is an option.

mrtksn · 2025-03-05T14:51:26 1741186286

How many t/s would you expect? I think I feel perfectly fine when it's over 50.

Also, people figured out a way to run these things in parallel easily. The device is pretty small, so I think for someone who wouldn't mind the price tag, stacking 2-3 of those wouldn't be that bad.

yk · 2025-03-05T15:06:22 1741187182

I think I've seen 800 GB/s memory bandwidth, so a q4 quant of a 400 B model should be 4 t/s if memory bound.
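The estimate above can be reproduced with a small sketch. It assumes a dense model where all 400B weights are read for every token, and it treats q4 as exactly 4 bits per weight (real q4 formats carry some extra overhead). The helper names are hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the weights in GB: billions of params times bytes per param."""
    return params_billion * bits_per_weight / 8


def memory_bound_tps(bandwidth_gb_s: float, params_billion: float,
                     bits_per_weight: float) -> float:
    """Tokens/sec ceiling if every weight is streamed from memory once per token."""
    return bandwidth_gb_s / weights_gb(params_billion, bits_per_weight)


# 400B dense model at an assumed 4 bits per weight, 800 GB/s bandwidth
size_gb = weights_gb(400, 4)
dense_q4_tps = memory_bound_tps(800, 400, 4)

print(f"{size_gb:.0f} GB of weights, at most {dense_q4_tps:.0f} tok/s")
```

For a mixture-of-experts model, the same formula would use the active parameter count per token rather than the total.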

behnamoh · 2025-03-05T14:55:31 1741186531

I know you’re referring to the exolabs app, but the t/s is really not that good. It uses Thunderbolt instead of NVLink.

bearjaws · 2025-03-05T16:00:06 1741190406

Not sure why you are being downvoted. We already know the performance numbers, given the memory bandwidth constraints on the M4 Max chips, and the same would apply here.

Going from 525GB/s to 1000GB/s will double the TPS at best, which is still quite low for large LLMs.

lanceflt · 2025-03-05T18:50:54 1741200654

DeepSeek R1 (full, Q1) runs at 14t/s on an M2 Ultra, so this should be around 20t/s.