What matters for Qwen models, and most/all local MoE models (i.e. where the perfo...

embedding-shape · 2026-03-08T17:28:26 1772990906

For comparison, here are the most recently released consumer NVIDIA GPUs:

- 5050 - MSRP: 249 USD - 320 GB/s

- 5060 - MSRP: 299 USD - 448 GB/s

- 5060 Ti - MSRP: 379 USD - 448 GB/s

- 5070 - MSRP: 549 USD - 672 GB/s

- 5070 Ti - MSRP: 749 USD - 896 GB/s

- 5080 - MSRP: 999 USD - 960 GB/s

- 5090 - MSRP: 1999 USD - 1792 GB/s

The M3 Ultra (819 GB/s) lands roughly at the level of a 5070 Ti.
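A rough way to see why bandwidth dominates token generation: if each generated token has to stream all of the *active* weights from memory once, bandwidth divided by bytes-per-token gives a ceiling on tokens/s. The sketch below applies that to the cards listed above. The model (3B active parameters at ~4.5 bits/weight) is a hypothetical example, and the function name is made up for illustration; KV-cache reads and other overheads are ignored, so real numbers will be lower.

```python
# Rough ceiling on decode (token generation) speed for a bandwidth-bound model.
# Assumption: each token reads every active weight exactly once; KV cache,
# activations and kernel overheads are ignored, so this is an upper bound.

BANDWIDTH_GB_S = {
    "5050": 320,
    "5060": 448,
    "5060 Ti": 448,
    "5070": 672,
    "5070 Ti": 896,
    "5080": 960,
    "5090": 1792,
}


def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float,
                         bits_per_weight: float) -> float:
    """Max tokens/s if each token streams active_params_b billion weights."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical MoE: 3B active parameters, ~4.5 bits/weight (a typical 4-bit quant).
    for name, bw in BANDWIDTH_GB_S.items():
        ceiling = decode_ceiling_tok_s(bw, 3.0, 4.5)
        print(f"{name:8s} {bw:5d} GB/s -> at most ~{ceiling:5.0f} tok/s")
```

For an MoE only the active experts count here, which is why a model with few active parameters can decode quickly even when its total size is large.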

seanmcdirmid · 2026-03-08T18:00:27 1772992827

You should really list the memory capacity of the graphics cards, and the comment above should list (unified) memory capacities and prices at specific price points as well.

embedding-shape · 2026-03-08T19:13:26 1772997206

I mean, what I was curious about (and maybe others were too) was how these compare to the parent post, which is all about memory bandwidth, hence the comparison.

seanmcdirmid · 2026-03-08T22:15:13 1773008113

But it doesn't matter if you have 1000 GB/s of memory bandwidth if you only have 32GB of VRAM. Well, maybe for some applications it works out (image generation?), but it's not seriously competing with an Ultra with 128 GB of unified memory, or even a Max with 64 GB of unified memory.
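The capacity point can be made concrete: for an MoE, *all* weights have to be resident in memory even though only the active ones are read per token. A minimal sketch, assuming weight size ≈ total parameters × bits/8 and a flat, hypothetical 4 GB allowance for KV cache and runtime overhead (both `weights_gb` and `fits` are illustrative names, not from any library):

```python
# Does a model fit? Capacity is driven by TOTAL parameters, not active ones.
# Assumption: a flat overhead_gb covers KV cache and runtime buffers; real
# overhead grows with context length and batch size.


def weights_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return total_params_b * bits_per_weight / 8


def fits(total_params_b: float, bits_per_weight: float,
         memory_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus the overhead allowance fit in memory_gb."""
    return weights_gb(total_params_b, bits_per_weight) + overhead_gb <= memory_gb


if __name__ == "__main__":
    # Hypothetical 120B-total MoE at ~4.5 bits/weight: ~67.5 GB of weights.
    for mem in (32, 64, 128):
        print(f"{mem:4d} GB -> fits: {fits(120, 4.5, mem)}")
```

Under these assumptions such a model fits in 128 GB but not in 32 GB (or even 64 GB), no matter how fast the 32 GB card's memory is; a much smaller model would fit in either.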

embedding-shape · 2026-03-09T14:03:00 1773064980

> but it's not seriously competing with an Ultra with 128 GB of unified memory, or even a Max with 64 GB of unified memory.

No one is arguing that either; this sub-thread is quite literally about memory bandwidth. Of course there are other things to care about in real-world applications of all this; again, no one is claiming otherwise. My reply added context to the "What matters [...] is memory bandwidth" parent comment, nothing more: namely, how much memory bandwidth other consumer hardware offers.

seanmcdirmid · 2026-03-09T22:19:04 1773094744

If we are talking about Apple silicon, where memory capacity can be configured separately from bandwidth (and the memory costs the same for each processor), we can say something like "it's all about bandwidth". If we switch to GPUs, where that is no longer true (NVIDIA won't let you buy a 5090 with more than 32GB of VRAM), then we aren't comparing apples to apples anymore.

ranger_danger · 2026-03-09T05:13:39 1773033219

A 10GB 3080 still beats even an M2 Ultra with 192GB; memory bandwidth is not the only factor.

https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inferen...

seanmcdirmid · 2026-03-09T07:07:23 1773040043

If the model is small enough to fit into 10GB of VRAM, the GPU can win.

But the bigger models are more useful, so that’s what people fixate on.

spatular · 2026-03-08T19:13:20 1772997200

There is also prompt processing, which is compute-bound, and for agentic workflows it can matter more than token generation (tg), especially if the model is not a "thinking" model.
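To illustrate why prompt processing can dominate an agentic turn (long context in, short answer out), here is a back-of-the-envelope sketch. It assumes prefill costs ~2 FLOPs per active weight per prompt token and is limited by compute, while decode streams all active weights once per output token and is limited by bandwidth. The model size, token counts and both TFLOPS figures are hypothetical, and attention cost growing with context as well as prompt caching are ignored.

```python
# Back-of-the-envelope: compute-bound prefill vs bandwidth-bound decode.
# All hardware and model numbers below are hypothetical illustrations.


def prefill_seconds(prompt_tokens: int, active_params_b: float,
                    effective_tflops: float) -> float:
    """~2 FLOPs (multiply + add) per active weight per prompt token."""
    flops = 2 * active_params_b * 1e9 * prompt_tokens
    return flops / (effective_tflops * 1e12)


def decode_seconds(output_tokens: int, active_params_b: float,
                   bits_per_weight: float, bandwidth_gb_s: float) -> float:
    """Each generated token streams every active weight from memory once."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return output_tokens * bytes_per_token / (bandwidth_gb_s * 1e9)


if __name__ == "__main__":
    # Hypothetical agentic turn: 30k-token prompt, 300-token reply,
    # 3B active params at ~4.5 bits/weight, 800 GB/s memory bandwidth.
    decode = decode_seconds(300, 3.0, 4.5, 800)
    for tflops in (20.0, 100.0):
        prefill = prefill_seconds(30_000, 3.0, tflops)
        print(f"{tflops:5.0f} TFLOPS: prefill {prefill:4.1f}s, decode {decode:4.2f}s")
```

With the same bandwidth, the two hypothetical machines decode at the same speed, but the one with 5x the compute finishes prefill 5x sooner (1.8 s vs 9 s here), which is the part the user waits on most in this kind of workload.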