
Bandwidth to a GPU from a socketed RAM? How?



They're using an AMD APU: it has unified RAM/VRAM, much like Apple Silicon. That's why the RAM is socketed rather than sitting on a discrete GPU.

Unified RAM/VRAM is very nice for running LLMs locally, since you can get way more RAM than you typically can get VRAM on discrete GPUs. Getting 128GB of VRAM from discrete GPUs means 4x RTX 5090s (32GB each), i.e. about $8k on GPU spend alone. This is $2k and it includes the CPU!

Of course, it'll be somewhat slower than a discrete GPU setup, but at a quarter of the cost, that's a reasonable tradeoff for most people, I'd think. It should run Llama 3.1 70B (or various finetunes/LoRAs) quite easily, even with reasonably long context.
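To see why 70B fits comfortably in 128GB of unified memory, here's a back-of-the-envelope sketch. The bit widths are assumptions (4-bit quantized weights, fp16 KV cache); the layer/head counts are Llama 3.1 70B's published config values:

```python
def model_memory_gb(params_b: float, bits_per_weight: int = 4) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (factor of 2 = key + value)."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Llama 3.1 70B: 80 layers, 8 KV heads (GQA), head_dim 128.
weights = model_memory_gb(70, bits_per_weight=4)   # ~35 GB at Q4
cache = kv_cache_gb(80, 8, 128, context=32_768)    # ~10.7 GB at 32k context
print(f"weights ~ {weights:.0f} GB, KV cache ~ {cache:.1f} GB")
```

So even a Q4 70B with a 32k context is only around 46GB, leaving plenty of headroom out of 128GB, whereas it wouldn't fit on any single consumer GPU.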


Correct me if I am wrong, but I believe this is an AMD SoC, so a combo of CPU + GPU (plus an NPU/AI engine, whatever you wanna call it) on the same chip. And they do share the RAM.



