
Bandwidth to a GPU from a socketed RAM? How?



They're using an AMD APU: it has unified RAM/VRAM, much like Apple Silicon. That's why the RAM is socketed rather than sitting on a discrete GPU.

Unified RAM/VRAM is very nice for running LLMs locally, since you can get way more RAM than you typically can get VRAM on discrete GPUs. Getting 128GB of VRAM from discrete GPUs means 4x RTX 5090s (32GB each), i.e. about $8k on GPU spend alone. This is $2k and it includes the CPU!

Of course, it'll be somewhat slower than a discrete GPU setup, but at a quarter of the cost, that's a reasonable tradeoff for most people, I'd think. It should run Llama 3.1 70B (or various finetunes/LoRAs) quite easily, even with reasonably long context.
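To see why 70B fits comfortably in 128GB of unified memory, here's a back-of-the-envelope sketch. The bit widths are assumptions (4-bit quantized weights, fp16 KV cache); the layer/head counts are Llama 3.1 70B's published config values:

```python
def model_memory_gb(params_b: float, bits_per_weight: int = 4) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (factor of 2 = key + value)."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Llama 3.1 70B: 80 layers, 8 KV heads (GQA), head_dim 128.
weights = model_memory_gb(70, bits_per_weight=4)   # ~35 GB at Q4
cache = kv_cache_gb(80, 8, 128, context=32_768)    # ~10.7 GB at 32k context
print(f"weights ~ {weights:.0f} GB, KV cache ~ {cache:.1f} GB")
```

So even a Q4 70B with a 32k context is only around 46GB, leaving plenty of headroom out of 128GB, whereas it wouldn't fit on any single consumer GPU.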


Correct me if I am wrong, but I believe this is an AMD SoC, so a combo of CPU + GPU (plus an NPU/AI engine, whatever you wanna call it) on the same chip. And they do share the RAM.



