Hacker News

Googling for what you asked immediately turned up this discussion:

https://github.com/ggml-org/llama.cpp/discussions/11733

about the scaling of llama.cpp and DeepSeek on some dual-socket AMD systems.

While it was rather tricky, after many experiments they obtained almost double the speed on two sockets, especially on AMD Turin.

However, if you look at the actual benchmark data, those results must be much lower than what is really possible: their test AMD Turin system (named P1 there) had only two thirds of its memory channels populated, so performance limited by memory bandwidth could be increased by 50%, and it used 16-core CPUs, so performance limited by computation could be increased around 10 times.
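The headroom arithmetic above can be sketched as a quick back-of-envelope calculation. The specific counts are assumptions inferred from the comments: 8 of 12 memory channels populated (two thirds), and a hypothetical 192-core top-end part versus the 16-core CPUs actually tested.

```python
# Back-of-envelope headroom estimate for the P1 Turin system in the
# linked discussion. Counts are assumptions, not measured values:
# 8 of 12 memory channels populated, 16 cores used vs. 192 available
# on a top-end part.
def headroom(populated_channels, total_channels, cores_used, cores_max):
    bandwidth_gain = total_channels / populated_channels  # memory-bound workloads
    compute_gain = cores_max / cores_used                 # compute-bound workloads
    return bandwidth_gain, compute_gain

bw, comp = headroom(populated_channels=8, total_channels=12,
                    cores_used=16, cores_max=192)
print(f"bandwidth-bound headroom: {bw:.2f}x")  # 1.50x -> "increased by 50%"
print(f"compute-bound headroom: {comp:.0f}x")  # 12x -> "around 10 times"
```

Which bound applies depends on the workload: token generation in llama.cpp is typically memory-bandwidth-bound, while prompt processing is closer to compute-bound.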



Cool, I didn’t find that one! Thanks.

A single 192-core Epyc is 11k by itself, so I’d probably go for the simpler integrated M3 Ultra solution…



