https://github.com/ggml-org/llama.cpp/discussions/11733
about the scaling of llama.cpp and DeepSeek on some dual-socket AMD systems.
While it was rather tricky, after many experiments they obtained almost double the speed when using two sockets, especially on AMD Turin.
However, if you look at the actual benchmark data, the absolute numbers must be well below what is really possible: their test AMD Turin system (named P1 there) had only two thirds of its memory channels populated, so performance limited by memory bandwidth could be increased by about 50%, and it used 16-core CPUs, so performance limited by computation could be increased by roughly 10 times.
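
A rough back-of-envelope sketch of that headroom estimate, written as a few lines of Python; the channel and core counts are my assumptions (12 DDR5 channels per Turin socket, top parts with 128 Zen 5 or 192 Zen 5c cores), not numbers taken from the benchmark thread:

# Headroom estimate for the P1 system; all counts are assumptions, not from the thread.
channels_total = 12        # DDR5 channels per Turin socket
channels_used = 8          # "two thirds populated" as reported
cores_used = 16            # per-socket core count of the test CPUs
cores_max_zen5 = 128       # top Zen 5 Turin part
cores_max_zen5c = 192      # top Zen 5c Turin part

bandwidth_headroom = channels_total / channels_used              # 1.5x, i.e. +50%
compute_headroom_lo = cores_max_zen5 / cores_used                # 8x
compute_headroom_hi = cores_max_zen5c / cores_used               # 12x

print(f"bandwidth-bound headroom: {bandwidth_headroom:.1f}x")
print(f"compute-bound headroom: {compute_headroom_lo:.0f}x to {compute_headroom_hi:.0f}x")

Under those assumptions the bandwidth-bound headroom comes out to 1.5x and the compute-bound headroom to 8x-12x, which matches the "50%" and "around 10 times" figures above.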