
The previous time this article was submitted, I did some calculations based on the charts and found[1] that for the NVIDIA 40 and 50-series GPUs, the results are almost entirely explained by memory bandwidth:

Each of the cards except the 5090 gets almost exactly 0.1 token/s per GB/s memory bandwidth.

My understanding is that the Macs have soldered memory which allows for much higher memory bandwidth. The M4 has ~400-550 GB/s max depending on configuration[2], while EPYCs seem to have more like 250GB/s max[3].
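The scaling above follows from token generation being memory-bound: each new token requires streaming roughly the full set of active weights from memory, so throughput is capped at bandwidth divided by bytes read per token. A minimal sketch of that rule of thumb (the ~10 GB weight footprint and the bandwidth figures are illustrative assumptions, not numbers from the thread):

```python
# Bandwidth-limited ceiling on decode throughput: each generated token
# must stream every active weight from memory, so
#   tokens/s <= memory_bandwidth / bytes_read_per_token
def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s for memory-bound generation."""
    return bandwidth_gb_s / weights_gb

# Illustrative only: a ~10 GB weight footprint reproduces the observed
# ~0.1 token/s per GB/s of memory bandwidth.
for bw in (288, 576, 1008):  # hypothetical GB/s figures
    print(f"{bw:4d} GB/s -> ~{max_tokens_per_s(bw, 10):.1f} tok/s ceiling")
```

With a 10 GB model this gives 0.1 tok/s per GB/s exactly, which is why the per-card numbers line up so cleanly once compute is fast enough not to be the bottleneck.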

[1]: https://news.ycombinator.com/item?id=42847284

[2]: https://support.apple.com/en-us/121553

[3]: https://www.servethehome.com/here-is-why-you-should-fully-po...



> EPYCs seem to have more like 250GB/s max

Your link goes to info on the 2022 EPYC CPUs, the current generation can do 576GB/s: https://chipsandcheese.com/p/amds-turin-5th-gen-epyc-launche...

Intel's current 12-channel Xeons should be even faster with MRDIMMs, though I couldn't find a memory-specific benchmark.


Ah shoot, that's what one gets for being in a hurry and on the phone. I saw the date of the article and its mention of the EPYC 9004, but forgot that the 9005 is the new series and missed the details.

Thanks for the correction.

edit: found a llama.cpp issue discussing performance bottlenecks on modern dual-socket EPYC here[1]. It also covers single-socket benchmarks and some optimizations. Just thought it was interesting.

[1]: https://github.com/ggml-org/llama.cpp/discussions/11733




