They made it very clear that they were reporting that score for the non-thinking model[0]. I still don't have any guesses as to what happened here; maybe something format-related. I can't see a motivation to blatantly lie on a benchmark when the lie would very obviously be publicly corrected.
AMD has great hardware, but their software is a different story. It’s poorly documented, unstable, and doesn’t deliver good performance for end users.
I’ve been working with the AMD MI300X for a few weeks, trying to get matrix multiplication running with tools like CK, Triton, or hipBLAS. However, the performance is only about 50% of the theoretical peak (FP16: 650 TFLOPS measured vs. 1300 TFLOPS in the whitepaper). Note that this is with matrices initialized to zero. When using random floats, performance drops by 20%; this is confirmed in AMD’s documentation.
Meanwhile, the H100, the MI300X’s competitor, has a theoretical FP16 performance of 1000 TFLOPS, and I can achieve 800-900 TFLOPS with matrix multiplication using CUTLASS and random-float initialization.
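For anyone who wants to check the arithmetic, here is a rough sketch of how these utilization figures fall out. It assumes the usual convention of counting 2*M*N*K floating-point operations for an (M x K) by (K x N) matmul, and it reads the "drops by 20%" as relative to the zero-initialized number; both are assumptions on my part. The function names and the timing in the last example are made up for illustration, not measurements.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for an (m x k) @ (k x n) matmul.

    Assumes the common 2*m*n*k FLOP count (one multiply + one add per term).
    """
    return 2 * m * n * k / seconds / 1e12


def fraction_of_peak(achieved_tflops: float, peak_tflops: float) -> float:
    """Share of the theoretical peak actually reached (0.0 to 1.0)."""
    return achieved_tflops / peak_tflops


# Figures quoted in the comment above (FP16).
mi300x_zeros = fraction_of_peak(650, 1300)
# Assumption: the 20% drop is relative to the zero-initialized result.
mi300x_random = fraction_of_peak(650 * (1 - 0.20), 1300)
h100_low = fraction_of_peak(800, 1000)
h100_high = fraction_of_peak(900, 1000)

print(f"MI300X, zero-initialized: {mi300x_zeros:.0%} of peak")
print(f"MI300X, random floats:    {mi300x_random:.0%} of peak")
print(f"H100, random floats:      {h100_low:.0%} to {h100_high:.0%} of peak")

# Hypothetical timing, only to show how a measured runtime maps to TFLOPS.
example = matmul_tflops(8192, 8192, 8192, seconds=1e-3)
print(f"8192^3 matmul in 1 ms:    {example:.1f} TFLOPS")
```

Under those assumptions the MI300X lands at roughly half of peak (less with random inputs) while the H100 sits at 80-90%, which is the gap I'm complaining about.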
AMD needs to improve their software quickly if they want to catch up with NVIDIA.