ducviet00's comments

ducviet00 · 2025-07-25T14:22:07 1753453327

Maybe 41.8% is the score of Qwen3-235B-A22B-Thinking-2507, lol. 11% for the non-thinking model is pretty high.

jug · 2025-07-25T14:38:49 1753454329

Makes sense, it's in line with Gemini 2.5 Pro in that case. It aligns with their other results in the post.

christianqchung · 2025-07-25T15:49:42 1753458582

They made it very clear that they were reporting that score for the non-thinking model[0]. I still don't have any guesses as to what happened here, maybe something format related. I can't see a motivation to blatantly lie on a benchmark which would very obviously be publicly corrected.

[0] https://x.com/JustinLin610/status/1947836526853034403

ducviet00 · 2024-11-27T17:51:36 1732729896

AMD has great hardware, but their software is a different story. It’s poorly documented, unstable, and doesn’t deliver good performance for end users.

I’ve been working with the AMD MI300X for a few weeks, trying to get matrix multiplication running with tools like CK, Triton, or hipBLAS. However, the performance I measure is only about 50% of the theoretical peak (FP16: 650 TFLOPS measured vs. 1300 TFLOPS in the whitepaper). Note that this is with matrices initialized to zero. When using random floats, performance drops by 20%, which is confirmed in AMD’s documentation.

Meanwhile, the H100, the MI300X’s competitor, has a theoretical FP16 performance of 1000 TFLOPS, and I can achieve 800-900 TFLOPS with matrix multiplication using CUTLASS and randomly initialized floats.
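
For anyone who wants to check the arithmetic behind these numbers, here is a minimal sketch in plain Python. It assumes the usual convention of counting 2*M*N*K floating-point operations for an (M x K) by (K x N) matrix multiplication. The function names, the 8192 matrix size, and the 1.7 ms timing are hypothetical and only illustrate the conversion; they are not my actual benchmark setup.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A of shape (m, k) and B of shape (k, n).

    Each of the m*n outputs needs k multiplies and k adds, hence 2*m*n*k.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for one GEMM that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Share of the theoretical peak actually reached (0.0 to 1.0)."""
    return achieved / peak


# Figures quoted in this comment (TFLOPS, FP16).
mi300x_ratio = fraction_of_peak(650, 1300)   # measured vs. whitepaper peak
h100_low = fraction_of_peak(800, 1000)       # lower end of the 800-900 range
h100_high = fraction_of_peak(900, 1000)      # upper end of the 800-900 range

print(f"MI300X: {mi300x_ratio:.0%} of peak")
print(f"H100:   {h100_low:.0%} to {h100_high:.0%} of peak")

# Hypothetical timing: an 8192 x 8192 x 8192 GEMM measured at 1.7 ms.
example = achieved_tflops(8192, 8192, 8192, 1.7e-3)
print(f"Example GEMM: {example:.0f} TFLOPS")
```

In a real measurement the elapsed time should be averaged over many iterations after a warm-up, since a single kernel launch is too noisy to compare against a peak figure.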

AMD needs to improve their software quickly if they want to catch up with NVIDIA.