
Maybe 41.8% is the score of Qwen3-235B-A22B-Thinking-2507, lol. 11% for the non-thinking model is pretty high.


Makes sense; in that case it's in line with Gemini 2.5 Pro, and it matches their other results in the post.


They made it very clear that they were reporting that score for the non-thinking model[0]. I still don't have any guesses as to what happened here; maybe something format-related. I can't see a motivation to blatantly lie on a benchmark that would very obviously be publicly corrected.

[0] https://x.com/JustinLin610/status/1947836526853034403


AMD has great hardware, but their software is a different story. It’s poorly documented, unstable, and doesn’t deliver good performance for end users.

I’ve been working with the AMD MI300X for a few weeks, trying to get matrix multiplication running with tools like CK, Triton, and hipBLAS. However, the performance is only about 50% of the theoretical peak (FP16: ~650 TFLOPS vs. the 1300 TFLOPS quoted in the whitepaper). Note that this is with matrices initialized to zero; when using random floats, performance drops by 20%, which AMD’s documentation confirms.
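
For reference, here is a minimal sketch of the kind of measurement I mean (assuming a PyTorch build with ROCm or CUDA support; the matrix size and iteration counts are illustrative, not what I actually used):

    # Rough FP16 GEMM throughput check: zero-init vs. random-init inputs.
    # Assumes PyTorch with a ROCm (or CUDA) backend; the torch.cuda.*
    # calls also drive AMD GPUs on ROCm builds.
    import time
    import torch

    def bench_matmul(make_input, n=8192, iters=50):
        a, b = make_input(n), make_input(n)
        for _ in range(10):            # warm-up so clocks and caches settle
            torch.matmul(a, b)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            torch.matmul(a, b)
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - t0
        flops = 2 * n ** 3 * iters     # an NxN matmul costs ~2*N^3 FLOPs
        return flops / elapsed / 1e12  # achieved TFLOPS

    zeros = lambda n: torch.zeros(n, n, dtype=torch.float16, device="cuda")
    rand = lambda n: torch.randn(n, n, dtype=torch.float16, device="cuda")
    print(f"zero-init:   {bench_matmul(zeros):.0f} TFLOPS")
    print(f"random-init: {bench_matmul(rand):.0f} TFLOPS")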

Meanwhile, the H100, the MI300X’s competitor, has a theoretical FP16 peak of about 1000 TFLOPS, and I can achieve 800-900 TFLOPS in matrix multiplication using CUTLASS with random-float initialization.

AMD needs to improve their software quickly if they want to catch up with NVIDIA.


