
It's misleading to compare a desktop GPU against a data center GPU on these metrics. Blackwell data center tensor cores are different from Blackwell consumer tensor cores, and the same goes for the AMD side.

Also, the size of the native (atomic) matrix fragment isn't relevant for memory bandwidth, because you can always build larger matrices out of multiple fragments in the register file. A single matrix fragment is read from memory once and reused across multiple matmul instructions, which has the same effect on memory bandwidth as using a single larger matmul instruction.
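To make the fragment-size point concrete, here is a toy Python model, not real GPU code. The function name `tiled_matmul`, the `tile` and `frag` parameters, and the use of plain lists as a stand-in for the register file are all assumptions made for illustration. It counts how many elements are read from "memory" and how many fragment-sized matmul "instructions" are issued for the same register tile.

```python
def tiled_matmul(A, B, tile, frag):
    """Toy model: multiply A (MxK) by B (KxN) using a tile x tile register
    block assembled from frag x frag fragments.
    Returns (C, elements_loaded, mma_count)."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % tile == 0 and K % tile == 0 and N % tile == 0
    assert tile % frag == 0
    C = [[0] * N for _ in range(M)]
    loaded = 0  # elements read from "memory"
    mma = 0     # fragment-sized matmul "instructions" issued
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                # Read one tile of A and one tile of B from memory, once.
                a = [row[k0:k0 + tile] for row in A[i0:i0 + tile]]
                b = [row[j0:j0 + tile] for row in B[k0:k0 + tile]]
                loaded += 2 * tile * tile
                # Reuse the loaded tiles across many fragment-sized matmuls.
                for fi in range(0, tile, frag):
                    for fj in range(0, tile, frag):
                        for fk in range(0, tile, frag):
                            mma += 1
                            for i in range(fi, fi + frag):
                                for j in range(fj, fj + frag):
                                    acc = 0
                                    for k in range(fk, fk + frag):
                                        acc += a[i][k] * b[k][j]
                                    C[i0 + i][j0 + j] += acc
    return C, loaded, mma


# Hypothetical 8x8 inputs, chosen only so the results are easy to check.
A = [[i + j for j in range(8)] for i in range(8)]
B = [[(i * j) % 5 for j in range(8)] for i in range(8)]

C_small, loaded_small, mma_small = tiled_matmul(A, B, tile=4, frag=2)
C_large, loaded_large, mma_large = tiled_matmul(A, B, tile=4, frag=4)
```

In this model the number of elements loaded depends only on `tile`, the block held in registers. Shrinking `frag` raises the instruction count and leaves the memory traffic unchanged.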


