The units being used here are likely different. In most press releases Nvidia quotes its "tensor core" performance, usually with sparsity, 16-bit data, or both. A single A100 is rated at 312 teraflops of "tensor float" (TF32) performance with sparsity, but only 19.5 teraflops of "normal" full FP32 performance.

This is way outside my field, so I don't know the full implications, but my understanding is that Nvidia cards can only reach these speeds by giving up precision or full functionality, so it's an apples-to-oranges comparison against non-Nvidia chips.
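
To give a rough feel for what "loss of precision" means here, this is a small sketch that stores the same number in 32-bit and 16-bit floating point and compares the rounding error. It uses plain IEEE FP16 from Python's standard library as a stand-in, which is my assumption for illustration: it is not Nvidia's TF32 format and says nothing about sparsity. The helper name `round_trip` is made up for this example.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given binary format and read it back.

    "<f" is IEEE 754 single precision (FP32), "<e" is half precision (FP16).
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1

# The same value after being squeezed into each format.
as_fp32 = round_trip(x, "<f")
as_fp16 = round_trip(x, "<e")

# Absolute rounding error relative to Python's 64-bit float.
err_fp32 = abs(as_fp32 - x)
err_fp16 = abs(as_fp16 - x)

print(f"FP32: {as_fp32!r} (error {err_fp32:.3e})")
print(f"FP16: {as_fp16!r} (error {err_fp16:.3e})")

# FP16 cannot represent every integer above 2048, so 2049 gets rounded.
print(round_trip(2049.0, "<e"))  # 2048.0
```

The FP16 error comes out several orders of magnitude larger than the FP32 error, which is the kind of trade-off hiding behind the bigger headline number.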