On the other hand, for double precision a Radeon Pro VII is many times faster than an RTX 4090 (due to its 1:2 FP64:FP32 ratio versus 1:64 on the RTX 4090).
Moreover, for workloads limited by memory bandwidth, a Radeon Pro VII and an RTX 4090 will run at about the same speed, regardless of what kind of computations are performed. Being limited by memory bandwidth is reportedly common in ML/AI inference.
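To make the comparison concrete, here is a back-of-the-envelope sketch using the commonly published peak specs for both cards (the TFLOPS and bandwidth figures are assumptions taken from vendor spec sheets, not measurements):

```python
# Approximate published specs (assumed, not measured):
# Radeon Pro VII: 13.1 TFLOPS FP32, 1:2 FP64 ratio, 1024 GB/s HBM2.
# RTX 4090:       82.6 TFLOPS FP32, 1:64 FP64 ratio, 1008 GB/s GDDR6X.
radeon_pro_vii = {"fp32_tflops": 13.1, "fp64_ratio": 1 / 2, "mem_gbps": 1024}
rtx_4090 = {"fp32_tflops": 82.6, "fp64_ratio": 1 / 64, "mem_gbps": 1008}

def fp64_tflops(gpu):
    """Peak FP64 throughput derived from FP32 peak and the FP64:FP32 ratio."""
    return gpu["fp32_tflops"] * gpu["fp64_ratio"]

print(f"FP64 peak: Radeon Pro VII {fp64_tflops(radeon_pro_vii):.2f} TFLOPS, "
      f"RTX 4090 {fp64_tflops(rtx_4090):.2f} TFLOPS")
print(f"Bandwidth ratio (Radeon / RTX): "
      f"{radeon_pro_vii['mem_gbps'] / rtx_4090['mem_gbps']:.2f}")
```

With these numbers the Radeon Pro VII comes out roughly 5x faster at peak FP64, while the two cards' memory bandwidths differ by under 2 percent, which is exactly the point above.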
Even the single precision cited by the previous poster is seldom what is actually used for inference or training.
Because the previous poster mentioned only single precision, where the RTX 4090 is better, I wanted to complete the picture with double precision, where the RTX 4090 is worse, and with memory bandwidth, where the two are about the same. Otherwise people might believe that progress in GPUs over five years has been much greater than it really is.
And memory bandwidth is very relevant for inference, far more so than FP32 throughput.
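The claim that inference is bandwidth-bound can be illustrated with a standard back-of-the-envelope bound for single-stream LLM token generation (the model size and bandwidth here are assumptions chosen for illustration):

```python
# At batch size 1, generating one token requires streaming all model
# weights from memory once, so tokens/s is bounded above by
# memory_bandwidth / model_size -- compute throughput barely matters.
model_params = 7e9        # assumed: a 7B-parameter model
bytes_per_param = 2       # assumed: FP16/BF16 weights
bandwidth_bytes = 1e12    # ~1 TB/s, roughly both GPUs discussed above

model_bytes = model_params * bytes_per_param
max_tokens_per_s = bandwidth_bytes / model_bytes
print(f"Bandwidth-imposed upper bound: {max_tokens_per_s:.0f} tokens/s")
```

Since both cards have roughly 1 TB/s of bandwidth, this bound is about the same for both, which is why raw FP32 TFLOPS is a poor predictor of inference speed.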