Yes and no. The compute density and memory bandwidth are unmatched. But the programming model is markedly worse, even with something like CUDA: you inherently have to think about parallelism, figure out how to organize your data, write your kernels in a special language, deal with wacky toolchains, and still get to deal with the CPU and operating system on top of it.
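To make that concrete, here is a rough sketch of what that overhead looks like; I'm using CuPy's RawKernel purely as an illustration (my choice, not something from the thread). Even with Python doing the driving, the kernel is still written in CUDA C and you still pick the thread/block decomposition yourself:

    import cupy as cp  # assumes a CUDA GPU and CuPy installed

    # The kernel itself is CUDA C, embedded as a string.
    saxpy = cp.RawKernel(r'''
    extern "C" __global__
    void saxpy(const float a, const float* x, const float* y, float* out, const int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;   // map thread -> element
        if (i < n) out[i] = a * x[i] + y[i];
    }
    ''', 'saxpy')

    n = 1 << 20
    x = cp.random.random(n).astype(cp.float32)   # data has to live on the device
    y = cp.random.random(n).astype(cp.float32)
    out = cp.empty_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads        # you choose the parallel decomposition
    saxpy((blocks,), (threads,), (cp.float32(0.5), x, y, out, cp.int32(n)))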
There is great power in the convenience of "with open('foo') as f:". Most workloads are still stitching together I/O-bound APIs, not doing memory-bound or CPU-bound compute.
CUDA was always harder to program, even if you could get better perf.
It took a long time to find something that really took advantage of it, but we did eventually. CUDA enabled deep learning, which enabled LLMs. That's history.
What surprised me about the statement was that it implied the model of Python driving optimized GPU kernels extends beyond deep learning.
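For contrast, the convenient end of that model looks roughly like this sketch (CuPy again, as an assumed stand-in for whatever library one has in mind): Python only orchestrates, and each heavy operation dispatches to a pre-built GPU kernel.

    import cupy as cp  # again an assumption; any GPU-backed array library works similarly

    x = cp.random.random((4096, 4096)).astype(cp.float32)  # arrays live on the GPU
    y = cp.random.random((4096, 4096)).astype(cp.float32)

    z = x @ y              # dispatched to an optimized GEMM kernel (cuBLAS)
    s = cp.tanh(z).sum()   # elementwise and reduction kernels

    print(float(s))        # only the scalar result comes back to the CPU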
That was the original vision of CUDA: most of the computational work being done by massively parallel cores.