Yes and no. The compute density and memory bandwidth are unmatched. But the programming model is markedly worse, even with something like CUDA: you inherently have to think about parallelism, figure out how to organize your data, write your kernels in a special language, deal with wacky toolchains, and still get to deal with the CPU and operating system on top of it.
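To make that concrete, here is a rough sketch of what that overhead looks like; I'm using CuPy's RawKernel purely as an illustration (my choice, not something from the thread). Even with Python doing the driving, the kernel is still written in CUDA C and you still pick the thread/block decomposition yourself:

    import cupy as cp  # assumes a CUDA GPU and CuPy installed

    # The kernel itself is CUDA C, embedded as a string.
    saxpy = cp.RawKernel(r'''
    extern "C" __global__
    void saxpy(const float a, const float* x, const float* y, float* out, const int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;   // map thread -> element
        if (i < n) out[i] = a * x[i] + y[i];
    }
    ''', 'saxpy')

    n = 1 << 20
    x = cp.random.random(n).astype(cp.float32)   # data has to live on the device
    y = cp.random.random(n).astype(cp.float32)
    out = cp.empty_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads        # you choose the parallel decomposition
    saxpy((blocks,), (threads,), (cp.float32(0.5), x, y, out, cp.int32(n)))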
There is great power in the convenience of "with open('foo') as f:". Most workloads are still stitching together I/O-bound APIs, not doing memory-bound or CPU-bound compute.
CUDA was always harder to program, even if you could get better perf.
It took a long time to find something that really took advantage of it, but we did eventually. CUDA enabled deep learning, which enabled LLMs. That's history.
What surprised me about the statement was that it implied the model of Python driving optimized GPU kernels extends beyond deep learning.
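For contrast, the convenient end of that model looks roughly like this sketch (CuPy again, as an assumed stand-in for whatever library one has in mind): Python only orchestrates, and each heavy operation dispatches to a pre-built GPU kernel.

    import cupy as cp  # again an assumption; any GPU-backed array library works similarly

    x = cp.random.random((4096, 4096)).astype(cp.float32)  # arrays live on the GPU
    y = cp.random.random((4096, 4096)).astype(cp.float32)

    z = x @ y              # dispatched to an optimized GEMM kernel (cuBLAS)
    s = cp.tanh(z).sum()   # elementwise and reduction kernels

    print(float(s))        # only the scalar result comes back to the CPU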
That was the original vision of CUDA: most of the computational work being done by massively parallel cores.