I don't know. I'd use CUDA rather than experiment with it; CUDA seems to be where all the investment is, and my experience with OpenCL on AMD was bad enough that I don't think the cross-platform argument holds up.
It's a bit of a chicken-and-egg problem for me. I couldn't make OpenCL work on an AMD GPU, so I never managed to learn that much about OpenCL. At the time I assumed it was just me, but in hindsight I never saw an OpenCL-based approach to a compute problem that worked reliably on my machine, so maybe it wasn't.
But I don't think it really matters. The algorithms in the field don't seem to be hard, and I never felt like I was struggling when implementing them on the CPU without any special API at all. My issues were conceptually similar to George Hotz's famous rants about crashes when running the demo app in a loop: in the experimenting phase I found I couldn't run code on the GPU with any API.
I'm sure the situation has improved, and part of it was just me; towards the end of my time with AMD I could run Stable Diffusion inference and it'd work great for 10-40 minutes before the kernel panicked or whatever, so it was definitely technically possible to get a "hello world++" style thing running. But I never felt it was the APIs that were holding me back.
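For concreteness, this is roughly the kind of "hello world++" test I mean, sketched in CUDA since that's where I landed. It's illustrative only (the sizes and iteration count are made up), but the loop is the important part, because "works once, dies after N iterations" was exactly the failure mode:

    // Minimal vector-add, launched repeatedly to shake out driver instability.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // The loop is the point: a single run often passes, while sustained
        // load is what used to take the machine down for me.
        for (int iter = 0; iter < 1000; ++iter) {
            vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
            cudaError_t err = cudaDeviceSynchronize();
            if (err != cudaSuccess) {
                fprintf(stderr, "iter %d: %s\n", iter, cudaGetErrorString(err));
                return 1;
            }
        }
        printf("ok: c[0] = %f\n", c[0]);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

On CUDA this sort of thing just ran; the equivalent on my AMD setup was where the 10-40 minute countdown started.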