
At my workplace, we were reluctant to choose between writing OpenCL, which runs on AMD hardware but misses out on CUDA's features and tooling, and writing CUDA, which means vendor lock-in.

Our jerry-rigged solution, for now, is writing kernels from a single source which compiles as both OpenCL and CUDA, with a few macros doing a bit of adaptation (e.g. the syntax for constructing a struct); a sketch of the idea is below. This requires no special library or complicated runtime work, but it does have the downside of forcing our code to be C'ish rather than C++'ish, which is quite annoying if you want to write anything templated.
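
(To give a flavor, here's a minimal sketch of the kind of shim involved; the macro names KERNEL, GLOBAL_MEM, CONSTRUCT and the helper global_id_x are invented for illustration. The actual files are linked further down.)

    /* single-source shim: compiles under both nvcc and an OpenCL compiler */
    #if defined(__CUDACC__)
      #define KERNEL extern "C" __global__
      #define GLOBAL_MEM                   /* CUDA pointers need no qualifier */
      #define CONSTRUCT(t, ...) t { __VA_ARGS__ }  /* C++ braced initialization */
      __device__ inline unsigned global_id_x()
      { return blockIdx.x * blockDim.x + threadIdx.x; }
    #else  /* OpenCL C */
      #define KERNEL __kernel
      #define GLOBAL_MEM __global
      #define CONSTRUCT(t, ...) (t)(__VA_ARGS__)   /* OpenCL literal syntax */
      inline unsigned global_id_x() { return (unsigned) get_global_id(0); }
    #endif

    KERNEL void scale(GLOBAL_MEM float* data, float factor, unsigned n)
    {
        unsigned i = global_id_x();
        if (i < n) { data[i] *= factor; }
    }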

Note that all of this concerns device-side code, not host-side code. For the host side, I would like, at some point, to take the modern-C++ CUDA API wrappers (https://github.com/eyalroz/cuda-api-wrappers/) and derive from them something which supports CUDA, OpenCL and maybe HIP/ROCm. Unfortunately, I don't have the free time to do this on my own, so if anyone is interested in collaborating on something like that, please drop me a line.

-----

You can find the OpenCL-that-is-also-CUDA mechanism at:

https://github.com/eyalroz/gpu-kernel-runner/blob/main/kerne...

and

https://github.com/eyalroz/gpu-kernel-runner/blob/main/kerne...

(the files are provided alongside a tool for testing, profiling and debugging individual kernels outside of their respective applications.)



Freestanding C++ with compiler intrinsics is a nicer alternative. You can do things like take the address of a function (see the sketch below).
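
(Something in this spirit; a rough sketch only, compiled with clang targeting the GPU directly, with the target-specific kernel entry-point annotation omitted. The builtins shown are clang's thread-index intrinsics for each target.)

    /* Freestanding device code: no CUDA or OpenCL headers at all. */
    #if defined(__AMDGCN__)
    inline unsigned tid() { return __builtin_amdgcn_workitem_id_x(); }
    #elif defined(__NVPTX__)
    inline unsigned tid() { return __nvvm_read_ptx_sreg_tid_x(); }
    #endif

    /* Plain C++, so function pointers are fine (OpenCL C forbids them). */
    using unary_op = float (*)(float);
    inline float square(float x) { return x * x; }

    extern "C" void apply_op(float* data, unsigned n)
    {
        unary_op op = &square;   /* taking the address of a function */
        unsigned i = tid();
        if (i < n) { data[i] = op(data[i]); }
    }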

Use an interface over memory allocation and kernel launch, with implementations in CUDA, HSA, OpenCL, whatever; something like the sketch below.
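
(As a rough illustration in C++; all the names here are invented, and a real version would wrap e.g. cudaMalloc/cuLaunchKernel on one side and clCreateBuffer/clEnqueueNDRangeKernel on the other.)

    #include <cstddef>

    struct DeviceBuffer { void* handle; std::size_t size; };

    /* Backend-neutral interface; the rest of the program codes against this. */
    class GpuBackend {
    public:
        virtual ~GpuBackend() = default;
        virtual DeviceBuffer alloc(std::size_t bytes) = 0;
        virtual void free(DeviceBuffer buf) = 0;
        virtual void to_device(DeviceBuffer dst, const void* src, std::size_t bytes) = 0;
        virtual void to_host(void* dst, DeviceBuffer src, std::size_t bytes) = 0;
        /* enqueue a pre-compiled kernel, looked up by name, with a flat arg list */
        virtual void launch(const char* kernel, std::size_t grid_size,
                            std::size_t block_size, void** args, std::size_t n_args) = 0;
        virtual void sync() = 0;
    };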

All the rest of the GPU-side stuff is syntax sugar (or salt) over slightly weird semantics; it's entirely possible to opt out of all of that.



