I hadn't really looked at Julia before, so I started digging, and it's really nice! https://julialang.org/
Given that GPGPU support is now a requirement for contemporary compute applications, this addition makes Julia look like a compelling alternative to other languages in the space.
> This creates a single kernel call, with no memory allocation or temporary arrays required. Pretty cool – and well out of the reach of any other system I know of.
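To make the quoted point concrete, here is a minimal CPU-side sketch in plain Python of what "a single kernel with no temporary arrays" means, as opposed to evaluating an elementwise expression one operation at a time. It is not Julia or CUDA code, and the function names `unfused` and `fused` are hypothetical. In Julia, a dotted expression such as `out .= (a .* x .+ y) .^ 2` is fused by the broadcasting machinery into one loop (one kernel on the GPU). The sketch only illustrates the idea and does not show how Julia implements it.

```python
from typing import List


def unfused(x: List[float], y: List[float], a: float) -> List[float]:
    # Each elementwise operation is a separate pass ("kernel launch")
    # and allocates its own temporary list.
    t1 = [a * xi for xi in x]                   # temporary #1: a .* x
    t2 = [t1i + yi for t1i, yi in zip(t1, y)]   # temporary #2: t1 .+ y
    return [v * v for v in t2]                  # third allocation: t2 .^ 2


def fused(x: List[float], y: List[float], a: float, out: List[float]) -> List[float]:
    # A single pass: the whole expression is evaluated per element and
    # written straight into a preallocated output buffer, so no
    # intermediate lists are created.
    for i in range(len(x)):
        v = a * x[i] + y[i]
        out[i] = v * v
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [0.5, 0.5, 0.5]
    out = [0.0] * len(x)
    print(unfused(x, y, 2.0))
    print(fused(x, y, 2.0, out))
```

Both functions compute the same values. The fused version touches each element once and allocates nothing beyond the caller-supplied `out`, which is the property the quote highlights for the generated GPU kernel.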
To add to Keno's comment, a bunch of this stuff actually can work with OpenCL and OpenGL, and you can see it in action in GPUArrays.jl [1]. In general, there's a bunch of work to make Julia as hardware-agnostic as possible, but you gotta start somewhere.
The `generic` in the title applies to `kernel` rather than `GPU`. That said, since LLVM does have an AMDGPU backend, I suspect the same technique could target AMD GPUs without changing the frontend code (assuming the AMD driver accepts whatever comes out of the AMDGPU backend). Still, I can hardly blame folks for focusing on NVIDIA devices: for better or worse, they dominate the GPGPU space right now. For example, all the cloud providers offer NVIDIA GPUs, but the same is not true for AMD GPUs (though IIRC, Google is planning to offer them in the next generation).
As the author of the underlying framework: the dependency on CUDA is unfortunate indeed, but it was the only viable option at the time. OpenCL tooling was (and still is) far more fragmented: there's no unified compilation target or back-end, there are differences between vendors, etc. However, much of this work has dealt with figuring out how to compile Julia for accelerators while integrating with the existing compiler, and all of that is transferable to future back-ends. Once the SPIR-V back-end is better integrated with upstream LLVM, we will try to target it instead.
I maintain https://github.com/thewilsonator/llvm/tree/compute and https://github.com/thewilsonator/llvm-target-spirv, which have (unused) tablegen definitions for SPIR-V, for LDC and DCompute.
I'm _hoping_ to get some consensus on backend integration into LLVM trunk at IWOCL in May.
More users of the backend help convince the LLVM folks that having a single backend in trunk is worthwhile.
Have you had a look at SYCL? I found a nice intro at [1]. Hopefully at some point soon there's an implementation of it for nVidia GPUs, too; right now I've only come across support for Intel and AMD OpenCL using Codeplay's compiler [2]. But it seems quite promising!
I hadn't seen SYCL until now, but it looks quite promising. Hopefully Codeplay will maintain their interest and investment in that piece of software.
Unfortunately, I don't have the time to wait for NVidia support. I have a window of opportunity to rewrite some core things right now, but that window will soon close.