Since I usually have only one GPU, I'm still waiting for the day I can do this:
using CUDA
# define a kernel
@kernel function kernel_vadd(a, b, c)
    i = blockId_x() + (threadId_x()-1) * numBlocks_x()
    c[i] = a[i] + b[i]
end
# create some data
dims = (3, 4)
a = round(rand(Float32, dims) * 100)
b = round(rand(Float32, dims) * 100)
c = Array(Float32, dims)
# execute!
@cuda kernel_vadd(CuIn(a), CuIn(b), CuOut(c))
# verify
@show a+b == c
i.e. no setup, no "CUDA context" (whatever that is), and no teardown afterwards. I understand that manual memory management is almost unavoidable here, but it seems that most of it could be automated in the most common case of "I have a couple of large arrays and a few operations I want to perform."
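For that common case, something like the following array-level sketch would be ideal (to be clear, CuArray and the upload/download conversions here are purely illustrative names, not an API any existing package promises):
# hypothetical sketch: CuArray and the conversions below are
# illustrative names only, not part of the current package
a = rand(Float32, 1000)
b = rand(Float32, 1000)
da = CuArray(a)    # upload to the GPU
db = CuArray(b)
dc = da .+ db      # elementwise add, executed on the GPU
c = Array(dc)      # download the result back to the host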
Torch7 has amazing GPU support. If you want to move a tensor (array) to the GPU, you just call array:cuda(), and every operation you do on it from then on runs on the GPU.
I have seen a couple of Clojure-based GPU compilers, although they seemed to be proofs of concept, and I'm not sure how general-purpose they were. It would be nice to eventually get away from treating the GPU as a whole other computer with its own compiler, but maybe that distinction is for the best for the time being.
It is definitely possible to do that in Julia. The reason I haven't yet is purely a matter of priorities: I first focused on wrapping the basic primitives (calling a kernel, marshalling arguments, etc.) in a user-friendly way.
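Concretely, launching a vadd like the parent's currently still involves some explicit ceremony, roughly like this (names taken from the repository's examples, so the exact spelling may shift):
# assumes the a, b, c, dims and a kernel_vadd compiled with this
# package's annotations, as in the parent's snippet
dev = CuDevice(0)
ctx = CuContext(dev)

# launch len blocks of one thread each, matching the index computation
len = prod(dims)
@cuda (len, 1) kernel_vadd(CuIn(a), CuIn(b), CuOut(c))

# explicit tear-down
destroy(ctx)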
Have you tried Thrust? For a limited but very useful set of operations, it is nearly that painless, as long as you don't mind a generous helping of C++ template boilerplate.
The content of the project aside (which is very exciting but early stage), I'm really impressed with the author's overview of how others can take up the project and move it forward. I've seen too many projects die because when the authors move onto other things they just leave an incomplete git repo and no clear plan for what happens next. It'd be great of course if there were someone lined up to take the reins, but the crucial thing is that in this case someone could ascertain the project's state and possible next steps even months after maleadt is out of the picture.
Thanks for the kind words! This was exactly what I was aiming for: the code (or insights) to be reusable without too much hassle. All the more so because part of it was developed during my PhD; I wouldn't want to know how many failed or unpublishable research results are stowed away on some grad student's computer.
When working in Julia, what are the benefits of tying oneself to CUDA (and not running accelerated on on-die graphics or on AMD GPUs) -- or does NVIDIA not work reliably/well with OpenCL?
OpenCL.jl is purely the runtime part, i.e. it still requires you to write the OpenCL kernel code by hand, after which you can use the Julia wrapper to manage and launch that code.
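For example, a vector add with OpenCL.jl looks roughly like this (based on its README examples, so details may differ between versions), with the kernel itself still written in OpenCL C:
import OpenCL
const cl = OpenCL

# the kernel is still plain OpenCL C, kept as a string
const vadd_src = "
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c) {
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
"

a = rand(Float32, 50_000)
b = rand(Float32, 50_000)

device, ctx, queue = cl.create_compute_context()

# upload the inputs and allocate the output buffer
a_buf = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=a)
b_buf = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=b)
c_buf = cl.Buffer(Float32, ctx, :w, length(a))

# compile the OpenCL C source and launch the kernel
prog = cl.Program(ctx, source=vadd_src) |> cl.build!
k = cl.Kernel(prog, "vadd")
queue(k, size(a), nothing, a_buf, b_buf, c_buf)

c = cl.read(queue, c_buf)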
My project also provides compiler support for lowering Julia to CUDA assembly, so you don't need to write CUDA code yourself. On top of that, my runtime contains (proof-of-concept) higher-level wrappers, making it easier to call CUDA kernels, upload data, etc.
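In practice that means the kernel itself is ordinary Julia, roughly like this (the annotation and intrinsic names follow the current examples and are likely to evolve):
# a kernel written in Julia and compiled down to PTX, no CUDA C involved
@target ptx function kernel_vadd(a, b, c)
    i = blockIdx().x + (threadIdx().x-1) * gridDim().x
    c[i] = a[i] + b[i]
    return nothing
end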
Concerning tying yourself to the NVIDIA stack: it's still the most mature and versatile toolchain, which is why I picked it in the first place. My long-term plan was to switch over to SPIR (or some other cross-vendor stack) as soon as possible. At that point, switching user code over to the new back-end would (theoretically) not require that much effort, since the kernels are written in Julia code instead of CUDA C (except for the runtime interactions, of course).