Allegedly, GCC supports GPU offloading (PTX in particular, for NVIDIA). I don't know whether the performance is competitive, or whether it can be used to speed up Fortran coarrays (which I, as a non-Fortran programmer, would expect to be the way that functionality would be made available to Fortran). OpenMP should work as well.
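
For the OpenMP route, here is a minimal sketch in C of what an offloaded loop might look like. The function name `saxpy` and its parameters are my own illustration, not something from GCC's documentation. I'm assuming a GCC build with nvptx offloading enabled, in which case something like `gcc -fopenmp -foffload=nvptx-none saxpy.c` should target the GPU. I haven't measured how it performs.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
   The target directive asks OpenMP to run the loop on an accelerator.
   If no offload device is available, the runtime normally falls back
   to the host; without -fopenmp the pragma is simply ignored and the
   loop runs serially. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* map(to:) copies x to the device; map(tofrom:) copies y both ways. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Calling `saxpy(n, 2.0f, x, y)` from ordinary host code is enough. The `map` clauses handle the transfers to and from the device, so the caller doesn't need to know whether the loop actually ran on the GPU.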