HIP – Convert CUDA to Portable C++ (github.com/rocm-developer-tools)
142 points by reimertz on March 18, 2018 | hide | past | favorite | 35 comments



When I write CUDA kernels I am very specific in optimizing for Nvidia GPU features:

     * I have 1024 threads per block
     * I have 48KB of shared memory per block
     * I have 32 threads per warp and need to make sure that
       global-to-local memory reads are coalesced.
     * Think SIMD and avoid branching as much as possible
My kernels usually follow a specific pattern:

     (1) Read global memory into local memory: making sure that
         if thread i reads memory[n], then thread i+1 
         reads memory[n+1].
     (2) __syncthreads().
     (3) Do the computation in the most thread-balanced way possible.
This very specific pattern doesn't really work elsewhere. In fact, optimizing in this fashion and then porting to C++ or elsewhere loses those specific optimizations, while programming in a more general way loses all the things that make the program fast. Anyway, I'm definitely going to look this over more.
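The three-step pattern above can be sketched as a toy CUDA kernel. The kernel name and its 3-point-average computation are hypothetical, just to make the structure concrete:

```cuda
// Hypothetical kernel illustrating the pattern: thread i reads in[i] so
// consecutive threads touch consecutive addresses (coalesced), the block
// synchronizes, then each thread computes from shared memory.
__global__ void pattern_kernel(const float* in, float* out, int n) {
    __shared__ float tile[1024];   // one float per thread, well under 48KB

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int t = threadIdx.x;

    // (1) Coalesced global -> shared read.
    if (i < n) tile[t] = in[i];

    // (2) Make the whole tile visible to the block.
    __syncthreads();

    // (3) Compute; the ternaries compile to predicated selects rather
    //     than divergent branches. Toy 3-point average within the tile.
    if (i < n) {
        float left  = (t > 0) ? tile[t - 1] : tile[t];
        float right = (t + 1 < blockDim.x && i + 1 < n) ? tile[t + 1] : tile[t];
        out[i] = (left + tile[t] + right) / 3.0f;
    }
}
```

Launched as, e.g., `pattern_kernel<<<(n + 1023) / 1024, 1024>>>(d_in, d_out, n);` to use the full 1024 threads per block.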


AMD's "HCC" is C++ with the GPU functionality layered on top as templates. Unlike CUDA, HCC implements all the relevant features in C++ templates alone.

If you simply use 64 threads per "warp" (AMD's wavefronts are 64 threads wide) and 32KB of LDS (aka shared memory) per block, you can write portable high-performance code that runs on both AMD and Nvidia GPUs.
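A minimal sketch of that sizing advice, assuming the limits quoted above (32-wide Nvidia warps and 48KB shared memory vs. 64-wide AMD wavefronts and 32KB LDS): pick launch parameters that satisfy the stricter of the two vendors' constraints.

```cuda
// 64 threads per block is two full Nvidia warps (32-wide) and exactly
// one AMD wavefront (64-wide), so neither vendor runs partly idle lanes.
constexpr int kThreadsPerBlock = 64;

// 32KB of shared memory fits AMD's 32KB LDS exactly and stays within
// Nvidia's 48KB-per-block budget.
constexpr int kSharedMemBytes = 32 * 1024;
constexpr int kTileFloats = kSharedMemBytes / sizeof(float);  // 8192 floats
```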

AMD seems to be a bit behind with regard to GPGPU adoption, but AMD's hardware is a good bit cheaper. You can get HBM2 models at ~$2000 from AMD (the FirePro WX9100, which is the Vega architecture).

Although... as they say... hardware is cheap. I'm sure most datacenters will prefer the $8000 Nvidia V100 instead, because more people are using that hardware. In particular, it's easier to get started with a V100 thanks to AWS and other cloud-compute offerings.


It’s portable C++ in the sense that it’s portable to AMD GPUs, rather than being exclusively compatible with Nvidia’s CUDA.


Could anyone who's used HIP comment on how it compares to programming in raw CUDA?

I considered using it for a project recently, but ultimately decided against it because I didn't need to be able to run on AMD systems.


Wait ... if you write your program in HIP you get both CUDA and OpenCL (which includes AMD) for free right?


There is no OpenCL in the story there. AMD created HIP as a direct copy of CUDA. It then gets compiled to work on AMD hardware, or can be translated to CUDA.
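To illustrate, the translation is mostly mechanical renaming of the runtime API. `hipMalloc`, `hipMemcpy`, and `hipLaunchKernelGGL` are real HIP entry points; the kernel here is hypothetical:

```cuda
#include <hip/hip_runtime.h>

// Hypothetical kernel; only its launch syntax matters for this example.
__global__ void my_kernel(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;
}

void run(const float* h_buf, int n) {
    size_t bytes = n * sizeof(float);
    float* d_buf = nullptr;

    // cudaMalloc / cudaMemcpy become hipMalloc / hipMemcpy...
    hipMalloc(&d_buf, bytes);
    hipMemcpy(d_buf, h_buf, bytes, hipMemcpyHostToDevice);

    // ...and a CUDA <<<grid, block>>> launch becomes hipLaunchKernelGGL.
    hipLaunchKernelGGL(my_kernel, dim3((n + 63) / 64), dim3(64),
                       0 /*dynamic shared mem*/, 0 /*stream*/, d_buf, n);

    hipFree(d_buf);
}
```

The same source compiles with HCC for AMD hardware or with nvcc (via HIP's CUDA back end) for Nvidia hardware.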

AMD also supports OpenCL (which I prefer to both CUDA and HIP), but it's not connected to HIP.


Last time I checked, HIP didn’t support reading from textures.

That’s something that’s not only useful for pure graphics.

So it is (or was?) not a straight substitute for CUDA.


I wonder if this performs well enough to port cryptocurrency miner software. I hope to find out with

https://github.com/mimblewimble/grin/issues/806


The fact that HIP exists is why we chose CUDA for our program 8 years ago. OpenCL and CUDA were the choices. I thought about how I would design such a GPU language and then looked at CUDA and OpenCL. What I dreamt up matched CUDA exactly. I forsake one day that it would no longer matter what you wrote in CUDA, because it could be easily translated to any type of GPU hardware. In the past few years, the Portland Group made a CUDA-to-CPU compiler, and now there's HIP.


*foresee:)


Mmmhhh I would say "foresaw" ;-)


Sent from my Android.


Is there a legal question here, as CUDA is a proprietary API?


That is an interesting question, but I don't think so, because that repo shouldn't contain anything related to the CUDA binaries.

(Do NOT take my word for it, I have no idea about what I am talking about)


Obviously I'm thinking of this type of legal issue: https://www.theregister.co.uk/2017/06/09/intel_sends_arm_a_s... . Emulation is simply an implementation of an API.


Also Oracle vs Google in their legal battle about the use of Java APIs in Android. https://en.wikipedia.org/wiki/Oracle_America,_Inc._v._Google...

Google won out over Oracle in the end, but it took a long time (from 2012 until 2016; four years of court cases), with some courts finding that API structures were copyrightable and others finding that they were not, or that reimplementation was fair use. I guess we now have a precedent thanks to this, but it could still be an issue? IANAL so I don’t know.


Oracle may not have been able to use SQL anymore if the ruling had gone in their favor.


Oracle haters keep bringing this up; it doesn't make any sense.

SQL is an international standard that one needs to pay for; it isn't available for free.

Oracle already pays for its SQL certifications, unlike what Google did with Java.


Could you elaborate???


Somebody speculated that the lawsuit was used by Oracle to establish firm boundaries in what can be considered a copyright infringement, with the intent to implement Amazon/GCE APIs for drop-in compatibility for their cloud offering. So they tried all kinds of ridiculous stuff to see what works/what doesn't in order to properly cover their backs while "stealing" other APIs.


Given that Google helped sink Sun by ripping them off, and Sun wasn't in a position to pay for lawyers to apply the same medicine that they gave to Microsoft....

James Gosling interview, at 57:42

https://www.youtube.com/watch?v=ZYw3X4RZv6Y

"unwilling to help us pay the bills", so nice from the Do No Evil company.

Oracle does pay for their ANSI/ISO SQL certifications.


So, the logical next step would be what? Apply that tool to Tensorflow to make it run on AMD GPUs?


HIP is only useful to convert CUDA source code.

Tensorflow uses the cuDNN library, which is closed source. There is nothing for HIP to convert.


AMD is working on a cuDNN compatibility layer IIRC (MIOpen), and its ROCm group has created a CUDA transpiler (to intermediate HIP, then to an AMD binary via HCC).

https://instinct.radeon.com/en/6-deep-learning-projects-amd-...


Wait, I thought that AMD's cuDNN replacement was hipDNN, now I'm confused https://github.com/ROCmSoftwarePlatform/hipDNN

Edit: OK, I was reading the docs and I think I got it: hipDNN is a wrapper that (once finished) will let you search-and-replace calls from cuDNN to hipDNN; hipDNN itself, in turn, calls MIOpen. Not sure if that's right; I'd appreciate it if someone who knows more could confirm.


AMD is always working on random half-assed attempts to get to parity with NVidia for neural network training.

If they'd chosen one approach 5 years ago and put decent resources behind it they might be competitive by now.


Yes, but it means AMD will have to do the original work. They can’t use a CUDA to HIP converter.


Was just thinking of that; the TensorFlow repo has a very big issue where people are trying to do that.


Yup! I spent one very LONG weekend trying to get TensorFlow running on my AMD GPU. Managed to finally get it running, but it was slower than using my CPU....


Oh NO, that is both hilarious and ugly.

and interesting


It has been in the works for a long time but I don't believe that it is in a usable state yet.

https://github.com/ROCmSoftwarePlatform/hiptensorflow


The combination of TensorFlow and Codeplay's ComputeCpp might already work. At least they have added SYCL support to the linear algebra library (Eigen).


I think they also have some TensorFlow-specific SYCL kernels; I'm not sure it all goes through Eigen. Plus they're at TF 1.6, whereas AMD's HIP version seems to be stuck on TF 1.0. AMD's method of maintaining a pure fork and running hipify on every upstream commit doesn't seem very sustainable...


From Codeplay's blogs it seems they ported the tensor part of Eigen to SYCL; the matrix part is not supported under SYCL. TensorFlow does have an option to compile models to custom code (XLA), and that part might not use Eigen.


"HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port."

It's a great start, but I'm sure a lot of cases are not yet handled (like asm instructions?)



