There are probably a lot of factors. I worked on CUDA code for around a year, and used to understand the landscape pretty well, but if I were to start a high-performance computing project today I'd probably take my lumps and go with OpenCL. There would be a lot of lumps.
First, CUDA is just more mature: there is a very large and well-established set of libraries for a lot of common operations, there is a decent-sized community, and Nvidia even produces specialized hardware (Tesla cards) designed just for CUDA.
Second, all that genericness of OpenCL doesn't come for free. With Nvidia, you're working with just one architecture: CUDA cards. Optimizing your kernels is much easier. OpenCL is just generically parallel, so you could have any sort of crazy heterogeneous high-performance computing environment to fiddle with (any number of CPUs with different chipsets and any number of GPUs with different chipsets).
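To make that concrete, here's a rough sketch (using the third-party pyopencl bindings; the calls mirror the OpenCL API directly) of the device zoo an OpenCL program has to be prepared for before any tuning even starts:

    import pyopencl as cl

    # OpenCL exposes every vendor platform (NVIDIA, AMD, Intel, ...) and
    # every device behind it -- CPUs and GPUs alike -- and your kernels
    # are expected to run, ideally well, on all of them.
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            print(platform.name, device.name,
                  cl.device_type.to_string(device.type),
                  device.max_compute_units, "compute units")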
I haven't used OpenCL myself, but almost purely anecdotally I have heard many people say that CUDA is often slightly faster[1] and the code is easier to write.
TL;DR: CUDA sacrifices flexibility for ease of development and performance gains. OpenCL wants to be everything for everyone, and comes with the typical burdens.
[1]: Maybe this is a result of OpenCL being more generic and so harder to optimize.
I've been working on a rather large computation library using OpenCL. OpenCL is useful for providing an abstraction over multiple device types. If you are only interested in producing highly-tuned parallel code to execute on NVidia hardware, I suggest sticking to CUDA for the above reasons.
I utilised the OpenCL programming interface to write code that runs the same kernel functions on CPU and/or GPU devices (using heuristics to trade off latency against throughput), which afaik is not possible with the CUDA toolchain.
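A minimal sketch of that kind of dispatch, in pyopencl for brevity. The pick_device heuristic and the scale kernel are stand-ins for illustration, not the actual latency/throughput logic:

    import numpy as np
    import pyopencl as cl

    # One kernel source, compiled for whichever device we land on.
    KERNEL_SRC = """
    __kernel void scale(__global float *x, const float k) {
        int i = get_global_id(0);
        x[i] = x[i] * k;
    }
    """

    def pick_device(prefer=cl.device_type.GPU):
        # Stand-in heuristic: take a GPU if one exists, else any device.
        for platform in cl.get_platforms():
            try:
                return platform.get_devices(device_type=prefer)[0]
            except (cl.Error, IndexError):  # no such device on this platform
                continue
        return cl.get_platforms()[0].get_devices()[0]

    device = pick_device()
    ctx = cl.Context(devices=[device])
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()  # same source either way

    data = np.arange(16, dtype=np.float32)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=data)
    prg.scale(queue, data.shape, None, buf, np.float32(2.0))
    cl.enqueue_copy(queue, data, buf)  # identical kernel, CPU or GPU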
FYI regarding highly-tuned code -- an ex-ATI/AMD GPU core designer told me that the price you pay for writing optimized code in OpenCL versus the device-specific assembler is roughly 3x. Something to keep in mind if you're targeting OpenCL for a large enough system and you find hot spots that can't be pushed any faster.
Unlike previous versions, OpenCL 2.0 has been shown to be only about 30%[1] slower than CUDA, and it can approach comparable performance given enough optimisation.
Since I am working on code generation of kernels to perform dynamic tasks, I can't afford to write at the lowest level available. (I'm accelerating Python/Ruby routines, though, so OpenCL gives a significant bonus without much pain at all.)
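As a toy illustration of that kind of runtime kernel generation (pyopencl again; make_map_kernel and the expression format are invented for the example):

    import numpy as np
    import pyopencl as cl

    def make_map_kernel(ctx, expr):
        # Build an element-wise kernel out[i] = <expr> at run time; the
        # OpenCL C source only exists once the dynamic task is known.
        src = """
        __kernel void map(__global const float *x, __global float *out) {
            int i = get_global_id(0);
            out[i] = %s;
        }
        """ % expr
        return cl.Program(ctx, src).build().map

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    x = np.linspace(0, 1, 8).astype(np.float32)
    mf = cl.mem_flags
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=x.nbytes)

    kernel = make_map_kernel(ctx, "x[i] * x[i] + 1.0f")  # generated, not hand-written
    kernel(queue, x.shape, None, x_buf, out_buf)

    out = np.empty_like(x)
    cl.enqueue_copy(queue, out, out_buf)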
> Nvidia is in the slow process of eventually discontinuing further CUDA support, and it is recommended to write new code in OpenCL only.
[Citation needed]
Their OpenCL support is still limited to v1.1 (released in 2010), while just a few months ago they released a new major version of CUDA with tons of features nowhere to be seen in (any vendor's) OpenCL.
Furthermore, there are CUDA bindings for Python[1], Matlab[2], and F#[3], plus parallel device debuggers (TotalView, Allinea) and profilers (NVIDIA's own). OpenCL has a long way to catch up, if ever (because there might be a better standard coming further down the line).
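On the Python point specifically, the bindings are mature enough that an inline-compiled kernel is a few lines of code. A minimal pycuda sketch (assumes an NVIDIA GPU and the third-party pycuda package):

    import numpy as np
    import pycuda.autoinit              # sets up a context on the default GPU
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    # CUDA C compiled at run time, straight from a Python string.
    mod = SourceModule("""
    __global__ void scale(float *x, float k) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        x[i] *= k;
    }
    """)
    scale = mod.get_function("scale")

    x = np.arange(8, dtype=np.float32)
    scale(cuda.InOut(x), np.float32(3.0), block=(8, 1, 1), grid=(1, 1))
    print(x)  # [ 0.  3.  6. ... 21.]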
On the contrary, I'd argue that it's not true in specific cases.
"Because it's hard" is a cop-out.
"It's too hard to accomplish given constraint [X]" where X is a deadline, financial constraints, or other real/tangible resource limitations might be one thing. But if you're working on your own timeline on some sort of open-source project, or there is nothing external preventing you from acquiring the expertise/resources to conquer the hard problem, then "Because it's hard" is an absolutely shitty excuse to not do something.
I suggest you read "It's too hard" when written by other developers as, "It's too hard [given that I spend N hours a week on this and would rather actually accomplish something in the next two months than learn the 'right' API]." Or, "It's too hard [given various constraints that I'm not going to explain to you but are valid to me.]" It'll save you having to give speeches about shitty excuses.
That said, if it makes sense for your project, make it happen! :)
Even if the long-term goal is more portable GPU support, it still makes some sense to get a CUDA implementation up first if it is easier to get to. That allows real-world testing sooner, and they can always move to OpenCL later once they know more.
Just out of curiosity: how often have you seen that happen (not only with GPUs, but with technology decisions overall)? In my (limited) experience, the later migration never happens, usually because management has a new idea/project you have to attend to.
It's often a great excuse to do something else instead. If you can't get what you need done without the more difficult option, sure, do it. But there's no sense in going down the harder path needlessly.
I'm not trying to convince you that you don't need it or shouldn't do it; I was looking for a data point about what you find valuable in OpenCL.
Well, the portability can be a killer feature. I've been writing quite a bit of OpenCL code lately. I have an AMD GPU, so CUDA is a non-starter. I'll eventually replace the AMD card with an NVIDIA one, so it won't be as big of a problem, but my OpenCL code will still be fine then.