OpenCL is not a disaster at all. It is just that NVidia were (and still are) too...

aseipp · on April 26, 2018

> It is also pushed a lot by Intel for FPGA, which probably scare even more NVidia.

These tools aren't available to the vast majority of developers, and they are still exceptionally difficult to use and maintain without hardware engineers. I'm going to assume you haven't used FPGAs at all? The ones that can compete with GPUs at the same tasks are not easily available in terms of price, volume, or even over-the-counter availability (be prepared to ask for a lot of quotes), and the tools have only become more accessible very recently -- such as Intel slashing the FPGA OpenCL licensing costs, and Dell EMC shipping them in pre-configured rack units.

> Nvidia only introduced the possibility of having runtime compilation as a preview in Cuda 7

In the meantime, Nvidia also completely dominated the market by actually producing many working middleware libraries and integrations, a solid and working programming model, and continuously refining and delivering on core technology and GPU upgrades. Maybe those things matter more than runtime compilation and speculative claims about peak performance...

> The "single source" argument is completely overrated.

Even new Khronos standards like SYCL (which is built on OpenCL and does look promising; I'm hoping AMD delivers a toolchain once MIOpen is more fleshed out) are moving to the single-source model. It's not even that much better, really, but development friction and cost of entry matter more than anything, and Nvidia understood this from day one. They understood it with GameWorks, as well. They plant specialist engineers "in the field" to accelerate the development and adoption of their tech, and they're very good at it.

This is because their core focus is hardware and selling hardware; it's thus in their interest to release "free" tools that take little effort to buy into, do as much dirty integration work as possible, and basically give people free engineering power -- because it drives their hardware sales. They basically subsidize their software stack in order to drive GPU sales.

> Furthermore, you can have single source in OpenCL putting the code in strings.

This is a joke argument, right?

dragontamer · on April 26, 2018

> OpenCL is not a disaster at all.

I'll probably need to be more specific. OpenCL 1.0 through 1.2 is fine, but fell hopelessly behind NVidia's CUDA efforts. NVidia CUDA has more features that lead to proven performance enhancements.

OpenCL 2.0 was the "counterpunch" to bring OpenCL up to CUDA-level features. However, OpenCL 2.0 is virtually stillborn: only Intel and AMD platforms support OpenCL 2.0. Intel Xeon Phi chips are relatively niche (and their primary advantage seems to be x86 code compatibility anyway, so I doubt you'd be running OpenCL on them).

AMD's OpenCL 2.0 support exists, but is rather poor. The OpenCL 2.0 debugger is simply non-functional, and you're forced to debug with printfs (lol).

That leaves OpenCL 1.2. It's okay, but it is years behind what modern hardware can do. Its atomic + barrier model is strange compared to proper C++11 atomics, and it's missing important features like device-side queuing, shared virtual memory, and a unified address space (no more copy/paste code just to go from "local" to "private" memory), among other very useful features.
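For contrast with the OpenCL 1.2 model, where (roughly speaking) the atomic_* built-ins make individual operations atomic and ordering between work-items comes from barrier() and mem_fence(), here is a minimal C++11 sketch of release/acquire message passing on ordinary CPU threads. `Channel`, `produce`, `consume`, and `run_message_passing` are illustrative names, not from any library. OpenCL 2.0 adopted a similar C11-style model (e.g. `atomic_load_explicit` taking a `memory_order` argument).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A one-shot "mailbox": a plain payload published through an atomic flag.
struct Channel {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
};

// Producer: write the payload, then publish it with a release store.
void produce(Channel& ch, int value) {
    ch.payload = value;
    ch.ready.store(true, std::memory_order_release);
}

// Consumer: spin on an acquire load. Once it observes ready == true, the
// release/acquire pair guarantees the payload write is visible, with no
// separate fence call needed.
int consume(Channel& ch) {
    while (!ch.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return ch.payload;
}

// Run one producer and one consumer thread and return what the consumer saw.
int run_message_passing(int value) {
    Channel ch;
    int result = 0;
    std::thread consumer([&] { result = consume(ch); });
    std::thread producer([&] { produce(ch, value); });
    producer.join();
    consumer.join();
    return result;
}
```

Calling `run_message_passing(42)` always yields 42: the ordering is attached to the flag itself rather than to a separate barrier.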

> Even today OpenCL is a viable solution for GPU

OpenCL 1.2 is a viable solution. An old, crusty, and quirky solution, but viable nonetheless. OpenCL 2.0+ is basically dead. And I think only Intel Xeon Phi supports the latest OpenCL 2.2.

I bet you there are more Vulkan compute shaders out there than there are OpenCL 2.0 programs. Indeed, there are rumors that Khronos is going to focus on Vulkan compute shaders in the future.

> The "single source" argument is completely overrated. Furthermore, you can have single source in OpenCL putting the code in strings.

I like my compile-time errors to happen at compile time, not at run time on my client's system. Compiler bugs in AMD drivers are fixed through device driver updates (!!!), which makes deploying plain-text OpenCL source code far more of a hassle in practice.
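A minimal sketch of the "code in strings" approach, to make the compile-time point concrete. The OpenCL C kernel lives in a C++ raw string literal, so the host compiler never parses it: the deliberate typo below (`gidx` instead of `gid`) compiles cleanly and would only be reported once the string reaches `clBuildProgram` on the client's driver. The kernel `scale` and the helper `kernel_source_mentions` are hypothetical, and no OpenCL calls are made, so the sketch builds without an OpenCL SDK.

```cpp
#include <cassert>
#include <string>

// OpenCL C source embedded as a C++ raw string literal ("single source in
// strings"). The host compiler treats this as opaque text, so the undeclared
// identifier `gidx` is NOT a C++ compile error.
const std::string kScaleKernelSource = R"CLC(
__kernel void scale(__global float* data, float factor) {
    size_t gid = get_global_id(0);
    data[gidx] *= factor;   // typo: should be `gid`
}
)CLC";

// At run time this string would be passed to clCreateProgramWithSource and
// clBuildProgram; only then would the driver's compiler report the error
// (or, with a buggy driver, crash). On the C++ side the kernel is just data.
bool kernel_source_mentions(const std::string& identifier) {
    return kScaleKernelSource.find(identifier) != std::string::npos;
}
```

The whole translation unit compiles even though the kernel is broken, which is exactly why such errors surface on the client's machine instead of in the build.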

Consider this horror story: a compiler bug in some AMD device driver versions that causes a segfault on some hardware versions. This is not theoretical: https://community.amd.com/thread/160362.

In practice, deploying OpenCL 1.2 code requires you to test all of the device drivers your client base is reasonably expected to run.

-----

But that's not the only issue.

"Single Source" means that you can define a single structure in a single .h file and actually have it guaranteed to work in both CPU code and GPU code. Data-sharing code is greatly simplified, and both sides are guaranteed to match.

The C++ AMP model (which has been adopted into AMD's ROCm platform) is vastly superior. You specify a few templates and bam, your source code automatically turns into CPU code OR GPU code. Extremely useful when sharing routines between the CPU and GPU (like data-packing or unpacking from the buffers).
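A minimal plain-C++ sketch of the single-source idea from the last two paragraphs, under stated assumptions: the shared-header contents, the `HOST_DEVICE` macro, `Pixel`, `pack_rgba`, `unpack_rgba`, `CpuBackend`, and `pack_image` are all hypothetical. Under nvcc (which defines `__CUDACC__`) the macro marks functions `__host__ __device__`, so one definition compiles for both sides; with an ordinary C++ compiler it expands to nothing. Only a serial CPU backend is provided, and this is not C++ AMP's API (there the launch would be `concurrency::parallel_for_each` with a `restrict(amp)` lambda).

```cpp
// --- shared header (hypothetical): one definition for CPU and GPU code ---
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// One struct layout used by both sides; the static_assert catches size drift.
struct Pixel {
    std::uint8_t r, g, b, a;
};
static_assert(sizeof(Pixel) == 4, "Pixel must stay 4 bytes");

// Shared routine: pack a pixel into one 32-bit RGBA word.
HOST_DEVICE inline std::uint32_t pack_rgba(Pixel p) {
    return (std::uint32_t{p.r} << 24) | (std::uint32_t{p.g} << 16) |
           (std::uint32_t{p.b} << 8) | std::uint32_t{p.a};
}

// Shared routine: the inverse, e.g. for unpacking results read back on the host.
HOST_DEVICE inline Pixel unpack_rgba(std::uint32_t w) {
    return Pixel{static_cast<std::uint8_t>(w >> 24), static_cast<std::uint8_t>(w >> 16),
                 static_cast<std::uint8_t>(w >> 8), static_cast<std::uint8_t>(w)};
}

// --- host side ---
// Hypothetical backend: runs kernel(i) for every i in [0, n) serially on the CPU.
// A GPU backend would launch the same callable over a grid of threads instead.
struct CpuBackend {
    template <typename Kernel>
    void operator()(std::size_t n, Kernel kernel) const {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
};

// The "kernel" is written once as a lambda and handed to whichever backend is chosen.
template <typename Backend>
std::vector<std::uint32_t> pack_image(const std::vector<Pixel>& pixels, Backend backend) {
    std::vector<std::uint32_t> out(pixels.size());
    const Pixel* in = pixels.data();
    std::uint32_t* dst = out.data();
    backend(out.size(), [=](std::size_t i) { dst[i] = pack_rgba(in[i]); });
    return out;
}
```

Usage: `pack_image(pixels, CpuBackend{})`. Swapping in a different backend would leave `Pixel`, `pack_rgba`, and the lambda untouched, which is the convenience being described.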

With that said, AMD clearly cares about OpenCL, and the ROCm platform looks like it will strongly support OpenCL through the near term, especially OpenCL 1.2, which seems to have a big codebase.

However, if I were to do any project these days, I'd do it in ROCm's HCC / single-source C++ system or CUDA. OpenCL 1.2 is useful for high-compatibility but has major issues as an environment.

se6 · on April 26, 2018

The point I really wanted to make here is that OpenCL is only a disaster because NVidia was scared of the competition it would bring from AMD.

dragontamer · on April 26, 2018

I'm sure NVidia deserves some blame.

But AMD drivers which cause OpenCL compiler-segfaults and/or infinite loops is a problem that rests squarely on AMD's shoulders.

se6 · on April 26, 2018

I have used OpenCL extensively on both AMD and NVidia for a few years and never had such problems. If anything, I found a few more bugs with NVidia.

dragontamer · on April 26, 2018

Interesting. I'll take your anecdote for what it's worth.

My personal experience with OpenCL didn't go very well. I was testing on my personal R9 290X. While I didn't hit the crashing / infinite-loop bugs that other people had (see LuxRender's "Dave" for details: http://www.luxrender.net/forum/viewtopic.php?f=34&t=11009), my #1 issue was the quality of AMD's OpenCL compiler.

In particular, the -O2 flag would literally break my code. I was doing some bit-level operations, and those bit-level operations were just wrong under -O2. Meanwhile, -O0 was so unoptimized that my code was regularly swapping registers into and out of global memory, at which point the CPU was faster and there was no point in using OpenCL / GPU compute.

It seems like AMD's OpenCL implementation assumes that kernels will be very small and compact, and it seems better designed for floating-point ops. Other programmers online have also complained about AMD's bit-level operations returning erroneous results under -O2. My opinion of its compiler was... rather poor... based on my limited exposure, and further research seems to indicate that I wasn't the only one having these issues.
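A common defensive pattern for both the driver-testing burden and the -O2 bit-operation bugs described above is to check device output against a CPU reference implementation on every driver version you support. A minimal sketch, assuming the device results have already been copied back into a host vector (e.g. with `clEnqueueReadBuffer`); the operation `reference_op` and the helpers `rotl32` and `first_mismatch` are hypothetical, not part of any OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Rotate left; r == 0 is handled separately to avoid a 32-bit shift (undefined behavior).
inline std::uint32_t rotl32(std::uint32_t x, unsigned r) {
    r &= 31u;
    return r == 0 ? x : (x << r) | (x >> (32u - r));
}

// Hypothetical bit-level kernel body, mirrored on the CPU as the trusted reference.
inline std::uint32_t reference_op(std::uint32_t x) {
    return rotl32(x, 5) ^ (x >> 3);
}

// Compare device results (already read back to the host) with the CPU reference.
// Returns the first mismatching index, the shorter length if the sizes differ,
// or std::nullopt if everything matches.
std::optional<std::size_t> first_mismatch(const std::vector<std::uint32_t>& input,
                                          const std::vector<std::uint32_t>& device_out) {
    const std::size_t n = input.size() < device_out.size() ? input.size() : device_out.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (device_out[i] != reference_op(input[i])) return i;
    }
    if (input.size() != device_out.size()) return n;  // missing or extra results
    return std::nullopt;
}
```

Running such a check as part of a per-driver test matrix turns a silently wrong -O2 build into a reported failure rather than bad output on a client's machine.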

se6 · on April 27, 2018

I have only ever done floating-point work, for image processing. In fact, looking at my logs, I registered 5 bugs with NVidia in the last 2 years and none with AMD.