Interesting. I'll take your anecdote for what its worth.
My personal use case with OpenCL didn't seem to be going very well. I was testing on my personal Rx 290x. While I didn't have the crashing / infinite loop bugs (See LuxRender's "Dave" for details: http://www.luxrender.net/forum/viewtopic.php?f=34&t=11009) that other people had, my #1 issue was with the quality of AMD's OpenCL compiler.
In particular, the -O2 flag would literally break my code. I was doing some bit-level operations, and those bit-level operations were just wrong under -O2. While the -O0 flag was so unoptimized that my code was regularly swapping registers into / out of global memory. At which point the CPU was faster at executing and there was no point in using OpenCL / GPU compute.
It seems like AMD's OpenCL implementation assumes that the kernels would be very small and compact. And it seems to be better designed for floating-point ops. Other programmers online have also complained about AMD's bit-level operations returning erronious results under -O2. My opinion of its compiler was... rather poor... based on my limited exposure. And further research seems to indicate that I wasn't the only one having these issues.
Only did and do floating point for image processing. In fact, looking into my logs, I registered 5 bugs with NVidia in the last 2 years, none with AMD.