Being a C++ dev, I think you're on point about the reason people still choose C++, but my suspicion is that it's part truth, part groupthink, and it won't really hold out.
I could be totally wrong, but have you had to optimize a tight loop in C/C++? It sucks, and a lot of stuff is missing: SIMD, likely/unlikely branches, the compiler has a lot of trouble knowing when lines of code are independent and can be done in parallel (b/c const =/= immutable), in-lining can't be forced when you know it gives better performance (if you use GCC extensions you can alleviate a lot of this, but that ties you to a compiler). Yeah, C++ is generally faster than Go, but there is typically a lot of performance left on the table. The language kinda starts to work against you when you start to dig down. So you slap some shit together and the compiler does a decent job, but most C++ devs aren't even looking at the compiler output.
But the elephant in the room is that more and more of our available flops are on the GPU, and C++ isn't helping you there at all. Not only that, but the GPU gives you way more operations per watt (and that's what a lot of those people care about). And finally, when you throw stuff onto the GPU you also leave the CPU free to do other things. So there are a lot of "wins" there. As you illustrate, the areas where C++ is relevant are shrinking, and they're shrinking into the area that is very GPU-friendly.
So the way I see it, C++ folks will start to write more OpenCL kernels for the performance-critical pieces, and the rest won't matter (Go or Clojure or whatever). GLSL is kinda lame and too C99ish... so maybe someone will write a better language that compiles to SPIR-V. It's not exactly write-once-run-everywhere, but it could be much better than writing optimized C++, and it can run everywhere. It's closer to the cross-platform assembly that C/C++ wants to be.
Most compilers have extensions that will allow you to do this (__builtin_expect and so on).
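For illustration, a sketch of the usual portable wrapper, assuming GCC/Clang's `__builtin_expect` and falling back to the bare condition on other compilers. The `LIKELY`/`UNLIKELY` names and `sum_non_negative` are hypothetical, not from any particular library:

```cpp
#include <cstddef>
#include <vector>

// Branch-prediction hints. __builtin_expect is a GCC/Clang extension;
// elsewhere (e.g. MSVC) the macros collapse to the plain condition, so the
// code stays correct and simply loses the hint.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (!!(x))
#define UNLIKELY(x) (!!(x))
#endif

// Sums the values, skipping negative entries, which we assume are rare
// (say, error markers). The hint asks the compiler to treat the skip
// branch as the cold path when laying out the loop.
long long sum_non_negative(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (UNLIKELY(values[i] < 0)) {
            continue;  // rare path
        }
        total += values[i];
    }
    return total;
}
```

The hint only influences code layout and static prediction; results are identical with or without it.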
> in-lining can't be forced when you know it gives better performance
Again, most compilers have this, not just GCC, e.g. __forceinline.
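Roughly what that looks like in a project header, as a sketch: MSVC spells it `__forceinline`, GCC/Clang use `__attribute__((always_inline))`, and anything else falls back to plain `inline`, which is only a hint. The `FORCE_INLINE` macro and the `rotl32`/`mix_all` helpers are hypothetical names for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Pick the forced-inline spelling per compiler.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // only a hint on unknown compilers
#endif

// Small helper we want folded into every call site of the hot loop.
FORCE_INLINE std::uint32_t rotl32(std::uint32_t x, unsigned r) {
    r &= 31u;  // keep the shift amount in range
    return (x << r) | (x >> ((32u - r) & 31u));
}

// Hypothetical hot loop: mixes each element into a running hash.
std::uint32_t mix_all(const std::uint32_t* data, std::size_t n) {
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // 0x9E3779B1u is an arbitrary odd mixing constant.
        h = rotl32(h ^ data[i], 5) * 0x9E3779B1u;
    }
    return h;
}
```

On GCC, `always_inline` is stronger than a hint: if the function can't be inlined, you get a compile error rather than a silent call.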
> the compiler has a lot of trouble knowing when lines of code are independent and can be done in parallel (b/c const =/= immutable)
This is true, as aliasing is a real issue. The hardware itself has some say over this anyway, depending on its instruction scheduling and out-of-order execution (OOE) capabilities.
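To make the `const =/= immutable` point concrete, a sketch (the function names are made up for illustration): a pointer-to-const parameter doesn't stop another pointer from writing the same memory, so the compiler can't hoist the load out of the loop. `__restrict`, itself a non-standard extension in C++, is the usual way to promise no overlap:

```cpp
#include <cstddef>

// `const float*` only promises that *this function* won't write through
// `factor`. Because `out` may point at the same float, the compiler must
// re-read *factor on every iteration.
void scale_all(float* out, const float* factor, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] *= *factor;
    }
}

// __restrict (supported by GCC, Clang and MSVC; standard C++ has no
// equivalent of C99's `restrict`) promises the pointers never alias, so
// *factor can be loaded once and the loop vectorized more easily.
// Passing overlapping pointers here is undefined behavior.
void scale_all_restrict(float* __restrict out,
                        const float* __restrict factor,
                        std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] *= *factor;
    }
}
```

Calling `scale_all(v, &v[0], 3)` on `{2, 3, 4}` legitimately yields `{4, 12, 16}`, because `*factor` changes after the first store; that is exactly the case the compiler has to preserve when it can't prove the pointers are distinct.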
What you don't mention, however, is the fact that almost no other languages offer any of these, let alone all of them. Rust may be the exception here, although some of this is still in the works (SIMD; I'm not sure about the status of likely/unlikely intrinsics).
For GPU programming, if you're using CUDA, you're almost certainly using C or C++, or calling something that wraps C/C++ code. Not everything is suited to GPU processing anyway; there's still a lot of performance-sensitive code that isn't moving off the CPU any time soon.
Right, so things that are not part of the language, not cross-platform and not cross-compiler. That's called fighting the language in my book :)
I'm not saying you can't get C++ to output the assembly you want - it just sucks trying to coerce it into doing things that are honestly not that complicated. And even when you do get what you want, you find you can't use the code anywhere else. To me that feels like a language failure...
> is the fact that almost no other languages offer any of these
I guess you missed my point. It seems to me that we're at a point where you no longer need these features as part of your core application language. The idea is that with OpenCL/SPIR-V we'll be able to
1- be more explicit and not fight the language (so even if you're 100% on the CPU it makes sense)
2- target every platform (you can finally write code for your GPU)
3- call it from any parent language
You're right that not all performance-critical problems boil down to tight shared-memory loops that can be thrown onto an OpenCL kernel, but my experience so far tells me that those are the vast majority of performance problems. So C++'s usefulness will shrink. But maybe my experience is biased and I'm off base. I haven't done much OpenCL myself, but I'm definitely planning to use it more in the future.
> Right, so things that are not part of the language, not cross-platform and not cross-compiler
You just have a header with different #defines for the different platforms you are going to ship on, or use a premade open source one.
If you want to ship on everything, you won't get full optimization stuff everywhere. It would be better if some of these features were in the standard, but in practice it isn't such a big issue for those two in particular.
These are all good points, but I'd say two things:
1. Whatever C++'s weaknesses in this area are, it's superior to Go, so C++ programmers aren't going to switch to Go because of this.
2. Not everything is about raw throughput. You can't do anything latency sensitive on a GPU. Consider a game: the pixels get drawn on the GPU, and the physics might happen on the GPU, but you still have a ton of highly latency sensitive things that are going to have to be done by the CPU, such as input handling and networking. Also, even with low driver overhead APIs like Vulkan, you still have to have something on the CPU telling the GPU what to do. Finally, GPUs aren't good at branch-heavy workloads in general.