Yes, it has. I've written a lot of SIMD code and spent a good amount of time reading the compiler assembly output and there has been huge improvement over the last decade.
GCC register allocation wasn't great, then it got better with x86 SSE but still sucked at ARM NEON, and now it seems to be decent with both.
Clang was better at SIMD code before GCC was. It was equally good with SSE and NEON.
In my experience, compilers are much better than humans at instruction scheduling. Especially when using portable vector extensions, you don't have to write the same code twice and then tweak the scheduling for every architecture separately.
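To make "portable vector extensions" concrete, here is a minimal sketch using the GCC/Clang `vector_size` attribute. The names `f32x4` and `saxpy4` are made up for illustration, and requiring the length to be a multiple of 4 is a simplifying assumption. The same source should build for SSE or NEON targets, with the compiler left to pick and schedule the instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats. vector_size is a GCC/Clang extension, not ISO C. */
typedef float f32x4 __attribute__((vector_size(16)));

/* y[i] = a * x[i] + y[i]; n must be a multiple of 4 (assumption of this sketch). */
void saxpy4(float a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
    const f32x4 va = {a, a, a, a};
    for (size_t i = 0; i < n; i += 4) {
        f32x4 vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* load without assuming alignment */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;              /* element-wise on all four lanes */
        memcpy(y + i, &vy, sizeof vy);  /* store the result back */
    }
}
```

There are no per-architecture intrinsics here, so there is nothing to write twice. Whether the generated code is actually good is still something to check in the disassembly.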
> In my experience, compilers are much better than humans at instruction scheduling.
It'd be more accurate to say they're much better than humans when the heuristics (or whatever they use) work. Sometimes the compiler messes up badly.
The workflow is often to compile and then examine disassembly to see whether the compiler managed to generate something sensible or not.
Another issue is that the compiler's pattern matching sometimes fails to generate the right SIMD instruction, even when the data is aligned to the SIMD width. For example, I recently saw ICC fail to generate a horizontal add in the most basic scenario imaginable. *shrug*
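For reference, a basic horizontal add looks roughly like the sketch below. This is not the code from the ICC case, which isn't shown here. It is just an illustration using the GCC/Clang `vector_size` extension, and `f32x4`, `hsum4` and `sum_array` are hypothetical names. Whether a given compiler turns `hsum4` into a dedicated horizontal-add instruction or a shuffle-and-add sequence is exactly the kind of thing you have to confirm by reading the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats. vector_size is a GCC/Clang extension, not ISO C. */
typedef float f32x4 __attribute__((vector_size(16)));

/* Horizontal add: collapse the four lanes into a single scalar. */
float hsum4(f32x4 v)
{
    return (v[0] + v[1]) + (v[2] + v[3]);
}

/* Sum an array with a vector accumulator, then reduce once at the end.
   n must be a multiple of 4 (assumption of this sketch). */
float sum_array(const float *p, size_t n)
{
    assert(n % 4 == 0);
    f32x4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4) {
        f32x4 v;
        memcpy(&v, p + i, sizeof v);  /* load without assuming alignment */
        acc += v;                     /* lane-wise partial sums */
    }
    return hsum4(acc);
}
```

Compiling with something like `cc -O2 -S` and looking at what `hsum4` became is the quickest way to see whether the pattern was recognized.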
Things like this make me question the wisdom of ever using higher-level languages. We took the path of abstracting our description of what we want to happen away from processor instructions, with the idea that we could write code that would compile on multiple architectures without changes. The reality is that we still often need to special-case things even without performance considerations. And the farther we abstract, the more performance seems to suffer and the more often we end up jumping through abstraction hoops rather than getting things done.
The minimalist in me wonders if maybe just using some kind of macro system on top of assembler plus a bytecode VM with the ability to drop to native instructions wouldn't ultimately be better.