> Even most common extensions just cover part of them.
The headers I linked to here: https://news.ycombinator.com/item?id=8873764 cover most of the useful Intel CPU extensions. And you can use them easily in code, it just sucks when you have to do CPU feature detection and have different code paths for different CPUs.
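To make the "feature detection plus different code paths" part concrete, here is a rough sketch of runtime dispatch. It assumes GCC or Clang on x86 (it relies on their `__builtin_cpu_supports` builtin and the `target` function attribute); the names `sum_scalar`, `sum_avx2` and `pick_sum` are made up for illustration, and on other compilers or architectures it simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Portable fallback: plain scalar loop (unsigned, so overflow wraps). */
static uint32_t sum_scalar(const uint32_t *v, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

#ifdef HAVE_X86_DISPATCH
/* AVX2 path: the target attribute lets this one function use AVX2
   even when the rest of the file is built without -mavx2. */
__attribute__((target("avx2")))
static uint32_t sum_avx2(const uint32_t *v, size_t n) {
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(v + i)));

    /* Horizontal sum of the 8 lanes, then the scalar tail. */
    uint32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    uint32_t s = 0;
    for (int k = 0; k < 8; k++) s += lanes[k];
    for (; i < n; i++) s += v[i];
    return s;
}
#endif

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Hypothetical helper: pick the best implementation for the CPU we are
   actually running on. Call it once and cache the pointer. */
static sum_fn pick_sum(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_scalar;
}
```

You would call `pick_sum()` once at startup and keep the returned pointer around. Every extra instruction set you want to exploit means another function and another branch in the picker, which is exactly the annoying part.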
> it just sucks when you have to do CPU feature detection and have different code paths for different CPUs.
In the Linux kernel they went so far as to actually replace instructions at known locations at boot time, based on the capabilities of the CPU the code is executing on. This saves them from having to maintain many paths in the compiled code.
Games on consoles, where the hardware architecture is frozen and you can optimize to the metal all you want.
Not all titles do this, since most titles are cross-platform and you're not going to do tons of optimization for a specific one unless there is a huge payoff.
I've used a lot of SSE2 and prefetch intrinsics in Visual C++ for math / data heavy high performance computing applications. It can make a fairly big difference -- like more than 50% performance improvement for key parts of the code.
That depends heavily on which CPU and memory subsystem combination you're running it on. You need to know how far ahead to prefetch and how to order your data in memory.
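For anyone who hasn't seen this style of code, a minimal sketch of SSE2 plus software prefetch looks something like the following. It is not the code from the comments above, just an illustration: `dot` and `PREFETCH_AHEAD` are made-up names, and the prefetch distance of 64 elements is an arbitrary placeholder that would have to be tuned per CPU and memory subsystem.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2 intrinsics (__m128d, _mm_mul_pd, ...) */
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#define USE_SSE2 1
#endif

/* Hypothetical tuning knob: how many elements ahead to prefetch.
   The right value depends on the CPU and memory system. */
#define PREFETCH_AHEAD 64

/* Dot product of two double arrays, two elements per SSE2 operation. */
static double dot(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    size_t i = 0;
#ifdef USE_SSE2
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        /* Prefetch is only a hint. For simplicity this issues one per
           iteration; tuned code would issue one per cache line. */
        if (i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)(a + i + PREFETCH_AHEAD), _MM_HINT_T0);
            _mm_prefetch((const char *)(b + i + PREFETCH_AHEAD), _MM_HINT_T0);
        }
        /* Unaligned loads, so the caller needs no special alignment. */
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    }
    /* Add the two lanes of the accumulator together. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];
#endif
    /* Scalar tail (or the whole loop when SSE2 is unavailable). */
    for (; i < n; i++) sum += a[i] * b[i];
    return sum;
}
```

On a simple linear walk like this the hardware prefetcher usually does fine on its own; explicit prefetching tends to pay off more with strided or pointer-chasing access, which is where the data layout point comes in.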
Many of the capabilities described in the article are just hardware features which don't need any control from application developers, but their effectiveness can be improved if the developers are aware of them (e.g. caches, TLBs). Other properties are exploited automatically by compilers (e.g. instruction scheduling to improve throughput on a superscalar processor).
Instruction set extensions (which I believe might be what you were thinking of) are used via intrinsics and assembly programming by pretty much any performance-oriented code. For example, look at the dozens of architecture-specific implementations of glibc functions: x86-64[0], arm[1].
Hence why, even if C looks like a portable high-level assembler, there are many modern CPU features that are not available from it.
Even most common extensions just cover part of them.