One thing I've always wondered: How do pre-compiled binaries take advantage of new instructions, if they do at all? Since the compiler needs to create a binary that will work on any modern-ish machine, is there a way to use new instructions without breaking compatibility for older CPUs?
Some compilers have dynamic dispatch for this; you run the "cpuid" instruction to check the capability bits, then dispatch to the version the CPU supports. Some dynamic linkers can even link in different versions of a function depending on the CPU capabilities -- gcc has a special __attribute__((target("..."))) attribute for this.
The upside is that your program may pick up updated, accelerated routines if it is dynamically linked against a library that you update. The downside is that the calls are always indirect via the PLT, which isn't very efficient. That's fine for things like block encryption and compression, where the function entry latency is small compared to how long the function runs, but not for calls that may be extremely short, like memcmp.
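For illustration, here's a minimal sketch of that kind of runtime dispatch in C using GCC/Clang built-ins; the function names and the AVX2/baseline split are made up for the example, not taken from any particular library:

```c
#include <stdio.h>

/* Baseline version that runs on any x86-64 CPU. */
static long sum_baseline(const int *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++) s += a[i];
    return s;
}

/* Same loop, but the compiler is allowed to use AVX2 when vectorizing it. */
__attribute__((target("avx2")))
static long sum_avx2(const int *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++) s += a[i];
    return s;
}

/* Dispatch on each call; __builtin_cpu_supports consults the CPUID
   feature bits that the runtime caches at startup. */
long sum(const int *a, long n) {
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(a, n);
    return sum_baseline(a, n);
}

int main(void) {
    int data[] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%ld\n", sum(data, 8));
    return 0;
}
```

The same binary then runs everywhere: older CPUs just never take the AVX2 path.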
This is where dynamic linking can play a role. For example, Apple's Accelerate framework uses some undocumented matrix instructions on the M1. If you dynamically link the framework (the only supported linkage) you'll automatically get some benefit from these instructions, and future ones, even if those instructions did not exist when you compiled your app.
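As a rough sketch of the "link the framework and let it decide" idea, here's a plain BLAS matrix multiply through Accelerate's cblas interface; the matrices and sizes are arbitrary, and nothing in the source names any specific instruction:

```c
/* Compile on macOS with: clang matmul.c -framework Accelerate */
#include <Accelerate/Accelerate.h>
#include <stdio.h>

int main(void) {
    /* Two 2x2 matrices in row-major order. */
    float a[4] = {1, 2, 3, 4};
    float b[4] = {5, 6, 7, 8};
    float c[4] = {0};

    /* C = 1.0 * A * B + 0.0 * C. The binary only names a BLAS routine;
       whatever hardware path the framework's current implementation
       chooses on this chip is what you get. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0f, a, 2, b, 2, 0.0f, c, 2);

    printf("%.0f %.0f\n%.0f %.0f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```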
You’d need to recompile the binary to take advantage of new instructions.
Either the compiler on its own or explicit code can create branches where the binary checks whether certain instructions are available and, if they are not, falls back to a less optimal operation.
Backwards compatibility for modern binaries, basically. But not a forward ability to use instructions that hadn't been invented yet when the binary was compiled.
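For the "compiler alone" case, gcc can generate those branches for you with target_clones: you write the function once and the toolchain emits several versions plus the runtime selection. A minimal sketch (the function name and the target list are just examples, assuming x86-64 GCC with glibc ifunc support):

```c
#include <stdio.h>

/* GCC emits one clone of this function per listed target, plus a resolver.
   The resolver checks the CPU's capabilities and the dynamic loader binds
   the best clone via an ifunc, so callers never branch themselves. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
long dot(const int *a, const int *b, long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += (long)a[i] * b[i];
    return s;
}

int main(void) {
    int a[] = {1, 2, 3, 4}, b[] = {5, 6, 7, 8};
    printf("%ld\n", dot(a, b, 4));  /* 5 + 12 + 21 + 32 = 70 */
    return 0;
}
```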
Not all binaries are fully backwards compatible. If you’re missing AVX, a surprising number of games won’t run. Sometimes only because the launcher won’t run, even though the game plays without AVX.
I've actually sometimes seen this as an argument in favor of JITed languages like C# and Java, that you can take advantage of newer CPU features and instructions etc. without having to recompile. In practice languages that compile to native binaries still win at performance, but it was interesting to see it turned into a talking point.
But for a pure pre-compiled example there is Apple Bitcode, which is meant to be compiled to the destination architecture before it runs (as opposed to JIT code, which is compiled when it runs). It's mandatory for Apple watchOS apps, and when they released a watch with a 64-bit CPU they just recompiled all the apps.
I believe that the binaries that actually get shipped to the watch are final bits. Apple just compiles the bitcode that developers give them into versions for all of the watches that they are supporting, and those versions get downloaded by the actual watches.
If they come out with new watches, then they can re-compile all of the code for the new watches with no developer involvement. It is really the best solution for all involved.
D has a thing where you mark a function as @dynamicCompile, so, at the expense of carting the IR and a compiler around, you can use whatever instructions the compiler detects on the host machine (i.e. the function can be compiled on first use).