thanks for that! so it sounds like, if i purchase a chip that supports avx512, and run an operating system and compiler that support avx512, i can write "plain old c code", pass a minimal number of compiler arguments, and compile that code on my own machine (aka not just running someone else's binary). and then the full power of avx512 is right there waiting for me? :)
A compiler turning C(++) code into SIMD instructions is called "autovectorization". In my experience this works for simple loops such as dot products (even that requires special compiler flags to enable FMA and to let the compiler reorder the floating-point additions), but unfortunately the wheels often fall off for more complex code.
Also, I haven't seen the compiler generate the more exotic instructions.
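To make the dot product case concrete, here is a minimal sketch of the kind of loop I mean. The function name `dot` and the file name `dot.c` are just placeholders for illustration, and the flags in the comment are the ones I would try first with GCC or Clang, not a guaranteed recipe; whether you actually get 512-bit code depends on the compiler version and the target CPU.

```c
#include <stddef.h>

/* Plain C dot product, no intrinsics.
 * `restrict` promises the compiler that a and b do not overlap.
 *
 * Flags to try (assumption: GCC or Clang on x86-64):
 *   -O3 -march=native          enable the vectorizer and the CPU's SIMD/FMA
 *   -ffast-math                allow reordering of the float additions
 *   -mprefer-vector-width=512  ask for 512-bit vectors where available
 */
float dot(const float *restrict a, const float *restrict b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* candidate for a vectorized FMA */
    return sum;
}
```

Something like `gcc -O3 -march=native -ffast-math -c dot.c` should be enough to experiment with. To check what the compiler did, GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize` will report which loops were vectorized, or you can read the assembly from `-S` and look for zmm registers. Without permission to reorder the additions, the compiler generally has to keep the sum strictly sequential, which is why the reduction often stays scalar at plain `-O3`.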