Each CPU vendor or each CPU architecture? (genuinely asking, I don't know how it...

bee_rider · on April 6, 2022

The expectation in the HPC community is that an interested vendor will provide their own BLAS/LAPACK implementation (MKL is a BLAS/LAPACK implementation, along with a bunch of other stuff), which is well-tuned for their hardware. These sorts of libraries aren't just tuned for an architecture; they might be tuned for a given generation or even particular SKUs.

hallway_monitor · on April 6, 2022

I learned about this recently when trying to optimize an ML test architecture running on Azure. It turns out that having access to Ice Lake chips would allow optimizations that should decrease compute time, and therefore cost, by 20-30%.

bee_rider · on April 6, 2022

Some AVX-512 stuff I guess?

AVX-512 had a rough rollout, but it seems like it is finally turning into something nice.

wmf · on April 6, 2022

Each vendor. Intel BLAS (MKL) has Intel-specific optimizations and AMD BLAS has AMD-specific optimizations.

Intel is still acting in bad faith by allowing MKL to run in crippled mode on AMD. They should either let it use all available instructions or make it refuse to run.
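
To make the complaint concrete, here is a rough sketch of the difference between gating fast kernels on the vendor string and gating them on the instruction-set flags the CPU actually reports. This is not MKL's real dispatch logic: the function names, kernel names, and the fallback choice are all hypothetical, made up for illustration. Only the vendor strings ("GenuineIntel", "AuthenticAMD") and the flag names (as Linux reports them in /proc/cpuinfo) are real.

```python
# Hypothetical kernel names and dispatch functions, for illustration only.
# This is NOT how any real BLAS library is implemented.

def pick_kernel_by_features(flags):
    """Feature-gated dispatch: trust the instruction-set flags the CPU reports."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags and "fma" in flags:
        return "avx2_fma"
    return "generic_sse2"


def pick_kernel_by_vendor(vendor, flags):
    """Vendor-gated dispatch: fast paths are only considered for one vendor."""
    if vendor == "GenuineIntel":
        return pick_kernel_by_features(flags)
    # Any other vendor gets the conservative path, whatever its flags say.
    return "generic_sse2"


# A Zen-like CPU: supports AVX2 and FMA but reports a non-Intel vendor string.
zen_flags = {"sse2", "avx", "avx2", "fma"}

print(pick_kernel_by_vendor("AuthenticAMD", zen_flags))
print(pick_kernel_by_features(zen_flags))
```

With the vendor check, the Zen-like CPU lands on the slow generic path even though it supports everything the AVX2 kernel needs; with the feature check, it gets the AVX2 kernel. That gap is the "crippled mode" being described.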

microtonal · on April 6, 2022

The latest oneMKL versions have sgemm/dgemm kernels for Zen CPUs that are almost as fast as the AVX2 kernels (which require disabling Intel CPU detection on Zen).