The expectation in the HPC community is that an interested vendor will provide their own BLAS/LAPACK implementation (MKL is a BLAS/LAPACK implementation, along with a bunch of other stuff), well-tuned for their hardware. These sorts of libraries aren't just tuned for an architecture; they might be tuned for a given generation or even particular SKUs.
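A minimal sketch of what this looks like from user code, using NumPy as the example front end: the same matrix-multiply call is handed off to whatever BLAS the build links against (OpenBLAS, MKL, AOCL, Accelerate, ...), and that library selects a kernel for the exact CPU at runtime. The matrix sizes here are arbitrary.

```python
import numpy as np

# NumPy delegates matrix products to the BLAS it was built against.
# The BLAS library inspects the CPU at load time and dispatches to a
# kernel tuned for that microarchitecture, so this same script runs
# vendor-tuned code on each machine without any source changes.
a = np.ones((512, 512))
b = np.eye(512)
c = a @ b           # dispatched to the BLAS dgemm under the hood
print(np.allclose(c, a))  # True

# Report which BLAS/LAPACK implementation this NumPy build uses:
np.show_config()
```

Running `np.show_config()` on two machines is a quick way to see whether you are actually getting MKL, OpenBLAS, or an untuned reference BLAS.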
I learned about this recently while trying to optimize an ML test architecture running on Azure. It turned out that having access to Ice Lake chips would allow optimizations that should decrease compute time, and therefore cost, by 20-30%.
Each vendor provides their own: Intel's BLAS (MKL) has Intel-specific optimizations, and AMD's BLAS has AMD-specific optimizations.
Intel is still acting in bad faith by letting MKL fall back to a crippled code path on AMD. They should either let it use all the instructions the CPU reports, or make it refuse to run outright.
The latest oneMKL versions have sgemm/dgemm kernels for Zen CPUs that are almost as fast as the AVX2 kernels (which otherwise can only be reached by defeating Intel's CPU detection on Zen).