FFmpeg is the most successful of such projects, and it uses handwritten assembly for good reasons. Ignore the seductive whispers of people trying to sell you unreliable abstractions.
Or you just say, "my code is only fast on hardware that natively supports AVX-512 (or whatever)." In many cases where speed really matters, that is a reasonable tradeoff to make.
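To make that tradeoff concrete, here is a minimal sketch of what it might look like in C. The function name `sum_f32` is made up for illustration, and the sketch assumes a GCC- or Clang-style compiler that defines `__AVX512F__` when the build targets AVX-512 (for example with `-mavx512f`). The vector path is only compiled in for such builds; everything else gets the plain scalar loop, which still works but is not expected to be fast.

```c
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: returns the sum of n floats.
 * Fast path exists only when the build targets AVX-512F. */
float sum_f32(const float *x, size_t n)
{
    size_t i = 0;
    float total = 0.0f;

#if defined(__AVX512F__)
    /* Accumulate 16 floats per iteration in one 512-bit register. */
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16)
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i));
    /* Horizontal sum of the 16 lanes. */
    total = _mm512_reduce_add_ps(acc);
#endif

    /* Scalar loop: handles the tail, or the whole array
     * when no AVX-512 path was compiled in. */
    for (; i < n; i++)
        total += x[i];

    return total;
}
```

Note that this picks the path at compile time, so a binary built with AVX-512 enabled will crash on a CPU without it. That is the "only fast on (and here, only built for) this hardware" position; supporting both from one binary would need runtime dispatch on top.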
True.
So most projects just use SIMDe, xSIMD or something similar for such use cases?