If you're compiling hand-tuned assembly, -march and -mtune will probably have more effect than when compiling and optimizing C/C++ code.
OTOH, I'd like to underline that heavily optimized scientific code and libraries are neither naive (in terms of algorithmic complexity/implementation) nor straightforward :D
E.g., this is how Eigen configures its internal vectorization parameters: https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Co...
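To illustrate the general idea (this is NOT Eigen's code, and every name below, like `kFloatLanes`, `kIsa` and `add_inplace`, is hypothetical): -march changes which feature macros the compiler predefines (e.g. `__AVX__`, `__SSE2__`), and a library can pick its "packet" width at compile time from those macros. A minimal sketch, assuming GCC or Clang and C++17:

```cpp
#include <cstddef>
#include <string_view>

// Pick a float "packet" width from the macros the compiler predefines.
// These macros depend on -march (GCC/Clang); MSVC uses different ones.
#if defined(__AVX512F__)
inline constexpr std::size_t kFloatLanes = 16;  // 512-bit registers
inline constexpr std::string_view kIsa = "AVX-512F";
#elif defined(__AVX__)
inline constexpr std::size_t kFloatLanes = 8;   // 256-bit registers
inline constexpr std::string_view kIsa = "AVX";
#elif defined(__SSE2__) || defined(__ARM_NEON)
inline constexpr std::size_t kFloatLanes = 4;   // 128-bit registers
inline constexpr std::string_view kIsa = "SSE2/NEON";
#else
inline constexpr std::size_t kFloatLanes = 1;   // no SIMD assumed
inline constexpr std::string_view kIsa = "scalar";
#endif

// Adds b into a element-wise. The main loop works in blocks of
// kFloatLanes, a fixed inner trip count that an optimizing compiler
// may map onto vector instructions; the tail loop handles leftovers.
inline void add_inplace(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + kFloatLanes <= n; i += kFloatLanes) {
        for (std::size_t j = 0; j < kFloatLanes; ++j) {
            a[i + j] += b[i + j];
        }
    }
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```

Building the same file with plain `g++ -O2 -std=c++17` versus adding `-march=native` on an AVX-capable x86 machine selects a different branch, so `kIsa` and the block size change without touching the source. Real libraries layer much more on top of this (alignment, intrinsics, runtime dispatch), which is the point about such code not being straightforward.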