I agree with your comment to some extent. I'm kinda allergic to writing loops, even in C++ where it's the best solution.
On the other hand, in C++, hand-rolled matrix multiplication is both slower and an order of magnitude less accurate than MKL (or possibly OpenBLAS too).
On the other hand, in C++, hand-rolled matrix multiplication is both slower and an order of magnitude less accurate than MKL (or possibly OpenBLAS too).