BLAS/LAPACK don't do any block-level optimizations. Heck, they don't even let you declare a fixed block sparsity pattern. Do the math yourself: write down all 16 zero/nonzero patterns of a 2x2 block matrix (one bit per block) and try to work out the inverse or LU decomposition on paper. Each pattern admits different simplifications.
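To make that concrete, here's the textbook block-LU identity for the fully dense case (nothing BLAS exposes, just the math); every zero block among A, B, C, D removes terms from the Schur complement S and from the triangular solves:

    \begin{pmatrix} A & B \\ C & D \end{pmatrix}
    = \begin{pmatrix} I & 0 \\ C A^{-1} & I \end{pmatrix}
      \begin{pmatrix} A & B \\ 0 & S \end{pmatrix},
    \qquad S = D - C A^{-1} B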
I mean, just look at the saddle-point problem you mentioned in that section. It's a block matrix with highly specific properties, and there is no BLAS call for that. Things get even worse once you have parameterized matrices and want to evaluate a chain of products where some factors change and others stay fixed: the fixed parts can be factorized offline and reused.
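A minimal sketch of that "factorize offline" idea, assuming a saddle-point system [A B^T; B 0] with a fixed SPD (1,1) block A and a changing constraint block B (all names and sizes here are made up for illustration):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    n, m = 200, 50                      # hypothetical sizes
    rng = np.random.default_rng(0)
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)         # fixed SPD (1,1) block
    A_fac = cho_factor(A)               # offline: Cholesky-factor A once

    def solve_saddle(B, f, g):
        # Solve [A B^T; B 0] [x; y] = [f; g] via the Schur complement,
        # reusing the cached factorization of A on every call.
        Ainv_Bt = cho_solve(A_fac, B.T)
        S = B @ Ainv_Bt                 # Schur complement B A^{-1} B^T
        y = np.linalg.solve(S, B @ cho_solve(A_fac, f) - g)
        x = cho_solve(A_fac, f - B.T @ y)
        return x, y

    B = rng.standard_normal((m, n))     # changing constraint block
    x, y = solve_saddle(B, rng.standard_normal(n), rng.standard_normal(m))

BLAS/LAPACK can do each individual call in there just fine; the point is that hoisting the factorization of A out of the loop is structural knowledge that no BLAS interface can express.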
https://lukefleed.xyz/posts/cache-friendly-low-memory-lanczo...