It looks like this is still missing many matrix operations like QR, SVD, einsum, etc. Is there a clear route to using these on the GPU in Python on Apple Silicon? Last I checked, the PyTorch MPS backend was still missing at least QR...
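For what it's worth, the usual workaround when an op isn't implemented on MPS is to round-trip through the CPU. A minimal sketch, assuming PyTorch; the helper name is my own:

```python
import torch

def qr_with_cpu_fallback(a: torch.Tensor):
    """QR decomposition that falls back to the CPU when the input
    lives on a backend (like MPS) that lacks torch.linalg.qr."""
    if a.device.type == "mps":
        # Compute on CPU, then move the factors back to the original device.
        q, r = torch.linalg.qr(a.cpu())
        return q.to(a.device), r.to(a.device)
    return torch.linalg.qr(a)
```

It works, but the transfer cost defeats the point of running on the GPU in the first place, which is why native support matters.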
Factorization methods are fairly uncommon in deep learning (the likely target of this framework), and they have compute properties (approximate outputs, a non-deterministic number of iterations) that make them unlike the standard BLAS++ APIs.
einsum seems like a reasonable thing to request, but it's hard to stay performant across the entire surface the operation exposes.
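To illustrate how wide that surface is: one spec string dispatches to what would otherwise be many distinct kernels, each with its own performance characteristics. A quick sketch using NumPy as a stand-in for any array framework:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)

# Matrix multiply: contraction over a shared index.
matmul = np.einsum("ij,jk->ik", a, b)

# Trace: repeated index on a single operand.
trace = np.einsum("ii->", np.eye(3))

# Outer product: no contraction at all.
outer = np.einsum("i,j->ij", np.ones(2), np.ones(3))

# Batched matmul: a leading batch index carried through.
batched = np.einsum("bij,bjk->bik",
                    np.ones((5, 2, 3)), np.ones((5, 3, 4)))
```

A fast matmul kernel says nothing about how fast the trace, outer-product, or arbitrary multi-operand contractions will be, which is why covering einsum well is harder than it looks.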
Exactly right that this targets a narrower surface to enable many deep learning models. I wonder how uncommon it actually is to hit an operation that isn't included, though. Judging from the PyTorch MPS tracking issue, it seems pretty common:
NVIDIA's moat is not just in providing BLAS++ operations, but in extending them across a wider range of libraries: cuSPARSE, cuSOLVER, cuTENSOR, etc. Without these, it feels like Apple is just playing catch-up with whatever happens to be popular and unsupported...