Matrix multiplication is O(n^3), while transposing is O(n^2). So for a large matrix, the transpose takes only a tiny fraction of the time spent in the multiplication itself. (That said, the loop-reordering trick mentioned in @mynameismon's link is a very nice alternative.)
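You can see this for yourself with a quick (unscientific) timing sketch in NumPy; the matrix size n = 2000 is arbitrary, just big enough that the O(n^3) term dominates:

```python
import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# A.T alone is just a view (no data movement), so to be fair we
# materialize an actual transposed copy in memory.
t0 = time.perf_counter()
At = np.ascontiguousarray(A.T)
t1 = time.perf_counter()

# The O(n^3) multiplication dominates the runtime.
C = At @ B
t2 = time.perf_counter()

print(f"transpose: {t1 - t0:.4f}s, matmul: {t2 - t1:.4f}s")
```

On any reasonable machine the transpose time should be a small fraction of the matmul time, and the gap only widens as n grows.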
You don't transpose it before the matmul; you always have it stored transposed (i.e., when you print the weights of a linear layer in PyTorch, you're actually seeing (A^T)^T, and what's stored is A^T).
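You can check this directly in PyTorch: `nn.Linear` stores its weight with shape `(out_features, in_features)`, and the forward pass computes `x @ W^T + b`, so no transposed copy is ever made at runtime. A small sketch (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=5)

# The stored weight is already "transposed": shape (out_features, in_features).
print(layer.weight.shape)  # torch.Size([5, 3])

# The forward pass is x @ W^T + b, applied to the stored weight as-is.
x = torch.randn(2, 3)
out = layer(x)
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(out, manual))  # True
```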