Maybe it's just me, but I found matrix calculus atrocious. The rules aren't even useful for most situations, since they're essentially "hardcoded" for certain types of matrix situations that crop up often.
I _much_ prefer to reduce a given matrix expression into Einstein summation convention, at which point all of the "regular" calculus rules just work. You can bash it out from this point on.
For example, consider the case of `x^T x`. Matrix calculus tells us that its derivative with respect to `x` is `2x`. To derive this using summation convention, we first write it in terms of coordinates. We will have:
y = xi xi [summation over i implicit]
dy/dxj
= d(xi^2)/dxj
= d(xi^2)/dxi * dxi/dxj [chain rule]
= 2xi delta(ij) [all xi independent, dxi/dxj = delta(ij), the Kronecker delta]
= 2xj [summing over i]
dy/dx = 2x
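If you want to sanity-check the index derivation above, here is a minimal sketch in plain Python (no libraries). The function names (`kronecker`, `grad_index`, `grad_numeric`) and the test vector are made up for illustration; it just compares the `2 xi delta(ij)` formula against a central finite-difference estimate.

```python
def kronecker(i, j):
    """Kronecker delta: 1 if i == j, else 0."""
    return 1.0 if i == j else 0.0


def y(x):
    # y = xi xi, with the sum over i written out explicitly
    return sum(xi * xi for xi in x)


def grad_index(x):
    # dy/dxj = 2 xi delta(ij), summed over i, for each free index j
    n = len(x)
    return [sum(2.0 * x[i] * kronecker(i, j) for i in range(n)) for j in range(n)]


def grad_numeric(f, x, h=1e-6):
    # Central finite differences: perturb one coordinate at a time
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x = [1.0, -2.0, 3.0]  # arbitrary example vector
print(grad_index(x))       # index-notation result, equals 2x
print(grad_numeric(y, x))  # numerical estimate, should agree closely
```

The sum over `i` collapses to the single term with `i == j`, which is exactly the "summing over i" step in the derivation.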
I don't think `x^T x` is a great example for demonstrating matrix calculus techniques, because it is a scalar. So, by definition, its derivative with respect to a vector v is going to be the gradient -- a vector of that scalar differentiated by each component of v. This is such a simple case that it's not really necessary to introduce matrix calculus at all.
It's when you take derivatives of vectors & matrices by other vectors & matrices that things get "interesting".
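To make the vector-by-vector case concrete, here is a rough sketch in plain Python. It assumes the simplest such example, `y = A x`, which nobody above actually wrote down: in index form `yi = Aij xj`, so `dyi/dxk = Aij delta(jk) = Aik`, i.e. the Jacobian is just `A`. The names (`matvec`, `jacobian_numeric`) and the numbers are hypothetical, and the check is a finite-difference estimate rather than anything rigorous.

```python
def matvec(A, x):
    # yi = Aij xj, with the sum over j written out explicitly
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def jacobian_numeric(f, x, h=1e-6):
    # J[i][k] approximates dyi/dxk using central finite differences
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][k] = (y_plus[i] - y_minus[i]) / (2 * h)
    return J


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]    # arbitrary 2x3 example matrix
x = [0.5, -1.0, 2.0]     # arbitrary example vector

J = jacobian_numeric(lambda v: matvec(A, v), x)
for row in J:
    print(row)  # each row should be close to the matching row of A
```

The result now carries two free indices (`i` and `k`) instead of one, which is where the layout conventions of matrix calculus start to matter and where index notation keeps the bookkeeping explicit.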
I read the paper "Matrix calculus you need for deep learning", and if you go through the concepts in the blog, it is similar to how PyTorch autograd works.
Thank you for your reply; it makes me think from a different angle. Maybe I should look into Einstein summation convention and try writing another blog post.
Thanks
In general, I find that working with mathematics as an auxiliary to any field is hard when you do not get to practice it every day. In these cases I have found the Matrix Cookbook [1] to be very helpful, as it serves as a quick reference.