> Why should I care about different forms of matrix decomposition? What do they buy me?
A natural line of questioning to go down once you're acquainted with linear maps/matrices is "which functions are linear"/"what sorts of things are linear functions capable of doing?"
It's easy to show dot products are linear, and not too hard to show (in finite dimensions) that every linear function that outputs a scalar is a dot product with some fixed vector. And these functions form a vector space themselves, the "dual space" (because each element is the dot-product mirror of some vector from the original space). So linear functions from F^n -> F^1 are easy enough to understand.
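To make that concrete, here's a minimal numpy sketch (the functional f below is made up): evaluating a scalar-valued linear map on the standard basis recovers the vector it's secretly a dot product with.

    import numpy as np

    # A made-up linear functional on R^3.
    def f(x):
        return 2 * x[0] - x[1] + 5 * x[2]

    # Evaluating f on the standard basis recovers the vector w with f(x) = <w, x>.
    w = np.array([f(e) for e in np.eye(3)])

    x = np.array([1.0, 4.0, -2.0])
    assert np.isclose(f(x), w @ x)  # f really is "dot with w"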
What about F^n -> F^m? There's rotations, scaling, projections, permutations of the basis, etc. What else is possible?
A structure/decomposition theorem tells you what is possible. For example, the Jordan Canonical Form tells you that with the right choice of basis (i.e. coordinates), every matrix looks like a collection of independent "blocks" of fairly simple upper triangular matrices, each operating on its own subspace. Polar decomposition says that just as complex numbers can be written in polar form re^it, where multiplication scales by r and rotates by t, linear maps can be written as a higher dimensional scaling combined with an orthogonal transformation/"rotation". The SVD says that given the right choice of basis for the source and the target, every linear map looks like independent scalings along separate axes. The coordinate changes in the SVD are orthogonal, so another interpretation is that, roughly speaking, every linear map is a rotation, a scaling, and another rotation. The singular vectors tell you how space rotates and the singular values tell you how it stretches.
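Here's a rough numpy sketch of both readings (the matrix A is an arbitrary made-up example): the SVD gives rotation/scale/rotation directly, and the polar factors can be assembled from it.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))  # an arbitrary made-up linear map

    # SVD: A = U @ diag(s) @ Vt, with U and Vt orthogonal ("rotations")
    # and s >= 0 the stretch factors along the singular directions.
    U, s, Vt = np.linalg.svd(A)
    assert np.allclose(A, U @ np.diag(s) @ Vt)
    assert np.allclose(U.T @ U, np.eye(3))    # orthogonal (rotation, possibly with a reflection)
    assert np.allclose(Vt @ Vt.T, np.eye(3))  # orthogonal

    # Polar decomposition assembled from the SVD: A = Q @ P with Q orthogonal
    # (the "rotation") and P symmetric positive semidefinite (the "scaling"),
    # the matrix analogue of writing a complex number as e^{it} * r.
    Q = U @ Vt
    P = Vt.T @ np.diag(s) @ Vt
    assert np.allclose(A, Q @ P)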
So the name of the game becomes picking good coordinates and tracking coordinate changes; once you do that, linear maps become relatively easy to understand.
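For instance, here's a small sketch with a made-up symmetric matrix: the "good coordinates" are the eigenbasis, and in those coordinates the map is just independent scaling.

    import numpy as np

    # A made-up symmetric matrix; symmetric matrices have an orthonormal eigenbasis.
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # Columns of P are the good coordinates; the coordinate change P.T @ A @ P
    # turns A into a diagonal matrix of eigenvalues.
    eigvals, P = np.linalg.eigh(A)
    assert np.allclose(P.T @ A @ P, np.diag(eigvals))

    # Applying A = change to eigencoordinates, scale each coordinate, change back.
    x = np.array([3.0, -1.0])
    assert np.allclose(A @ x, P @ (eigvals * (P.T @ x)))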
Dual spaces come up as a technical tool when solving PDEs, for example. You look for "distributional" solutions, which are dual vectors (on some vector space of functions). In that context people talk about "integrating a distribution against test functions", which is the same as saying distributions are dot products (integration defines a dot product), a.k.a. dual vectors. There are some technical difficulties here, though, because the space is now infinite dimensional and not all dual vectors are dot products: e.g. the Dirac delta distribution delta(f) = f(0) can't be written as a dot product <g,f> for any g, but it is a limit of dot products (e.g. with taller/thinner Gaussians). One might ask whether all dual vectors are limits of dot products, and whether all limits of dual vectors are dual vectors (limits matter when solving differential equations). The dual space concept helps you phrase these questions.
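A numerical sketch of that last point (the test function f is made up): pairing f with narrower and narrower unit-mass Gaussians, which is literally a dot product of sampled values, converges to f(0).

    import numpy as np

    def f(x):
        # A made-up smooth test function.
        return np.cos(x) * np.exp(-x**2)

    x, dx = np.linspace(-10, 10, 200001, retstep=True)
    for sigma in [1.0, 0.1, 0.01]:
        # Unit-mass Gaussian of width sigma; as sigma -> 0 it approximates the delta.
        g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        # The pairing <g, f> is a (weighted) dot product of sampled values.
        print(sigma, dx * (g @ f(x)))
    print("f(0) =", f(0.0))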
They also come up a lot in differential geometry. The fundamental theorem of calculus/Stokes' theorem more or less says that differentiation is the adjoint/dual of the map that sends a space to its boundary. I don't know of more "elementary" examples off the top of my head. It's been about 10 years since I've thought about "real" engineering, but roughly speaking, dual vectors model measurements of linear systems, so one might be interested in studying the space of possible systems (which, as in the previous paragraph, might satisfy some linear differential equations). My understanding is that quantum physics uses a dual space as the state space and the second dual as the space of measurements, which again seems like a fairly technical point that you get into with infinite dimensions.
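In finite dimensions the adjoint/dual is just the transpose, and the "measurement" picture is easy to check numerically (A, x, y below are made up): measuring the output of a system A with a dual vector y is the same as measuring the input with the pulled-back dual vector A^T y.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 3))  # a made-up linear system R^3 -> R^4
    x = rng.standard_normal(3)       # an input/state
    y = rng.standard_normal(4)       # a measurement (dual vector) on the output

    # <y, A x> = <A^T y, x>: the adjoint moves the measurement across the map.
    assert np.isclose(y @ (A @ x), (A.T @ y) @ x)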
Note that there's another factoring theorem called the first isomorphism theorem that applies to a variety of structures (e.g. sets, vector spaces, groups, rings, modules) that says that structure-preserving functions can be factored into a quotient (a sort of projection) followed by an isomorphism followed by an injection. The quotient and injection are boring; they just collapse your kernel to zero without changing anything else, and embed your image into a larger space. So the interesting things to study to "understand" linear maps are isomorphisms, i.e. invertible (square) matrices. Another way to say this is that every rectangular matrix has a square matrix at its heart that's the real meat.
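As a rough numpy illustration (the rank-deficient matrix below is made up), the thin SVD exhibits exactly that factoring: a map that collapses the kernel, a square invertible core, and an embedding of the image.

    import numpy as np

    B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
    C = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]])
    A = B @ C                      # a made-up 4x3 matrix of rank 2
    r = np.linalg.matrix_rank(A)   # 2

    # Thin SVD: A = U_r @ diag(s_r) @ Vt_r, where
    #   Vt_r : R^3 -> R^2  collapses the kernel (the quotient/projection),
    #   diag(s_r)          is the square invertible "core" (the isomorphism),
    #   U_r  : R^2 -> R^4  embeds the image into the larger target space.
    U, s, Vt = np.linalg.svd(A)
    U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]
    assert np.allclose(A, U_r @ np.diag(s_r) @ Vt_r)
    assert np.all(s_r > 1e-10)     # the core really is invertible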