How are PCA and SVD related? (intoli.com)
139 points by celerity on Aug 23, 2017 | 15 comments



For those looking for a more succinct answer: https://stats.stackexchange.com/questions/134282/relationshi...

And here is another interesting connection between PCA and ridge regression: https://stats.stackexchange.com/questions/81395/relationship...


I don't understand why people create these webpages just re-explaining stuff that can be read in a book, lecture notes (usually available freely online), or Wikipedia. It just adds more noise to the internet. Is it a kind of marketing thing to show their customers that they know what they are doing?


There's value in explaining things in a different/more understandable way. Wikipedia articles and book chapters on statistics can be hard to understand.


Yes it's a form of marketing called inbound marketing. Create content that attracts people (blog posts) and then turn them into leads by getting them to put their email in for more info, etc.


6 word answer

PCA is the SVD of A'A


Actually it's the eigendecomposition of A'A and the SVD of A, is it not?


The SVD of A'A is its eigendecomposition (since it is symmetric positive semi-definite, the two factorizations are the same).

It is closely related to the SVD of A: (USV')'USV' = VSU'USV' = VS^2V'.
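If it helps, here's a small numpy sketch of that identity (variable names are my own, not from the article): the eigenvalues of A'A are the squared singular values of A, and its eigenvectors match the columns of V up to sign.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V'
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)          # ascending order

    # Eigenvalues of A'A are the squared singular values of A.
    print(np.allclose(eigvals[::-1], s**2))             # True

    # Eigenvectors of A'A match the columns of V, up to sign.
    V = Vt.T
    print(np.allclose(np.abs(np.diag(eigvecs[:, ::-1].T @ V)), 1.0))  # True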


PCA is a statistical model -- the simplest factor model there is. It deals with variances and covariances in datasets. It returns a transformed dataset that's linearly related to the original one, but whose first variable has the highest variance, the second the next highest, and so on.

SVD is a matrix decomposition. It generalizes the idea of representing a linear transformation (with the same dimensions in domain and codomain) in the basis of its eigenvectors, which gives a diagonal matrix representation and a formula like A = VDV'.

SVD is like this, but for rectangular matrices, so you need two orthogonal matrices instead of one: A = UDV'.

That SVD even performs PCA, as noted in the algorithms, is a theorem, albeit a simple one usually given as an exercise. But hey, even OLS regression can be programmed with SVD if you want to.
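For the curious, a rough numpy sketch of both points (my own variable names, assuming observations in rows): PCA falls out of the SVD of the centered data matrix, and the SVD-based pseudoinverse gives you OLS.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 3))

    Xc = X - X.mean(axis=0)                       # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    scores = Xc @ Vt.T                            # the transformed dataset (PCs)
    component_variances = s**2 / (len(X) - 1)     # highest variance first

    # Same variances from the covariance matrix's eigendecomposition.
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
    print(np.allclose(eigvals[::-1], component_variances))   # True

    # OLS via the SVD (np.linalg.pinv is computed from the SVD).
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(len(X))
    beta = np.linalg.pinv(X) @ y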


I've always understood PCA as SVD on a whitened matrix. Is this too simplistic of a view to take wrt implementation?

https://en.m.wikipedia.org/wiki/Whitening_transformation


I actually touch on the relation to whitening toward the bottom of the article. You can whiten your dataset using the left singular matrix U, which is directly related to the PCs. Thanks for reading!
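In case a concrete version helps, here's a minimal sketch of that idea (my own names, assuming the data matrix has observations in rows and has been centered): rescaling U gives unit-variance, uncorrelated variables.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
    Xc = X - X.mean(axis=0)

    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # PCA whitening: the columns of U, scaled to unit variance.
    X_white = U * np.sqrt(len(X) - 1)
    print(np.allclose(np.cov(X_white, rowvar=False), np.eye(3)))   # True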


The connection between these two has always been hazy to me. I often mixed up the two when talking about each of them independently.

This article was well-written, exactly precise enough, and cleared up the confusion. Thanks for sharing!


SVD is the decomposition of a matrix into its components.

PCA is the analysis of a set of eigenvectors. Eigenvectors can come from SVD components or a covariance matrix.

source: http://www.eggie5.com/107-svd-and-pca


Great article. The lecture comparing the two from Johns Hopkins, part of the Data Science Specialization on Coursera, also offers a great explanation.


"Because vectors are typically written horizontally, we transpose the vectors to write them vertically". Is there a typo in this sentence or is to just too early in the morning for me to read this?


No typo there. When we talk about vectors we mean "column vectors". As it's easier to read horizontally (and takes less space in a paper), most of the time we write x^T = {a, b, c} rather than writing them in a column shape.
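A tiny numpy illustration of the convention (shapes only, nothing specific to the article):

    import numpy as np

    x = np.array([[1.0], [2.0], [3.0]])   # a column vector, shape (3, 1)
    print(x.T)                             # written as a row: [[1. 2. 3.]]
    print(x.T.shape)                       # (1, 3)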



