How are PCA and SVD related? (intoli.com)
139 points by celerity on Aug 23, 2017 | 15 comments



For those looking for a more succinct answer: https://stats.stackexchange.com/questions/134282/relationshi...

And here is another interesting connection between PCA and ridge regression: https://stats.stackexchange.com/questions/81395/relationship...


I don't understand why people create these webpages just re-explaining stuff that can be read in a book, lecture notes (usually available freely online), or Wikipedia. It just adds more noise to the internet. Is it a kind of marketing thing to show their customers that they know what they are doing?


There's value in explaining things in a different/more understandable way. Wikipedia articles and book chapters on statistics can be hard to understand.


Yes it's a form of marketing called inbound marketing. Create content that attracts people (blog posts) and then turn them into leads by getting them to put their email in for more info, etc.


6 word answer

PCA is the SVD of A'A


Actually it's the eigendecomposition of A'A and the SVD of A, is it not?


The SVD of A'A is its eigendecomposition (since it is symmetric positive semi-definite, the two factorizations are the same).

It is closely related to the SVD of A: (USV')'USV' = VSU'USV' = VS^2V'.
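If it helps, here's a small numpy sketch of that identity (variable names are my own, not from the article): the eigenvalues of A'A are the squared singular values of A, and its eigenvectors match the columns of V up to sign.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V'
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)          # ascending order

    # Eigenvalues of A'A are the squared singular values of A.
    print(np.allclose(eigvals[::-1], s**2))             # True

    # Eigenvectors of A'A match the columns of V, up to sign.
    V = Vt.T
    print(np.allclose(np.abs(np.diag(eigvecs[:, ::-1].T @ V)), 1.0))  # True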


PCA is a statistical model -- the simplest factor model there is. It deals with variances and covariances in datasets. It returns a transformed dataset that's linearly related to the original one, but whose first variable has the highest variance, the second the next highest, and so on.

SVD is a matrix decomposition. It generalizes the idea of representing a linear transformation (with the same dimensions in domain and codomain) in the basis of its eigenvectors, which gives a diagonal matrix representation and a formula like A = VDV'.

SVD is like this, but for rectangular matrices, so you need two orthogonal matrices instead of one: A = UDV'.

That SVD even performs PCA, as noted in the algorithms, is a theorem, albeit a simple one usually given as an exercise. But hey, even OLS regression can be programmed with SVD if you want to.
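For the curious, a rough numpy sketch of both points (my own variable names, assuming observations in rows): PCA falls out of the SVD of the centered data matrix, and the SVD-based pseudoinverse gives you OLS.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 3))

    Xc = X - X.mean(axis=0)                       # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    scores = Xc @ Vt.T                            # the transformed dataset (PCs)
    component_variances = s**2 / (len(X) - 1)     # highest variance first

    # Same variances from the covariance matrix's eigendecomposition.
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
    print(np.allclose(eigvals[::-1], component_variances))   # True

    # OLS via the SVD (np.linalg.pinv is computed from the SVD).
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(len(X))
    beta = np.linalg.pinv(X) @ y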


I've always understood PCA as SVD on a whitened matrix. Is this too simplistic of a view to take wrt implementation?

https://en.m.wikipedia.org/wiki/Whitening_transformation


I actually touch on the relation to whitening toward the bottom of the article. You can whiten your dataset using the left singular matrix U, which is directly related to the PCs. Thanks for reading!
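In case a concrete version helps, here's a minimal sketch of that idea (my own names, assuming the data matrix has observations in rows and has been centered): rescaling U gives unit-variance, uncorrelated variables.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
    Xc = X - X.mean(axis=0)

    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # PCA whitening: the columns of U, scaled to unit variance.
    X_white = U * np.sqrt(len(X) - 1)
    print(np.allclose(np.cov(X_white, rowvar=False), np.eye(3)))   # True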


The connection between these two has always been hazy to me. I often mixed up the two when talking about each of them independently.

This article was well-written, exactly precise enough, and cleared up the confusion. Thanks for sharing!


SVD is the decomposition of a matrix into its components.

PCA is the analysis of a set of eigenvectors. Eigenvectors can come from SVD components or a covariance matrix.

source: http://www.eggie5.com/107-svd-and-pca


Great article. The lecture comparing the two from Johns Hopkins, part of the Data Science Specialization on Coursera, also offers a great explanation.


"Because vectors are typically written horizontally, we transpose the vectors to write them vertically". Is there a typo in this sentence or is to just too early in the morning for me to read this?


No typo there. When we talk about vectors we mean "column vectors". As it's easier to read horizontally (and takes less space in a paper), most of the time we write x^T = {a, b, c} rather than writing them in a column shape.
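A tiny numpy illustration of the convention (shapes only, nothing specific to the article):

    import numpy as np

    x = np.array([[1.0], [2.0], [3.0]])   # a column vector, shape (3, 1)
    print(x.T)                             # written as a row: [[1. 2. 3.]]
    print(x.T.shape)                       # (1, 3)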



