
When I was working on a recommender for television shows, I ran SVD on a large User/Item matrix to create a low rank approximation, essentially reducing thousands of user features (TV show preferences) to user vectors representing twenty or thirty abstract "features". Then I looked at the actual item preferences of users who expressed each feature at the greatest and least magnitude. The features, in some cases, mapped to recognizable constructs. There were distinct masculine and feminine features, several obvious Hispanic / Latino elements, and strong liberal versus conservative indicators. Others were less explainable using common labels.
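
In case it helps make the setup concrete, here's a toy sketch of that kind of analysis (random placeholder data, an illustrative rank of 30, and an arbitrary feature index; not the actual system or data):

    import numpy as np
    from scipy.sparse.linalg import svds

    rng = np.random.default_rng(0)
    n_users, n_shows, k = 5000, 1000, 30

    # Stand-in for the real user/show preference matrix (e.g. viewing counts).
    X = rng.poisson(0.2, size=(n_users, n_shows)).astype(float)

    # svds returns the k largest singular triplets; columns of U are per-user
    # loadings on each latent feature, rows of Vt are per-show loadings.
    U, s, Vt = svds(X, k=k)
    order = np.argsort(s)[::-1]          # svds returns singular values ascending
    U, s, Vt = U[:, order], s[order], Vt[order, :]

    feature = 12                         # pick one latent feature to inspect
    top_users = np.argsort(U[:, feature])[-20:]        # strongest positive expression
    bottom_users = np.argsort(U[:, feature])[:20]       # strongest negative expression
    top_shows = np.argsort(np.abs(Vt[feature]))[-10:]   # shows that define the feature

Looking at the actual viewing histories of the top_users and bottom_users groups for each feature is what surfaced the recognizable constructs.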

It struck me at the time that the qualities that were expressed most strongly were the ones that ended up having names in our language. But there were others for which I would say to myself, there is something about this group (e.g. those with the greatest expressed value of F124) that I recognize, but can't quite put my finger on.

Of course, I was looking at people through a keyhole, their TV viewing preferences being the only information I had.

Also, I noticed that these "came into focus" most clearly at a certain level of compression (rank).

FWIW




I started reading the essay not knowing what to think, and it turned out to be more relevant to my work than I thought.

The issues discussed in the essay have been central to some areas of psychology and the behavioral sciences for a long time: how to interpret components such as these.

One thought about your "coming into focus at a certain level of compression" comment: I've done some analyses of these vectors as applied to text samples, and one thing that struck me was how poorly some of them replicated across datasets that are ostensibly similar (but not identical). Others, in contrast, reappeared across multiple corpora. To the extent that some of these components represent "real" features, they should reappear consistently across different datasets where you'd expect them to. That is, they should be robust to the idiosyncratic features of any one dataset.
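
For what it's worth, here's a rough sketch of how one might check that, assuming you have the k x n_features loading matrices (Vt) from two decompositions over a shared feature space; the function name and setup are just illustrative:

    import numpy as np

    def match_components(Vt_a, Vt_b):
        # Normalize each component, then match by absolute cosine similarity,
        # since the sign of a singular vector is arbitrary.
        A = Vt_a / np.linalg.norm(Vt_a, axis=1, keepdims=True)
        B = Vt_b / np.linalg.norm(Vt_b, axis=1, keepdims=True)
        sim = np.abs(A @ B.T)          # pairwise |cosine| between components
        best = sim.argmax(axis=1)      # best match in B for each component of A
        return best, sim[np.arange(len(best)), best]

    # Components whose best-match similarity is near 1.0 look like they
    # replicate; components that match nothing well may be idiosyncrasies
    # of one dataset.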


Did you ever compare that focus against a plot of the singular values?


It's a good question. FWIW, I would expect a reasonably sharp "L"-shaped curve in the focus. The assumption there, I guess, is that this metric of 'focus' is something well characterized by the low-frequency-type basis vectors given by the leading columns of the SVD's U and V.
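
Something like the following is what I have in mind, with a random stand-in for the user/item matrix and an illustrative rank marker; the elbow of the curve is roughly where I'd expect the 'focus' to peak:

    import numpy as np
    import matplotlib.pyplot as plt

    # Random stand-in for the user/item matrix discussed upthread.
    X = np.random.default_rng(0).poisson(0.2, size=(2000, 500)).astype(float)

    s = np.linalg.svd(X, compute_uv=False)   # full spectrum, descending order

    plt.semilogy(np.arange(1, len(s) + 1), s, marker=".")
    plt.axvline(30, linestyle="--", label="illustrative rank cutoff")
    plt.xlabel("component index")
    plt.ylabel("singular value (log scale)")
    plt.legend()
    plt.show()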


Exactly what I saw. Your expectation is correct.


Computers have us figured out in a way that we ourselves don't.


A question: is this much better / different than a principal component analysis (or a factor analysis)?


It's a bit apples-and-oranges to compare SVD to PCA. SVD is a numerical technique, whereas PCA is a method for analyzing a dataset. You can use SVD to perform PCA (although there are ways to perform PCA without explicitly doing an SVD). I'm guessing that the GP performed PCA using SVD. There's a good Stack Exchange answer to exactly this question here:

http://stats.stackexchange.com/questions/121162/is-there-any...


One way to do PCA is to use SVD to find the eigenvectors of the covariance matrix and project your data onto them, so the two are closely related.
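
A minimal sketch of that equivalence, with random placeholder data: center the columns, take the SVD, and the right singular vectors are the principal axes.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))            # placeholder data matrix

    Xc = X - X.mean(axis=0)                   # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    explained_variance = s**2 / (len(X) - 1)  # eigenvalues of the covariance matrix
    scores = Xc @ Vt.T                        # principal component scores (= U * s)

    # Cross-check against an explicit eigendecomposition of the covariance matrix.
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
    assert np.allclose(explained_variance, eigvals)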



