Really cool blog. I'll add it to the other good ones I typically read: flowingdata.com and datawrangling.com.
I wonder why he chose to use PMF. There are seemingly less complicated algorithms (PCA/SVD/LSA) that appear to do the same type of thing.
It also looks like PMF uses expectation maximization, which is quite common but not necessarily scalable (although I could be wrong). It would probably take quite a while to run on the whole dataset, and unfortunately he doesn't give any specs!
SVD doesn't work when you have a sparse matrix as input. You could use some sort of imputation method (like EM) to fill in the missing values and then run SVD, but this would be extremely inefficient, since only a tiny fraction of the user x movie ratings matrix is observed. It also probably wouldn't work very well: the true signal in the data would be overwhelmed by all of the missing values you'd somewhat arbitrarily filled in.
PMF is a generalization of SVD that works directly on the sparse structure of the input. It uses gradient descent, modifying the low-dimensional latent vectors directly to minimize error on the observed ratings, and never imputes missing values. This makes it very efficient, and it can easily handle millions of ratings. See http://www.cs.toronto.edu/~amnih/papers/pmf.pdf for an application of PMF to the very large Netflix problem.
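To make the gradient-descent point concrete, here's a minimal NumPy sketch of the idea: stochastic updates over only the observed (user, movie, rating) triples, with no imputation. The tiny ratings list, latent dimension, learning rate, and regularization constant are all made-up illustrative values, not anything from the paper or the blog post.

```python
import numpy as np

# Observed ratings only, as (user, movie, rating) triples -- the sparse input.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 0, 4.0), (2, 1, 2.0), (2, 2, 4.0)]
n_users, n_movies, k = 3, 3, 2       # k = latent dimension (assumed small)
lr, reg, epochs = 0.01, 0.05, 2000   # illustrative learning rate / L2 penalty

rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, k))   # user latent vectors
V = 0.1 * rng.standard_normal((n_movies, k))  # movie latent vectors

for _ in range(epochs):
    for u, m, r in ratings:
        err = r - U[u] @ V[m]  # error on this observed rating only
        # Gradient steps on the two latent vectors; missing entries never touched.
        U[u] += lr * (err * V[m] - reg * U[u])
        V[m] += lr * (err * U[u] - reg * V[m])

# Training error over the observed entries.
rmse = np.sqrt(np.mean([(r - U[u] @ V[m]) ** 2 for u, m, r in ratings]))
```

Each pass costs time proportional to the number of observed ratings, not to the full user x movie matrix, which is why this scales to millions of ratings.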
Another reason he uses PMF is probably that Danny Tarlow (the post author) is a student at UToronto. Russ Salakhutdinov and Andriy Mnih, the authors of the PMF method, are also students at UToronto.