The $25B Eigenvector: The Linear Algebra behind Google (2006) (siam.org)
77 points by tosh 9 months ago | 18 comments



I just learned about this in the Mining Massive Datasets course at Stanford.

The gist is that you can formulate the PageRank problem as a flow equation r = Mr, where r is the vector of PageRanks and M is a column-stochastic adjacency matrix: its entry M_ij is the probability of moving from page j to page i.

This matrix is enormous, so solving the linear system directly is infeasible.

But since M is a stochastic matrix, its largest eigenvalue is exactly 1, and (for an irreducible, aperiodic chain) all other eigenvalues have absolute value strictly less than 1. The equation r = Mr says that r is an eigenvector of M with eigenvalue 1: multiplying r by M leaves it unchanged.

Repeatedly multiplying by M therefore shrinks the component of r along every other eigenvector while preserving the component along the eigenvalue-1 eigenvector. So you start with some initialization of r and repeatedly multiply by M until it converges; the result solves r = Mr because all the other components have decayed toward zero.

This is a much more efficient way to compute r, and hence makes PageRank computationally feasible.
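As a minimal sketch of the iteration described above, here is power iteration on a tiny hypothetical 3-page web (the matrix and page count are made up for illustration; real PageRank runs on a sparse matrix with billions of entries):

```python
import numpy as np

# Hypothetical 3-page web. M[i, j] is the probability of moving from
# page j to page i, so each column sums to 1 (column-stochastic).
M = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])

r = np.full(3, 1.0 / 3.0)  # start from the uniform distribution
for _ in range(100):
    r_next = M @ r
    if np.allclose(r_next, r, atol=1e-10):  # stop once the vector is a fixed point
        break
    r = r_next

print(r)      # approximate PageRank vector
print(M @ r)  # ~= r, so r = Mr holds at convergence
```

Each multiplication is just one sparse matrix-vector product, which is what makes the method scale.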


PageRank's linear algebraic formulation also has a natural interpretation in the GraphBLAS sparse graph linear algebra library. Here's an example from a paper that benchmarks C, Julia, and Python with pygraphblas (this example hasn't yet been ported to the newer python-graphblas library):

https://www.researchgate.net/publication/356707900_The_Graph...

(disclaimer: I'm one of the paper authors).

The GraphBLAS handles sparse matrices very efficiently, so extremely large graphs can be used: we benchmarked against graphs with 65M nodes and billions of triangles in just over a minute. Both the Python and Julia bindings came reasonably close to the "native" C implementation in LAGraph.


One important component is adding significant damping, corresponding to a small probability of jumping to a random page. This makes sure that the graph is well connected and the power iteration converges fast.
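The damping idea can be sketched like this (a made-up 3-page example; the matrix and the choice of beta = 0.85 are illustrative assumptions, though 0.85 is the value commonly cited for PageRank):

```python
import numpy as np

beta = 0.85  # with probability beta follow a link, else jump to a random page
n = 3
# Hypothetical column-stochastic link matrix for a 3-page web.
M = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])

# The damped ("Google") matrix mixes M with uniform teleportation;
# it is still column-stochastic, but now every page reaches every other.
G = beta * M + (1 - beta) / n * np.ones((n, n))

r = np.full(n, 1.0 / n)
for _ in range(100):
    r = G @ r

print(r)  # fixed point of G, i.e. the damped PageRank vector
```

Because teleportation makes the chain irreducible and aperiodic, the second eigenvalue is bounded away from 1 and the iteration converges geometrically.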


Thank you -- I had always wondered what the optimization was behind that Algo!


Isn’t this known as the power method?


The calculation is the power method, but the main insight is modelling browsing behaviour as a Markov chain: a surfer wanders through the web by clicking links at random. The equilibrium probabilities of being at each page form the vector r. I'm a bit surprised how few sources offer this explanation upfront; I guess they want to maintain the mystique and make the technique appear more profound than it is.
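The random-surfer picture can be checked directly by simulation (a hedged sketch on a made-up 3-page web; `out_links` is a hypothetical link structure). The long-run visit frequencies approximate the equilibrium vector r:

```python
import random

# Hypothetical web: out_links[p] lists the pages that page p links to.
out_links = {0: [1, 2], 1: [0, 2], 2: [0]}

random.seed(0)
page = 0
visits = [0, 0, 0]
steps = 200_000
for _ in range(steps):
    page = random.choice(out_links[page])  # click a random outgoing link
    visits[page] += 1

freq = [v / steps for v in visits]
print(freq)  # empirical equilibrium probabilities, one per page
```

For this graph the stationary distribution works out to (4/9, 2/9, 1/3), and the simulated frequencies land close to it.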


Yup, I am very familiar with PageRank and the power method. Here’s my explanation from a decade ago:

https://web.archive.org/web/20130728183938/williamcotton.com...



Wish I had any clue what this meant, lmao. Where would someone who knows nothing about math start?


I'm taking the linear algebra course on Codecademy.

https://www.codecademy.com/courses/learn-linear-algebra/

There are a bunch of options for linear algebra courses out there. It depends on where you are coming from. A lot of this is still Greek to me as well, but the more I study the more a few terms start to click. Good luck.


I found 3b1b's series on the "Essence of Linear Algebra" [0] super valuable

[0] https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQ...


Usually, this is taught in first-year linear algebra courses for any kind of engineering (or natural sciences) degree.


PDF available from the Institute's website here: https://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.p...

Interesting that Google has added two orders of magnitude to their market cap since going public in 2004.


One of the authors, Tanya Leise of UMass Amherst, passed away a few weeks ago. https://amherststudent.com/article/students-mourn-the-loss-o...



Relatedly: The Second Eigenvector of the Google Matrix

https://nlp.stanford.edu/pubs/secondeigenvalue.pdf


Early on at Google, the matrix wasn't very big and people would just manually tweak values in it.


Related. Others?

The $25B eigenvector (2006) [pdf] - https://news.ycombinator.com/item?id=16091646 - Jan 2018 (68 comments)



