I just learned about this in the Mining Massive Datasets course at Stanford.
The gist is that you can formulate the PageRank problem as a flow equation r = Mr, where r is a vector of PageRanks and M is a column-stochastic adjacency matrix: the entry in row j, column i is the probability of moving from page i to page j (1 over the out-degree of i if i links to j, 0 otherwise).
This matrix is huge (one row and one column per page on the web), so solving the system directly is infeasible.
But since M is a stochastic matrix, its largest eigenvalue is exactly 1, and every other eigenvalue has absolute value strictly less than 1 (strictly speaking this needs the chain to be irreducible and aperiodic, which the random-jump damping used in practice guarantees). The equation r = Mr can then be read as saying r is an eigenvector of M with eigenvalue 1, since multiplying r by M is the same as multiplying it by 1.
Repeatedly multiplying by M therefore shrinks the component of r along every eigenvector except the one with eigenvalue 1. So you start with some initialization of r (say, the uniform vector) and repeatedly multiply by M until it converges. The resulting value of r solves r = Mr, since the components along all the other eigenvectors have decayed towards zero.
This is much more efficient than solving the system directly, and it's what makes PageRank computationally feasible: each iteration is just one sparse matrix-vector multiply.
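Here's a minimal sketch of that power iteration in NumPy (a toy 3-page graph; the names and numbers are mine, just for illustration):

    import numpy as np

    # Column-stochastic matrix for a toy 3-page web:
    # page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
    M = np.array([
        [0.0, 0.0, 1.0],
        [0.5, 0.0, 0.0],
        [0.5, 1.0, 0.0],
    ])

    n = M.shape[0]
    r = np.full(n, 1.0 / n)  # start from the uniform distribution

    # Power iteration: keep multiplying by M until r stops changing.
    for _ in range(100):
        r_next = M @ r
        if np.abs(r_next - r).sum() < 1e-10:
            r = r_next
            break
        r = r_next

    print(r)  # the fixed point satisfies r = M r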
PageRank's linear algebraic formulation also has a natural interpretation in the GraphBLAS sparse graph linear algebra library. There's an example in a paper that benchmarks C, Julia, and Python with pygraphblas (that example hasn't yet been ported to the newer python-graphblas library).

The GraphBLAS handles sparse matrices very efficiently, so extremely large graphs can be used: we were able to benchmark graphs with 65M nodes and billions of triangles in just over a minute. Both the Python and Julia bindings came reasonably close to the "native" C implementation in LAGraph.
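The pygraphblas code isn't reproduced here, but the underlying idea, sparse matrix-vector products driving the iteration, can be sketched with SciPy's sparse matrices instead (my own toy illustration, not the paper's benchmark code):

    import numpy as np
    from scipy.sparse import csr_matrix

    # Same toy 3-page graph as above, stored sparsely: only the
    # nonzero transition probabilities are kept in memory.
    rows = [1, 2, 2, 0]          # destination pages
    cols = [0, 0, 1, 2]          # source pages
    vals = [0.5, 0.5, 1.0, 1.0]  # 1 / out-degree of the source page
    M = csr_matrix((vals, (rows, cols)), shape=(3, 3))

    r = np.full(3, 1.0 / 3)
    for _ in range(100):
        r = M @ r  # sparse matvec: cost scales with the number of links

    print(r)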
One important component is adding damping: with some small probability the surfer jumps to a uniformly random page instead of following a link. This makes sure the chain is effectively well connected (no dead ends or traps) and that the power iteration converges quickly.
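In code that's a one-line change to the power iteration above (beta = 0.85 is the commonly cited damping factor, not a number from this thread):

    import numpy as np

    M = np.array([
        [0.0, 0.0, 1.0],
        [0.5, 0.0, 0.0],
        [0.5, 1.0, 0.0],
    ])  # same toy graph as above

    beta = 0.85  # probability of following a link; 1 - beta = random jump
    n = M.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(100):
        # With probability beta follow a link, else jump to a random page.
        r = beta * (M @ r) + (1 - beta) / n
    print(r)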
The calculation is the power method, but the main gist is modelling internet browsing behaviour as a Markov chain: a surfer wanders the web by clicking links at random. The equilibrium probability of being at each page is the corresponding entry of the vector r. I'm a bit surprised how few sources offer this explanation upfront; I guess they want to maintain the mystique and make the technique appear more profound than it is.
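To make the Markov-chain reading concrete, you can simulate the surfer directly and check that the visit frequencies match the power-method result (toy graph and variable names are mine):

    import random
    import numpy as np

    # Toy web as an adjacency list of outgoing links, same graph as above.
    links = {0: [1, 2], 1: [2], 2: [0]}
    beta = 0.85
    n = len(links)

    counts = np.zeros(n)
    page = 0
    for _ in range(1_000_000):
        if random.random() < beta:
            page = random.choice(links[page])  # follow a random link
        else:
            page = random.randrange(n)         # jump to a random page
        counts[page] += 1

    print(counts / counts.sum())  # empirical visit frequencies ~ r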
There are a bunch of options for linear algebra courses out there. It depends on where you are coming from. A lot of this is still Greek to me as well, but the more I study the more a few terms start to click. Good luck.