One nice thing is that this algorithm is very amenable to being augmented with additional data. For example, if your initial adjacency matrix gave equal weight to each outgoing link, nothing is stopping you from measuring the actual "transition probabilities" by pervasive tracking, custom DNS servers such as 8.8.8.8, and so on. Moreover, it is also easy to generate a personalised model for an individual user by recording their activity over time and using that to predict transition probabilities for websites they might never have visited. In that way you can generate a filter bubble.
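To make that concrete, here is a minimal sketch of the idea in plain Python. It is my own illustration, not code from the article or anything Google is known to run. The function name `pagerank`, the `personalization` parameter, the damping value of 0.85 and the click counts are all assumptions chosen for the example. The point is only that swapping equal link weights for measured ones, or swapping the uniform teleport vector for a per-user one, changes nothing else in the algorithm.

```python
def pagerank(weights, damping=0.85, personalization=None, iters=100):
    """Power iteration on a row-normalised link-weight matrix.

    weights[i][j] is a non-negative weight for the link i -> j:
    1 for a plain link, or (hypothetically) an observed click count.
    """
    n = len(weights)

    # Teleport distribution: uniform by default, or a per-user preference vector.
    if personalization is None:
        teleport = [1.0 / n] * n
    else:
        total = sum(personalization)
        teleport = [p / total for p in personalization]

    # Turn each row into transition probabilities. A page with no outgoing
    # weight (a dangling page) jumps according to the teleport distribution.
    P = []
    for row in weights:
        s = sum(row)
        P.append([w / s for w in row] if s > 0 else list(teleport))

    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [
            (1 - damping) * teleport[j]
            + damping * sum(rank[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
    return rank


links = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]   # equal weight per outgoing link
clicks = [[0, 9, 1], [1, 0, 0], [1, 1, 0]]  # made-up measured click counts

uniform = pagerank(links)
measured = pagerank(clicks)
personal = pagerank(links, personalization=[0, 0, 1])  # a user who favours page 2
```

In this toy graph, `measured` shifts rank towards page 1 because most clicks from page 0 go there, and `personal` shifts rank towards page 2 because every random jump lands on it. That second variant is the "filter bubble" knob.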
You make it sound so ominous, and the result would in fact be a large positive, rather than a negative.
I mean sure, this would require some metrics as to which links are actually followed versus which ones aren't, but there's no need for "pervasive tracking". Google could just get these from its own employees, or indeed try to extract some of these metrics from 8.8.8.8 (i.e. from volunteers).
Given how big Google is, I'm sure the answer is "all of the above", but the end result is: Google places the page where you eventually end up finding what you want high on the search results page. And it does this by getting tiny amounts of help for you from millions of others, anonymously.
So we have no reason to assume any of the methods are nefarious, and the end result is definitely useful. More of this, please!
> Google places the page where you eventually end up finding what you want high on the search results page.
I'm going to have to go ahead and disagree with you here. The smarter Google tries to make their algorithm, the more infuriatingly terrible it is for me to use.
All I want is a feature to completely forget my YouTube and search history when I am not logged in. I never want to see personal recommendations, because for me personally they continue to get worse and worse.
Maybe the issue is that SEO keeps getting better. I bet if you were to get what you really wanted (a completely unpersonalized experience), the results would be horrible, because traditional approaches to ranking are totally useless these days.
I have a good experience on new devices. I used to have a pretty good experience deleting cookies every time I close a browser tab, but now even that's degrading.
I should probably just buy a new device every few months like Steve Jobs did with cars.
I'm with you on that. These algorithms make discovery of new, different content way harder than in the past. YouTube almost exclusively suggests stuff I've already watched or very similar content. Yet I found most of it via sources that don't do that.
I get why that's the default for the large majority of people, but it's almost never what I personally want.
I'm willing to accept that my preferences are different from those of the majority, but I'm not sure I want to live in a world filled with smart devices that are always getting in my way.
PageRank is an incredibly elegant algorithm.
But in practice I suppose that the actual Google search algorithm bears little resemblance to the mathematically elegant PageRank algorithm, because it has to implement many custom tweaks, tuning, DMCA blacklists, right-to-be-forgotten lists, etc.
It's like "machine learning" a spam filter, isn't it? It sounds so simple and elegant, and then somehow you end up with 200 rules, most of which are entirely manual, and weekly tweaks to keep it working well.
Bibliographic citation algorithms in general are pretty interesting. Jon Kleinberg's work from this era and earlier is especially wonderful to read if you're interested.
Yeah, that's the paper cited in the original PageRank publication.
But a lot of his other stuff is pretty interesting. He has done a lot of work that you can broadly classify as applications of graph theory to the social world.
That includes the HITS paper, but also work on small world networks, social networks, and several other papers on links and information flow related to the web.
I find it weird that this was published in 2007 but the first sentence is "Some months ago newspapers all around the world considered Google’s plan to go public", and that happened in 2004.
It could have been an internal document that was published a few years after it was written. This could be due to indecision about whether or not to publish the work. The document could also have been originally intended for internal use, to educate new hires.