
PageRank uses the stationary distribution of a random walk; that’s very different from just counting incoming links (which AV did have).

In a way, it ranks a link by its own incoming links (which are ranked by their incoming links, etc.). It was possible to game AV by setting up 100 sites that point to your own.

In pagerank, you essentially had to convince already popular pages to link to yours.
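A minimal sketch of the difference, in Python: plain in-link counting versus PageRank computed by power iteration. The random surfer follows an outgoing link with probability 0.85 and otherwise jumps to a random page, and the rank is the stationary distribution of that walk. The damping value, the handling of pages with no outlinks, the function names, and the toy graph are all assumptions for illustration, not AV's or Google's actual code. In the toy graph, page y collects three links from throwaway pages, while page x gets a single link from a page that is itself well linked.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank (illustrative sketch).

    links maps each page to the list of pages it links to. Pages with no
    outlinks ("dangling") spread their rank uniformly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed to every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its rank evenly among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


def in_degree(links):
    """Count incoming links per page, the naive popularity signal."""
    counts = {}
    for src, targets in links.items():
        counts.setdefault(src, 0)
        for dst in targets:
            counts[dst] = counts.get(dst, 0) + 1
    return counts


# Hypothetical toy graph.
links = {
    # Four ordinary pages all link to a popular "hub" page...
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
    # ...and the hub links to exactly one page, x (which links back).
    "hub": ["x"], "x": ["hub"],
    # Three throwaway farm pages link to y; y links back into its farm.
    "f1": ["y"], "f2": ["y"], "f3": ["y"], "y": ["f1"],
}

counts = in_degree(links)
pr = pagerank(links)
print(counts["x"], counts["y"])              # 1 3
print(round(pr["x"], 3), round(pr["y"], 3))  # 0.256 0.192
```

By link count y wins 3 to 1; by PageRank x wins, because its one link comes from a page that carries a lot of rank. This only shows that a link is weighted by the rank of the page it comes from. The farm still lifts y well above the pages nobody links to, which is part of why spammers eventually learned to game PageRank too.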

For a while, Google effectively had no spam, and AV had lots. Eventually, spammers learned to game PageRank; the arms race is still on.




The person you're replying to is clearly aware of search ranking algorithms. You should try looking into the HITS algorithm they mentioned for some additional context.


To be fair, I'm an ops guy, not a search engineer. ;) It's valid to say AV might not have implemented the basic concepts as well and I don't want to devalue Google's innovation. It just annoys me when people assume PageRank was a unicorn and nobody else was doing anything similar.


As much as I value PageRank, it annoys me when people assume it was a novel idea.

It had been applied decades before to scientific paper references, as a way to improve on the "number of references" metric, which is more easily gamed. References are rarely circular: unlike web pages, only papers in preparation at the same time can cite each other and form a cycle. I was sitting in a class about stationary processes in 1996 when the lecturer mentioned this (already old and well known at the time) use case as motivation.

Whatever AV implemented at the time, it was not on par.


Well, HITS is applied after you’ve already selected a subset, at response time; so if you didn’t select a good subset (and AV often didn’t), then picking the most promising pages out of that subset is not as helpful.

PageRank is essentially a “universal authority score” (in the HITS terminology), and it worked well because at the time you didn’t have pages that were an authority on one subject and spam for another. You do now, which is why PageRank is now one signal out of 200, even though it was sufficient on its own 20 years ago.
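To make the HITS comparison concrete, here is a rough Python sketch of the hub/authority iteration run on a subset of pages that has already been selected. Selecting that subset (the query-time step) is assumed to have happened, and the function name, fixed iteration count, and toy link graph are made up for illustration. A page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to; PageRank instead computes one query-independent score over the whole graph.

```python
import math


def hits(links, iterations=50):
    """Illustrative HITS on an already-selected subset of pages.

    links maps each page in the subset to the pages it links to.
    Returns (hub, authority) dicts, each normalized to unit length.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link to you.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of pages you link to.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical subset returned for some query: three link pages (h*)
# pointing at two candidate answers (s*).
subset = {
    "h1": ["s1", "s2"],
    "h2": ["s1", "s2"],
    "h3": ["s1"],
    "s1": [],
    "s2": [],
}
hub, auth = hits(subset)
print(max(auth, key=auth.get))  # s1
```

If the subset is poor, the best authority inside it is still poor, which is the limitation described for AV above.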


I do seem to recall AV getting flooded with spam --- porn spam to be exact. I remember myself and all the nerd kids in my high school computer lab would joke that you could search for something completely innocuous like "yarn" and always get back at least a couple of porn links.

This of course was also at a time when "whitehouse.com" was a porn site.


It might have been funny if you were in school, but if you were at work that’s a different story.


PageRank is just one way to order pages. Another is relevance. How you combine the two ordering schemes is another big question.


Good point - pagerank is a query-independent signal which is harder to game, but the (very large) part of ranking that is query dependent was still very much gameable.
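One hypothetical way to combine the two orderings is a linear blend of a query-dependent relevance score with a query-independent authority score such as PageRank. Everything here is an assumption for illustration: the toy term-frequency relevance function, the page names, the authority values already scaled to the range 0 to 1, and the blend weight. Real engines combine many more signals, and this is not any engine's actual formula.

```python
def relevance(query, text):
    """Toy query-dependent score: share of the page's words that are query terms.

    Term-frequency scores like this are easy to inflate by keyword stuffing.
    """
    terms = set(query.lower().split())
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in terms for w in words) / len(words)


def rank_pages(query, docs, weight):
    """Order pages by a hypothetical blend: (1 - weight) * relevance + weight * authority.

    docs maps page name -> (text, authority), authority assumed scaled to [0, 1].
    """
    scored = {
        name: (1 - weight) * relevance(query, text) + weight * authority
        for name, (text, authority) in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical pages: (text, query-independent authority score).
docs = {
    "yarn-shop": ("buy yarn and knitting needles", 0.9),
    "spam": ("yarn yarn yarn cheap yarn knitting", 0.05),
    "blog": ("my cat plays with yarn", 0.5),
}
print(rank_pages("yarn knitting", docs, weight=0.0))  # ['spam', 'yarn-shop', 'blog']
print(rank_pages("yarn knitting", docs, weight=0.5))  # ['yarn-shop', 'spam', 'blog']
```

With relevance alone, the keyword-stuffed page comes first; mixing in authority pushes it down, but the relevance half of the score can still be stuffed.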



