> The HITS algorithm has one drawback that it is super easy to game it. PageRank is not entirely resistant, but its a little more robust.
> Create a harvester page that points to lots and lots of popular, high traffic pages on the internet. By virtue of doing this it can accumulate a lot of Hubs score which it can redirect as an Authority score to an intended page.
I'm not sure how HITS is any more "easy to game" than PageRank? As far as I understand it, the differences are almost entirely limited to performance characteristics, not semantics. The example you give doesn't seem to be specific to HITS (as opposed to PageRank) in any way.
(I'm also not sure how "game theory" is relevant here, unless by "game theory" you just mean "the idea that people will try to game it".)
One could pose this as an adversarial game. For the simplistic case consider two participants -- (i) the ranker that chooses a ranking scheme (we need to constrain the space of ranking schemes somehow for this to lead to any useful formulation), (ii) web page who tries to outrank other pages by strategically linking to other pages, and possibly buying links to itself from other pages. One can give (ii) a budget to add and delete links and pages that it can control. In this framework one then try to compute what's an equilibrium strategy. The multiplayer version is a lot more complicated.
If you check my original comment, I gave a simple scheme to attack HITS rank. The main drawback is that one can 'harvest' Authority score using 'out-links'. Outlinks are cheap and easy, compared to 'inlinks'. Sybil attack is a little harder for Pagerank.
> If you check my original comment I gave a simple scheme to attack HITS rank. Sybil attack is a little harder for Pagerank.
OK, but how is it harder for PageRank? I can't really see any differences in the semantics of the two algorithms, so I'm not sure what kind of added vulnerability one or the other could have.
> One could pose this as an adversarial game.
Yeah, I appreciate that, that's what I was referring to as "the idea that people will try to game it". It's not really the kind of 'game' that would be considered in game theory, though, because it doesn't have any interesting or emergent properties - the designer's response will just be "oh yeah we should stop people gaming our algorithm".
If you are familiar with the algorithms, which I assume you are, you can work it out.
To make my page score high on the PageRank score I need to acquire links from high PageRank score pages. This is a lot harder because it depends on a) in-links and b) high PageRank pages. With Hits, its easy for one page to harvest a high Hub score. All that is needed is to outlink to known good pages (authority). Providing outlinks is trivial. Once so harvested, one can direct that flow to a designated page to give it a high Authority score.
> It's not really the kind of 'game' that would be considered in game theory
Why not ? Formalize the strategy spaces of both the players and its a very valid game in the Game Theory sense. For the ranker you have to consider some functional space of functions over a graph. For the page player it has a budget of alterations it can make to the graph.
> With Hits, its easy for one page to harvest a high Hub score. All that is needed is to outlink to known good pages (authority). Providing outlinks is trivial. Once so harvested, one can direct that flow to a designated page to give it a high Authority score.
Are you saying that you think HITS doesn't recursively score the quality of references by their own scores? That's not true. It does exactly what PageRank does in that respect: a page's score depends on the score of those which reference it, which in turn depends on... etc.
The 'hub' vs 'authority' distinction is interesting but not really relevant here: we're considering a page's 'authority' score, which depends on the 'hub' score of those who outlink to it, and at that point we're just doing PageRank [again, except performance-wise and arguably freshness-wise].
Like I said: the only non-trivial differences between them are implementation / performance-related, not semantic.
> Why not ? Formalize the strategy spaces of both the players and its a very valid game in the Game Theory sense. For the ranker you have to consider some functional space of functions over a graph.
Yes, again: possible to frame it as a formally valid problem if you really want to; still not an interesting one. We're only talking about this because you want to maintain that your earlier statement was true.
"You have to consider some functional space of functions over a graph" gives no detail (besides that, yes, you can model something–maybe documents, maybe people, who knows?–as a graph) and sounds like something written by a person with a gun to their head.
Or maybe I'm wrong and there's a fascinating problem which you just don't want to divulge to me.
> Are you saying that you think HITS doesn't recursively score ...
I doubt that reading comprehension is that hard a skill to master. I dont see where I have said anything about recursion or their lack of. If you want to have an imaginary conversation between yourself and what you think I have said, you can continue. I do not need to participate in that. I am sure you alone will suffice.
I think going back to the Pagerank and HITS papers carefully and understanding them will be illuminating. You keep saying they have no semantic difference, which cant be further from the truth. The scores are the eigenvectors of very different matrices, and HITS scores are straight forward to manipulate. BTW the papers cite each other stating in what way the other is different, if their difference was mere implementation detail and no semantic difference, I doubt they would stand as published papers.
The Hub score is not something interesting that I happened to say but a core part of the HITS paper. It is by virtue of the Hubs score that the Authority scores are defined (and vice versa) in the paper. It so happens, its easy to bump up the Hub score of a page by adding few strategic outlinks.
> Yes, again: possible to frame it as a formally valid problem if you really want to; still not an interesting one. We're only talking about this because you want to maintain that your earlier statement was true.
That's your opinion. I am just countering your categorical claim that there is no game theory formulation possible here. If your yardstick for your assertion that no game theoretic formulation is possible is your inability to find one that's interesting to you, there is not much I can do about it. All I can say is that such an yardstick is not very popular or useful.
A game theoretic formulation with a budget constraint adversary is a very natural setting. Research on link spam resistant node ranking algorithms are a thing, as is evaluating how stable are the rankings produced by some of these proposed algorithms to (potentially motivated and adversarial) changes to the links in the graph. SIGIR, WEBKDD proceedings on link analysis and rankings would be a good place to look.
You will need some background in matrix perturbation analysis, especially perturbation analysis of principal eigenvector to understand some of the results. Perturbation analysis of finite Markov chains will also suffice.
is by far one of the easiest papers to read in this area (Andrew Ng has focused on other areas of research since this paper). Note that the stability bounds in that paper can be easily tightened... left as an exercise for you. As you will see in the paper, HITS scores are easier to alter (equivalently stated, they are unstable) compared to Pagerank scores.
> Or maybe I'm wrong and there's a fascinating problem which you just don't want to divulge to me.
This is hardly the forum for extended discussions on a research topic. With the pointers and sketches that I mentioned a competent grad student would be able to fill in/ develop it further.
> Create a harvester page that points to lots and lots of popular, high traffic pages on the internet. By virtue of doing this it can accumulate a lot of Hubs score which it can redirect as an Authority score to an intended page.
I'm not sure how HITS is any more "easy to game" than PageRank? As far as I understand it, the differences are almost entirely limited to performance characteristics, not semantics. The example you give doesn't seem to be specific to HITS (as opposed to PageRank) in any way.
(I'm also not sure how "game theory" is relevant here, unless by "game theory" you just mean "the idea that people will try to game it".)