Hacker News new | past | comments | ask | show | jobs | submit login
Friendship Paradox (wikipedia.org)
288 points by Xcelerate on Aug 4, 2014 | hide | past | favorite | 69 comments



This is established by making a one-step random walk in the friendship graph.

Things get more interesting when you start taking longer walks. If you take a long enough walk in a (strongly connected) graph, the probability of ending up in a particular place becomes independent of your starting point.

In fact, social graphs are said to be "fast-mixing", which means that "long enough" is typically only O(log n). In contrast, if you were walking in a two dimensional lattice, "long enough" would be O(sqrt n). This is the idea behind the "6 degrees of separation" factoid.

So what is the limiting probability distribution? That's actually the pagerank.

Which means that if friends randomly pass on a token, it's likely to end up in the hands of someone very influential with a lot of friends. It's been used for vaccine delivery.


Social graphs are not fast mixing. This used to be widely assumed but recent empirical measurements have refuted the assumption. [1] If I recall correctly, random walks tend to get stuck in cities and other highly-dense subgraphs.

Six degrees of separation refers to shortest paths, and fast mixing is decidedly _not_ the intuition behind it.

[1] http://syssec.kaist.ac.kr/~yongdaek/doc/imc2010.pdf


So what do you make of the papers such as SybilLimit[1] or SybilInfer[2] which rely on a fast-mixing assumption? They do seem to claim good empirical results on actual social graphs...

[1] https://www.comp.nus.edu.sg/~yuhf/yuh-sybillimit.pdf

[2] http://www.internetsociety.org/sites/default/files/dane.pdf

-----

EDIT: Wow, I've read that paper you linked to. So they use the data-set from SNAP like Facebook A and Facebook B.

But, if I understand correctly, these data-sets were formed by combining "egonets", which means they took a set of people and a small ball around them, and put all those balls in the graph. That explains the horrendous eigenvalues.


No. None of the graphs they use are ego-nets. That would be methodologically ridiculous, as you point out.

In fact, the main problem with the paper is the opposite -- the Facebook graphs they use are regional networks borrowed from [1]. This means that Facebook's actual mixing time should be _dramatically higher_ than the times measured in the paper because of the tendency of random walks to get stuck in regional networks. I believe this is the reason their Facebook-A and Facebook-B mixing times are much lower than the others, such as LiveJournal.

Alvisi et al. have a couple of related papers specifically looking at the implications of our new understanding of social-graph random walks for sybil defenses. [2, 3]

[1] https://www.cs.ucsb.edu/~ravenben/publications/pdf/interacti...

[2] http://www.cs.utexas.edu/users/lorenzo/papers/Alvisi13SoK.pd...

[3] http://www.cs.utexas.edu/users/lorenzo/papers/Alvisi14Commun...


Interesting, I've played with the SNAP datasets and did find very high eigenvalues, but I attributed it to the graphs bring a partial snapshot.

You're right, 6 degrees of separation is a statement about diameter, but a small diameter and mixing speed are related.


> If you take a long enough walk in a (strongly connected) graph, the probability of ending up in a particular place becomes independent of your starting point.

And this was the foundation of Google's PageRank. http://en.wikipedia.org/wiki/Google_matrix


Indeed. However, it's easy to miss the "random-walk" aspect of pagerank if one simply think of it as a recursive relation between inbound and outbound links.

Once you think "random-walk" you can build a whole class of "intentional surfer" model which model surfers seeking particular information.


I wonder if this can be exploited for business networking purposes. Simply choose a person in your "network" and ask them to introduce you to one other person they know and repeat with the new person.


"for business networking purposes"

Why not online matchmaking / dating? Not being in that market I have no idea if this already exists, if not it sounds like a great idea for a startup. Or if it already crashed and burned that would be interesting to hear about.

So ... guys ... I have this unattached sister in law, kinda athletic, professionally employed, owns her own house, she is about as much of a PITA as her sister, aka my wife, so if I can put up with my wife you should be able to tolerate her little sister ...

From historical experience my attempts setting up blind dates between my SiL and guys I know, have all gotten shot down by my wife, aka her sister, with what summarizes to "WTF were you thinking". Assuming a similarity between matchmaking and business relationships, this might indicate issues with peer to peer connection building. At least a meta-moderation facility is likely needed.

Also if A sets up a hot date or business lunch between mutual friends B and C I could conceivably lose both B and C as friends if they hate each other. So in a game theory perspective the guy who does the least matchmaking comes out furthest ahead, right? But the purpose of the site was to do matchmaking, at least in theory and in PR. Although the real purpose would be to sell subscriptions and ads. I think I'm seeing another problem here.


I wonder whether people would tend to introduce you more influential friends, over less influential friends, as a kind of status display. You could potentially "climb" faster than a random walk would.


All of the guided "ask X to introduce you to Y" systems that I have seen are... poorly implemented. The thing is, asking for an intro is complex; it depends on your relationship with the person doing the introduction, and with their relationship with the end target, and with how beneficial the potential relationship between the introduced parties is perceived to be.

You have a limited number of "waste your more important contact's time" tokens. You get more when you successfully introduce someone who is useful to your higher-status contact,

The odd thing is, even in an unsuccessful match, where you introduce two people and they don't establish a relationship they both find pleasant, if you connect a lower-status contact with the higher-status contact, even if the match is unsuccessful, you are still seen to have done your lower-status contact a favor, even though you have annoyed the higher status contact, so if the lower-status contact is someone you want to curry favor with, it might make sense to introduce them even if there is a lower probability of them forming a mutually profitable relationship with the higher-status contact, but normally it's a matter of "how interesting is the potential relationship between the two people I am introducing," which takes some thought and often times means not making the introduction.

The upshot here is that I'm not going to just right out introduce two of my contacts. I mean, I will if I think they can have a beneficial relationship, but I generally feel pretty weird when someone directly asks me for an intro to another specific person, rather than 'Hey, do you know anyone who is good with X" - because with the latter, I can talk to each person individually, and I can silently drop the whole thing without telling the other they were rejected, whereas if X asks me to introduce them to Y, and either Y says they are not interested or I think I would annoy Y, then I've gotta reject X, which can be awkward.



That seems too planned, it assumes you have a specific person in mind. What I mean is asking for random introductions until you meet someone influential.


i dont see how that would introduce you to someone influential. yes, someone influential is 6 degrees away from someone you know, but there are like 5 billion people in those 6 degrees.

no?

with what is said above, it would seem to suggest that after enough people, the person you are introduced to is irrelevant to the first person who started introducing you to people.


Isn't that what LinkedIn does?


Who downvoted this? This is exactly what LinkedIn does. You ask to be connected to a third party through one of your pre-existing connections. If you get connected to someone with a huge network, you can then connect with even more people without jumping through LinkedIn's hoops, or paying for their service.


I may be oversimplifying, but thinking of it at this extreme helped me intuit a bit better.

Imagine 1,000,001 people. One of them is friends with the rest, all of whom only have this one friend. The average number of friends a person has is 2, but the aggregate average for their friends is 1,000,000.

So most social graphs probably look like a spoke-hub where a minority of people have a very large number of friends.


An excellent way of thinking about it.


This might explain why introverts tend to think that everyone else is more extrovert than them, there's probably a lot more introverts than we realise but we're just less likely to meet them.


I'm not sure, but my intuition tells me there's half introverts, half extraverts. (introversion/extraversion is a spectrum).


Its not just your intuiton the bell curve tops outs squarely in the middle of the two not with two nice peaks. I found this out looking in to Myers briggs(I think its very qustionable). People can never work out whether I am introvert or extrovert.


Yeah, Myers-Briggs seems only worthwhile if you’re an outlier on a particular trait. It’s almost misleading to report which side of the mean you’re on if you’re only a few percent to one side.

For example, I’m near the average on thinking vs. feeling and judging vs. perceiving, so I don’t consider those traits terribly important or informative. However, I am moderately more introverted than extraverted and significantly more intuitive than sensory. Then again, I don’t need a personality quiz to know that!

Any time you try to reduce people to a few traits—Myers-Briggs, the Enneagram, whatever—you lose so much information that the exercise becomes nearly futile.


If someone asks, I say that I'm ITP. I deliberately forgot whether the last test I took showed me as slightly S or slightly N, and introspection doesn't reveal a preference between them.


- The notation for that is IxTP. - Introversion doesn't reveal the preference between the two values of the second letter because it measures the first letter. :)


- Thanks, that makes sense.

- Introspection, not introversion.


But that's the point... It's a continuous distribution meaning you probably fluctuate between introvert and extravert very smoothly with most of the population within the standard deviation. There will probably always be people that stay at the tail ends, but it's not average.


The shape of the distribution affects the value of the introvert/extrovert distinction pretty strongly, though. If it's strongly bimodal (so a random extrovert is very different from a random introvert), then it says a lot, while if it's unimodal with a small variance (so a random extrovert is only a little more extroverted than a random introvert) it doesn't say much.


And you are thus less likely to be friends with an introvert than an extrovert.


That's actually quite depressing. "You are always surrounded by people who are more popular than you."


I don't think it's as bad as that. It's the mean number of friends, not the median.

The example of twitter makes this a bit more clear: your followers have more followers, on average, than you. But that's because you likely follow someone with a million followers. That one person greatly affects the mean follower count.

So if you follow 10 people, one of whom has 1000 followers, and the rest just have one, the paradox still holds.


This comment seems to have downvotes - why? This made the linked paradox click for me, and seems like a good explanation. Is it incorrect?


and now your comment is in the grey with no explanation. It's absurd how downvote-happy HN's been lately, really kills the conversation...


Off topic: I would be curious to get statistics about the evolution of the ratio of downvotes/upvotes the past few years on HN. I also have the sentiment that HNers have been very "downvote-happy" lately, as you put it. I really think downvotes should be rationed in some way (and not just enabled after some karma threshold is reached).


My new habit: Automatically up-vote every greyed-out comment I encounter. I don't go searching for them, but when I see one, I upvote it without even reading it.

I used to read them carefully first, to decide whether I should up-vote. But from that experience, I've learned I almost always should -- the downvotes were rarely warranted, and almost always lazy and brainless. So I may as well save time and cut to the chase -- just upvote.

(If enough of us do this, maybe we can minimize the problem HN is seemingly unwilling and/or unable to address.)


I've been doing that as well. Sometimes a downvoted comment is truly deserving of its status, but often incorrect statements get downvotes even though they spawn interesting discussions. I try to offset that, because even if they're wrong, they're still an important part of the conversation.


Spin it the other way and you get "people who are popular want to be your friend."


Actually, I prefer it that way. Let other people deal with the hassle of dealing with people. My greatest fear is discovering too late that something I'm doing will make me famous.


> My greatest fear is discovering too late that something I'm doing will make me famous

Careful. That quip is almost good enough to become famous for.


See related remarks by Seinfeld: http://www.youtube.com/watch?v=ydhu7sGVNhQ


Also depressing are some of the corollaries that have cropped up. "On average, you are surrounded by people who are more successful than you." "On average, your (sexual) partners have more partners than you."


>"On average, you are surrounded by people who are more successful than you."

How does this one work? It doesn't seem to fit. I'm more likely to be friends with friendly people, more likely to have sex with people who like to have sex, why am I more likely to be around successful people?


This was actually how I first heard of the friendship paradox. There were a number of articles related to this earlier this year; here [0] is one and the original article [1]. Sorry for not linking earlier.

[0] http://www.technologyreview.com/view/523566/how-the-friendsh...

[1] http://arxiv.org/abs/1401.1458


Successful people have more friends than average.


That may well be true, but it isn't really relevant to the paradox. That's a second-order effect which needs to be established statistically, not the paradox itself which is pure graph theory. You're basically saying "You're more likely to be friends with people who make a lot of friends - that is an emergent property of a network. Also, I believe friendliness is correlated with success."


98% of the time to be more specific


On Twitter. I think we can safely assume Twitter is an extreme example.


An even simpler explanation:

Start with the baseline assumption that people are broadly similar to the people they're friends with -- i.e. socialites are friends with socialites, loners are friends with loners.

Overlay that with the observation that you're more likely to be friends with someone who has lots of friends, and less likely to be friends with someone who has few friends.

Consider the extremes: You're certain to be friends with someone who is friends with everyone, and certain to not be friends with someone who has no friends.


I don't think it is a paradox when you look at it from a "better" perspective. Just plot a graph displaying the frequency people have x friends. Chances are, there are some outliers with enormous amounts of friends who thus push the average.

I think it is not as much a paradox but rather a good demonstration of how the average can be an inappropriate measure to describe some samples.


Another simple explanation:

Imagine that there was one single person who was friends with all 7 billion people on Earth. Assuming nobody else had more than about 250,000 friends, this would cause every other person on the planet to have fewer friends than their friends have, on average. That one person skews the mean.

What happens in reality is of course much, much more subtle.


Oh, so now I know why I have no friends.. Someone must be at the very bottom of this pyramid. :)


I can be your friend.

The popular one.

xD


Good article by Strogatz, with clear explanation: http://opinionator.blogs.nytimes.com/2012/09/17/friends-you-...


yay for strogatz! he was one of my favorite math profs in school because he really knew how to explain counter-intuitive phenomena like this.


This is a fun property of scale-free networks in general (https://en.wikipedia.org/wiki/Scale-free_network).


Head Squeeze did a bit on this a while ago, as applied to Twitter: https://www.youtube.com/watch?v=Z_15zbgNpHk


Can you really have more than 1 or 2 friends? You only have two kidneys and a friend is someone you'd be willing to give a kidney to (compatibility aside).


Yes. You can oversubscribe your friends to your kidneys. You just have to be willing to give them, but that does not need to actually happen. You'll be fine as long as you pick your friends wisely.


Think of it as a farm of virtual machines.

Lets say the total amount of RAM in your cluster is X. Depending on the average usage in your farm, you might still be able to assign 2X to your virtual machines because the average usage for a machine is 50% or lower.


When I was working on http://blog.stephenwolfram.com/2013/04/data-science-of-the-f..., a lot of bizarre results (such as a huge gender difference in number-of-friends for people older than 40) went away after I took into account the friendship paradox. It would be really interesting to revisit that anomaly and try to explain why.


I don't have any friends, am I excluded?


No, you're just a divide-by-zero error.


[flagged]


"On-Topic: Anything that good hackers would find interesting. That includes more than hacking and startups. If you had to reduce it to a sentence, the answer might be: anything that gratifies one's intellectual curiosity."[1]

[1] https://news.ycombinator.com/newsguidelines.html


Read the article and you'll find that this is an interesting observation that has current relevance (the Ebola outbreak).


[flagged]


If you drop the needless pejorative language, you get the observation from the second paragraph of the article...


Should I have used the phrase super-fecund instead of the word 'slut'? Actually, with birth control we fool our bodies into thinking that we're mating so fecundity doesn't apply I guess.


'Promiscuity' is a term that is less loaded. Anyway, your observation is in the lede in the article, so it's likely the combination of language and lack of new information that sunk your earlier comment.


I've never heard this before. It kinda hurts my brain a little to think about.


If you think about it like a computer network, most nodes have fewer neighbours than their neighbour nodes. Most nodes are endpoints with only a few neighbours, but some nodes are networking midpoints with a lot of neighbours, and every endpoint knows at least one midpoint.

On a network segment with one gateway and ten endpoints, all nodes can see ten other nodes... except the gateway, which can see more nodes outside the segment. The average is therefore higher than ten, so most nodes see less than the average.

Hrm, I'm not sure if that clears it up at all. Maybe :)


That makes good sense. Yeah, thanks for clearing that up




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: