Berkeley, sadly, is perhaps too large and diverse for an overall characterization.
So true. As a Cal alum, I think he is spot on here. :-)
I pulled about 400 followers from each school, and added a couple filters, to try to ensure that followers were actual attendees of the schools rather than general people simply interested in them
How did he ensure followers were actual attendees of the schools programmatically? It would be really hard to find out this type of information, and it could be considered borderline creepy in some cases.
EDIT: He also works as a data scientist at Twitter, so I'm sure he has access to a lot more internal data, rather than some sort of mashup of the Twitter, FB, and LinkedIn APIs.
How did he ensure followers were actual attendees of the schools programmatically?
He mentions in the comments "I basically just checked that they didn't follow any other schools (from a small list). It's certainly not the greatest filter, but it did seem to work for a small number of people I hand-checked."
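For anyone curious, here is a minimal sketch of what a filter like that could look like. This is only a guess based on the quoted comment, not the author's actual code: the `SCHOOLS` list, the `likely_attendee` function, and the idea that each user is represented as a set of followed topic names are all assumptions made for illustration.

```python
# Hypothetical "small list" of school topics; the author's real list is not given.
SCHOOLS = {"MIT", "Stanford", "Harvard", "Berkeley"}


def likely_attendee(followed_topics, school, schools=SCHOOLS):
    """Return True if the user follows `school` and no other school in `schools`.

    `followed_topics` is assumed to be an iterable of topic names the user follows.
    """
    followed_schools = set(followed_topics) & set(schools)
    return followed_schools == {school}


# Example usage with made-up users.
users = {
    "alice": {"MIT", "Hip-Hop Music"},
    "bob": {"MIT", "Stanford", "Startups"},
    "carol": {"Food", "Startups"},
}
mit_sample = [name for name, topics in users.items() if likely_attendee(topics, "MIT")]
```

As the quoted comment says, this is a weak heuristic: it drops anyone who follows more than one school on the list, and it keeps anyone who follows a single school without having attended it.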
The article does not point out that a sample of Quora users might not be (I would say is probably not) representative of the students of these places as a whole. Quora attracts a certain kind of person; I've never seen it mentioned anywhere outside of techy startup / valley circles. Maybe that's implied by virtue of it being in the HN ecosystem, but still it should be explicitly stated as a flaw in the method. Lies, damned lies and statistics etc.
The article does not point out that a sample of Quora users might not be (I would say is probably not) representative of the students of these places as a whole.
From the article:
Also, a word of warning: my dataset was fairly small and users on Quora are almost certainly not representative of their schools as a whole (though I tried to be rigorous with what I had).
I completely missed that, mea culpa, my apologies to the author. I'll leave the message above unedited as a little warning sign to other people who can't read: you too can look silly like me.
MIT and Stanford like Hip-Hop Music. I'm pretty sure MIT prefers Biggie and Stanford prefers Tupac (note: this has nothing to do with east/west coast, but rather pure skill vs. charisma)
Interesting, but wrong use of conditional probabilities. All OP is saying is that the frequency of people from school x following topic y is p.
The dataset is just not the right one to say that P(x|y) = p, because of, say, all the people who follow food in NYC and go to NYU, who were not taken into account here.
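To spell out the distinction, write x for "attends school x" and y for "follows topic y" (symbols as in the comment above; the counts N are introduced here only for illustration). What the sample measures is the fraction of school-x followers who follow topic y, which estimates P(y|x). Getting P(x|y) from that needs Bayes' rule and two quantities the sample does not provide:

```latex
\[
  P(y \mid x) \approx \frac{N_{x \cap y}}{N_x} = p,
  \qquad
  P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)} .
\]
```

Here P(y) is the share of all users who follow topic y, including people at schools that were never sampled, and P(x) is the share of all users at school x. Neither can be read off a dataset built by pulling about 400 followers per school.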
Surprisingly diverse interests, considering the huge bias in the dataset (public Quora profiles). Well worth reading, though I was left wondering about the inverse probabilities.
"Berkeley, sadly, is perhaps too large and diverse for an overall characterization."
That isn't sad at all. It's great. As a UC grad, the diversity of the student body was one of the absolute best things about my college experience. Do I wish I went to Harvard or Stanford? Sure, but not for their homogeneous student bodies.