> Would it be ok if they warned that it's just for entertainment?
I mean they were always ok, but the problem is presenting some silly fun data analysis as "science".
If you counted how many people wear red hats on your way to work and published it on a blog, it could be fun. If far-right groups suddenly start making posts about how your red-hat stats correlate with immigration and theft statistics, then your post is not the problem, but there is a problem.
Same with the OkCupid blog. I have zero issue with their work (although they could have shown their work, because a lot of it looked incredibly amateurish), but the talking points that came from it are still being repeated, wrongly, 10 years later, and they did real damage.
"presenting some silly fun data analysis as "science""
Their silly data analysis had a sample that was probably orders of magnitude larger than any cohort a university paper could dream of getting its hands on, but yeah, it was just silly fun data analysis.
> Their silly data analysis had a sample that was probably orders of magnitude larger than any cohort a university paper could dream of getting its hands on
And yet their sample was incredibly biased despite being large. A big set of data still churns out bad results when the sampling is biased.
Most of their blog posts did not show the data, how they normalised it, or the percentage of each group. They just ran regressions on it, and anything with p<0.05 that seemed fun was published. Which was fun, but it's not good science; it's hardly science at all.
There is a reason why blog posts and published results are held to different standards (and also why so many sociological/psychological results are impossible to reproduce, giving those fields one of the worst replication crises of any science).
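The "run regressions and publish whatever clears p<0.05" problem is easy to demonstrate. A minimal sketch, using pure noise and invented numbers (nothing here is OkCupid's actual data or method): if you run enough tests where no real effect exists, roughly 5% of them will come up "significant" anyway.

```python
# Multiple-comparisons sketch: run many tests on pure noise and see
# how many "fun findings" clear p < 0.05 by chance alone.
# All parameters here are invented for illustration.
import math
import random

random.seed(42)

def z_test_p(a, b):
    """Two-sided p-value for a difference of means, known sigma = 1."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    # Normal tail probability via the error function (no scipy needed).
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n_tests, n = 200, 50
false_positives = 0
for _ in range(n_tests):
    # Both "groups" are drawn from the SAME distribution, so any
    # significant difference is noise, not a real effect.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    if z_test_p(a, b) < 0.05:
        false_positives += 1

print(f"{false_positives}/{n_tests} 'fun findings' on pure noise")
```

With 200 tests you expect around 10 false positives; pick only the "fun" ones to write up and you have a blog full of effects that were never there.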
> They just run regression on it and anything with p<0.05 that seemed fun was published.
How do you think science operates in actual practice?
OkCupid at least has a sample large enough that you can commit all sorts of statistical sins and still produce halfway decent conclusions, unlike your run-of-the-mill n=10 study.
So, after sampling the interactions of 1.51 million people and analysing the number of messages an average tall woman receives compared to an average short woman, the conclusion that "taller women, on average, are less attractive to the average male than short women" is invalid?
Please, tell me how this is absurd, when it's basically what you see in real life. Or that men who make more money are more attractive to women, on average? This delusion of denying facts when they do not conform to your own sense of what is "fair" is just childish.
because they're all people who signed up for okcupid, who were actively using okcupid to date at the time.
the amount of pressure this puts on the sampling is hard to overstate. not only are you looking at a highly truncated set of interpersonal dynamics (what you can tell about someone from a dating profile is far less than what you can tell by meeting real people in meat space), you're specifically selecting for people who have been as yet unsuccessful IRL. you're also selecting for people who wish to take a fairly serious shortcut around personal interaction.
given these factors, it would be both crazy and stupid to believe you can generalize about human nature anything you get from a dating app.
as for "what you see in real life", might I suggest that you, yourself, are also constrained to a perspective that doesn't necessarily match other people's perspectives, and that maybe, just maybe, you are also seeing a biased sample of the world which is both the result and cause of difficulties with interpersonal interactions?
> Analysing the number of messages an average tall woman receives compared to an average short woman, the conclusion that "taller women, on average, are less attractive to the average male than short women" is invalid?
It isn't invalid, but it's also not science. If you have 1000 short women and 3 women over 6'4", you have a normalisation problem: the variance of the results is far higher near the edge cases than at the median.
The problem is not the results; it's that in a vacuum they are contextless, and that is what makes it silly fun data analysis.
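The edge-case variance point can be sketched in a few lines. The group sizes and "rating" scale below are invented for illustration: both groups are drawn from the same true distribution, so any observed gap between them is sampling noise.

```python
# Sketch of why tiny edge-case buckets are untrustworthy: the average
# of a 3-person group swings wildly from sample to sample, while the
# average of a 1000-person group barely moves. All numbers invented.
import random
import statistics

random.seed(0)

def group_mean(size):
    # Everyone's "true" score is 5.0 with noise sd 2.0 -- identical
    # for both groups, so there is no real effect to find.
    return statistics.mean(random.gauss(5.0, 2.0) for _ in range(size))

big = [group_mean(1000) for _ in range(500)]   # e.g. 1000 short women
tiny = [group_mean(3) for _ in range(500)]     # e.g. 3 very tall women

print(f"spread of 1000-person averages: {statistics.stdev(big):.3f}")
print(f"spread of    3-person averages: {statistics.stdev(tiny):.3f}")
```

The standard deviation of a group mean scales as 1/sqrt(n), so the 3-person averages are roughly 18x noisier than the 1000-person ones: an edge-case bucket can look dramatically "attractive" or "unattractive" by luck alone.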
> Or that men that make more money are more attractive to women, on average?
Yeah, because if they wanted to date someone broke they could find one outside the app. Online dating is only a fraction of their total dating behaviour, so the conclusions are not conclusive.
If you want actual studies about it: women prefer a stable job over wealth. I.e. a dude with a normal 100k salary has a better chance than an unemployed dude with 1 million dollars. This is something that cannot be found through OkCupid, because net worth isn't even a question or filter in their analysis.
> This delusion of denying facts
I am not denying facts; I have just done science beyond running a database through MATLAB.
> the conclusion that "taller women, on average, are less attractive to the average male than short women" is invalid
I'd say it's premature given that a lot of women don't want to date men shorter than them, so I can see men messaging them less just because they'd assume they were wasting their time.
> And yet their sample was incredibly biased despite it being large.
I gotta ask: in what way is the sample of a few million people who want to date 'incredibly' biased?
Slightly biased I might buy, but how on earth can you make the leap that people looking for dates via a matchmaking site are not at least somewhat representative of people looking for dates?
The sample size is, honestly, mostly irrelevant for these 'headline' type analyses. A good sample of a few hundred will get you a strong estimate, while a biased sample of a few million will get you a bad one.
The classic illustration is the Literary Digest poll of the 1936 Presidential Election. They got over two million respondents and confidently predicted a heavy Roosevelt loss. George Gallup got a comparatively tiny but well sampled set of respondents and correctly estimated the Roosevelt victory. The magazine's problem was partly sampling bias (their readers were richer than the average American, amongst other things) and partly non-response bias (they got a return rate of ~25%, which is pretty good, but probably disproportionately from the most politically engaged of their readership, who hated Roosevelt).
I don't know how bad the sample is here for extrapolating to 'all those seeking dates', but the size is a distraction - the sampling is all.
(This isn't necessarily true if you're making individual profiles, searching for rare phenomena, or something else which is more in the 'big data' than 'social statistics' space, and which genuinely needs that much training data. It's also not true if you're studying the site members themselves, for whom the site members are a perfect (non-) sample. The latter is true for the in-house data science team themselves, but we shouldn't just assume their experience will extend to the rest of the world).
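A toy simulation of the Literary Digest pattern described above (all population figures and percentages are invented for illustration, not the real 1936 numbers) shows how a huge biased sample misses badly while a small random sample lands near the truth:

```python
# Toy "Literary Digest vs. Gallup" simulation: biased 200k sample vs.
# random 1k sample from the same population. All numbers invented.
import random

random.seed(1)

# Hypothetical population: 1,000,000 people, ~62% support candidate A
# overall, but the rich minority (20% of people) overwhelmingly oppose A.
population = []
for _ in range(1_000_000):
    rich = random.random() < 0.20
    supports_a = random.random() < (0.20 if rich else 0.725)
    population.append((rich, supports_a))

true_share = sum(s for _, s in population) / len(population)

# "Literary Digest" sample: 200,000 responses, but the response pool is
# mostly rich people (the magazine's readership), plus a trickle of others.
pool = [p for p in population if p[0] or random.random() < 0.05]
biased = random.sample(pool, 200_000)
biased_est = sum(s for _, s in biased) / len(biased)

# "Gallup" sample: only 1,000 people, chosen uniformly at random.
small = random.sample(population, 1_000)
small_est = sum(s for _, s in small) / len(small)

print(f"truth: {true_share:.3f}  "
      f"biased 200k: {biased_est:.3f}  "
      f"random 1k: {small_est:.3f}")
```

The 200,000-person sample lands tens of points off the true share, while the 1,000-person random sample is within a couple of points: sample design dominates sample size.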
Online dating already skews towards college age. Secondly, apps like OkCupid with long forms select out the users who are not willing to disclose that information, don't want to spend the time, or won't engage seriously with the form.
So you will mostly have a group of 20-year-olds who are willing to answer a lot of questions. It doesn't matter how big the pot is; that's not a random sample of society.
> So you will have mostly a group of 20 year olds, who are willing to answer a lot of questions. It doesn't matter how big the pot is, that's not a random sample of society.
Is that all? That doesn't make it "incredibly" biased.
I mean, looking at your example of "black women being the least desirable on OkCupid", it's easy enough to check if it applies only to 20 year olds who are willing to answer lots of questions or if that correlation actually exists outside of that group[1].
Your way of simply dismissing it, instead of regarding it as a data point that can be further investigated, is the opposite of science.
[1] OkCupid certainly had the data, and on one of the blogs I recall seeing their distribution of ages; it was not dominated by the 20-30 age group, even if they were the largest group. It would have been easy for OkCupid to redo that desirability analysis for particular age groups.