Yup. Assuming the sample sizes are large enough for statistical significance, the original paper clearly shows:
- On average, people estimate their ability around the 65th percentile (actual results) rather than the 50th (simulated random results) -- a significant difference
- That people's self-estimation increases with their actual ability, but only by a surprisingly small degree (actual results show a slight upwards trend, simulated random results are flat) -- another significant difference
The author's entire discussion of "autocorrelation" is a red herring that has nothing to do with anything. Their randomly-generated results do not match what the original paper shows.
None of this really sheds much light on the degree to which the results can be or have been robustly replicated, of course. But there's nothing inherently problematic whatsoever about the way it's visualized. (It would be nice to see error bars, though.)
The autocorrelation is important because it shows that the transformation into a D-K plot will always give you the D-K effect, even for independent variables.
However, the focus on autocorrelation is not very illuminating. We can explain the behaviors found quite easily:
- If everyone's self-assessment scores are (uniformly) random guesses, then the average self-assessment score for any quantile is 50%. Then, of course, those in the lower quantiles (the less skilled) are overestimating.
- If the self-assessment score is proportionally (positively) but imperfectly related to the actual score, then the average self-assessment for a lower quantile still sits above its quantile value. This is the D-K effect, and it gets weaker as the correlation grows.
- The opposite is true for an inverse (disproportional) relation.
So the D-K plot is extremely sensitive to correlation and can easily exaggerate even the weakest of correlations.
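For what it's worth, here is a minimal simulation sketch (my own illustration in Python, not code from the article or the paper) of the random-guess case above: skill and self-assessment are drawn completely independently, then binned the way the D-K plots bin them.

```python
# Minimal sketch, assuming nothing about the real study: skill and
# self-assessment are independent, yet the D-K-style plot still shows the
# bottom quartile "overestimating" and the top "underestimating".
import random
import statistics

random.seed(0)
N = 20_000

actual = [random.random() for _ in range(N)]     # true skill
perceived = [random.random() for _ in range(N)]  # guess, independent of skill

def percentile_ranks(values):
    """Map raw values to 0-100 percentile ranks within the group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = 100.0 * r / (len(values) - 1)
    return ranks

actual_pct = percentile_ranks(actual)
perceived_pct = percentile_ranks(perceived)

# Bin participants into quartiles of *actual* performance, as the D-K plot does.
by_actual = sorted(range(N), key=lambda i: actual_pct[i])
for q in range(4):
    group = by_actual[q * N // 4:(q + 1) * N // 4]
    print(f"quartile {q + 1}: "
          f"mean actual pct ~ {statistics.mean(actual_pct[i] for i in group):5.1f}, "
          f"mean perceived pct ~ {statistics.mean(perceived_pct[i] for i in group):5.1f}")

# Perceived comes out near 50 in every quartile while actual averages roughly
# 12.5 / 37.5 / 62.5 / 87.5, so the bottom quartile "overestimates" and the top
# "underestimates" despite zero real relationship between confidence and skill.
```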
> "On average, people estimate their ability around the 65th percentile (actual results) rather than the 50th (simulated random results) -- a significant difference"
This is a different issue than D-K. The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals. That people think they're better than average is a different (and much less controversial) bias.
---
[DK-Effect] : I totally know I scored at least a 30% on that test, and that's certainly way better than average (it's not). [Actually scored 10%]
[No DK-Effect] : I totally know I scored at least a 30% on that test, and that's certainly way better than average (it's not). [Actually scored 30%]
> The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals.
Isn't that what the graph shows? The bottom quartile group is guessing almost 50 percentile points higher than their actual performance, whereas the top quartile is at most 15 points off.
They're all guessing somewhere between the 60th and 75th percentiles (i.e. "I'm a bit better than average") - with some upwards trend, since the high performers seem to at least know they have some skill, although not very accurately. It's just that for the poor performers, a guess of the 60th percentile is way off the mark.
EDIT: Something important for the rest of this post. In case it's not clear, the graph is showing your percentile ranking within the group - not your actual score.
Nope, because there's an interesting statistical trick in play. Imagine you take 100 highly skilled physicists and give them some lengthy series of otherwise relatively basic physics questions. Everybody is going to rate their predicted performance as high. But some people will miss some questions simply due to silly mistakes or whatever. And those people would end up on the bottom 10% of this group, even if the difference between #1 and #100 was e.g. 0.5 points. Graph it as D-K did, and you'd show a huge Dunning Kruger effect, even when there is obviously nothing of the sort.
In fact, the smaller the differences in ability within a group, and the greater the relative ease of the task, the bigger the Dunning-Kruger effect you'd show. Because everybody will rate themselves relatively high, but you will always have a bottom 10%, even if they are practically identical to the top 10%.
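Here's a rough sketch of that thought experiment (invented numbers, Python, my own illustration, nothing from the paper): 100 near-identical experts, a little luck in the measured score, and everyone predicting they sit "a bit above average" within the group.

```python
import random
import statistics

random.seed(1)
N = 100

# Hypothetical numbers: near-identical true ability, a little luck in the
# measured score, and a uniformly confident self-ranking within the group.
ability = [90 + random.uniform(-0.25, 0.25) for _ in range(N)]  # almost identical skill
score = [a + random.uniform(-0.5, 0.5) for a in ability]        # plus a few careless slips
predicted_pct = [random.uniform(60, 80) for _ in range(N)]      # "I'm a bit above average here"

# Rank everyone within the group by measured score, as the D-K plots do.
order = sorted(range(N), key=lambda i: score[i])
actual_pct = [0.0] * N
for rank, i in enumerate(order):
    actual_pct[i] = 100.0 * rank / (N - 1)

for q in range(4):
    group = order[q * N // 4:(q + 1) * N // 4]
    print(f"quartile {q + 1}: "
          f"raw scores {min(score[i] for i in group):.1f}-{max(score[i] for i in group):.1f}, "
          f"actual pct ~ {statistics.mean(actual_pct[i] for i in group):4.1f}, "
          f"predicted pct ~ {statistics.mean(predicted_pct[i] for i in group):4.1f}")

# The whole group is separated by about one point of raw score, yet the bottom
# quartile appears to overestimate its standing by ~55 percentile points and
# the top to underestimate by ~20 -- a textbook-looking D-K plot out of nothing.
```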
You can see this most clearly in the original paper. They carried out 4 experiments. The one that was most objective and least subject to confounding variables was #2, where they asked people a series of LSAT based logic questions, and assessed their predicted vs actual results. And there was very little difference. Quoting the paper, "Participants did not, however, overestimate how many questions they answered correctly, M = 13.3 (perceived) vs. 12.9 (actual), t < 1. As in Study 1, perceptions of ability were positively related to actual ability, although in this case, not to a significant degree." Yet look at the graph for it, and again it shows some seemingly large D-K effect.
And there are even more issues with D-K, and especially experiment #1 (which is the one with the prettiest graph by far), but that's outside the scope of this post. I'm happy to get into it if you are, though. I find this all just kind of shocking and exceptionally interesting! I've referenced the D-K effect countless times in the past, but never again after today!
Yes yes yes! I’m in the very same boat, and came to an epiphany that the ranking trick here, combined with some subjective questions (ability to appreciate humor - seriously!?), hides almost everything about actual skill. Not only does it amplify mistakes, it also forces the participants to have to know something about their cohort. Having to guess your ranking fully explains the less-than-perfect correlation. It also undermines all claims about competence and incompetence. They’re not testing skill, they’re only testing the ability to randomly guess the skill of others.
What about the slight bias upwards? Well, what exactly was the question they asked? It’s not given in the paper. They were polling only Cornell undergrads looking for extra credit. What if the question somehow accidentally or subtly implied they were asking about the ranking against the general population, and then they turned around and tested the answers against a small Cornell cohort? I just went and looked at the paper again and noticed that the descriptions of the ranking question changed between the various “studies” with the first one comparing to the “average Cornell student” (not their experiment cohort!). The others suggest they’re asking a question about ranking relative to the class in which they’re receiving extra credit. Curiously study 4 refers to the ranking method of study 2 specifically, and not 3. The class used in study 4 was a different subject than 2 & 3. How they asked this question could have an enormous influence on the result, and they didn’t say what they actually asked.
Cornell undergrads are a group of kids who got accepted to an elite school and were raised to believe they’re better than average. Whether or not all people believe they’re better than average, this group was primed for it, and also has at least one piece of actual evidence that they really are better than average. If these were majority freshmen undergrads, they might be especially poorly calibrated to the skills of their classmates.
In short, the sample population is definitely biased, and the potential for the study to amplify that bias is enormous. The paper uses suggestions and jumps to hyperbolic conclusions throughout. I’m really surprised that evidence and methodology this weak claims to show something about all of humanity and got so much attention.
> The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals.
I’m not sure that’s an accurate summary. The correlation between perceived and actual ability is effectively the slope of the line, and the slope is more or less constant. The paper suggests that the bias of the bottom quartile is higher than the bias of the upper quartile, not that the correlation is any different.
But it’s strange that the DK paper makes an example of the lower performers, since the bias of the scores appears to be constant; it appears the high performers have pretty much the same bias as the low performers — it’s a straightish line that goes through 65% in the middle rather than the expected straight line that goes through 50% in the middle. If the ‘high performers’ had a different bias, then the line wouldn’t be so straight.
1. The slope of the self-perceived ability line is lower than that of the actual ability line.
2. The y-intercept depends on the difficulty of the test.
Therefore, with an easier test the better test-takers are more accurate, and with a very difficult test the worse test-takers are more accurate, because of where the lines intersect. Meaning DK is an artifact of test difficulty.
This also means that if the test were difficult enough, you could create a bizarro-DK effect where the better test-takers were less accurate.
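A tiny numerical sketch of that two-line argument (slope and intercepts invented for illustration, not fitted to the paper's data): if perceived = intercept + slope * actual with slope < 1, self-assessment is exact only where that line crosses the perceived-equals-actual diagonal, and the intercept (which shifts with test difficulty) decides where that crossing falls.

```python
# Hypothetical two-line model of the parent comment's argument; all numbers invented.
# perceived = intercept + slope * actual, with slope < 1 (flatter than the diagonal).
# Self-assessment is exact where the line meets perceived == actual,
# i.e. at actual = intercept / (1 - slope).
slope = 0.3

for label, intercept in [("easy test", 55.0), ("very hard test", 10.0)]:
    crossover = intercept / (1 - slope)
    print(f"{label}: perceived == actual at roughly the {crossover:.0f}th percentile")

# easy test: crossover ~ 79, so high performers look well calibrated and low
#            performers overestimate (the classic D-K picture).
# very hard test: crossover ~ 14, so low performers are now the accurate ones
#            and high performers underestimate (the "bizarro-DK" case).
```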
For 1, the data is based on guessing, so it’s zero surprise that self-perceived ability doesn’t correlate perfectly with actual ability. It would be extremely surprising and unbelievable if the slopes were the same, right?
For 2, the DK paper shows one thing, but the replication attempts have shown this effect doesn’t even exist for very complex tasks, like being an engineer or lawyer. The DK effect doesn’t generalize, and doesn’t even measure exactly what it claims to measure, which is why we don’t need to speculate about the bizarro-DK reversal effect - we already have evidence that it doesn’t happen, and we already have a big enough problem with people mistakenly believing that DK showed an inverse correlation between confidence and competence, when they did no such thing.
> The D-K hypothesis is that self assessment and actual performance are less correlated for weaker than higher performing individuals
That may have been a hypothesis Dunning and Kruger had at some point, but it's not the effect they actually identified from their research. And I don't think it's even that; it's an “effect” people have associated with D-K because they heard discussion of the D-K research that got distorted at multiple steps from the original work, and then that misunderstanding, because it made a nice taunt, replicated widely and became popular.
To be fair, the paper itself uses hyperbolic language that completely distorts its own data. It heavily pushes and leads the reader into one possible dramatic explanation for their results, while downplaying and ignoring a bunch of other less dramatic explanations. Using words like “incompetent” is almost completely unfounded based on what they actually did. Section headings like “competence begets calibration”, “it takes one to know one”, and “the burden of expertise” are uncurious platitudes and jumping to conclusions. I’m kind of stunned at the popular longevity of this paper given how unscientific it is and how often replications with better methodology have shown conflicting results.
"Perhaps more controversial is the third point, the one that is the focus of this article. We argue that when people are incompetent in the strategies they adopt to achieve success and satisfaction, they suffer a dual burden: Not only do they reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the ability to realize it."
> That people's self-estimation increases with their actual ability, but only by a surprisingly small degree (actual results show a slight upwards trend, simulated random results are flat) -- another significant difference
If everyone thinks they are slightly above average, isn't this inevitable? If everyone thinks they are slightly above average, people who are slightly above average are going to be the most accurate at predicting where they land?
> If everyone thinks they are slightly above average, isn't this inevitable? If everyone thinks they are slightly above average, people who are slightly above average are going to be the most accurate at predicting where they land?
Yes, it’s inevitable. And this study only asked Cornell undergrads what they think of themselves - people who were taught to believe they are above average, and also people who got into a selective school and probably all had higher-than-average scores on standardized tests. Is it surprising in any way that this group estimated their ability as above average?
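Back-of-the-envelope version of that "inevitable" point (toy numbers, not taken from the paper): if every participant simply guesses the 65th percentile regardless of skill, the per-quartile errors fall straight out.

```python
# Toy arithmetic, invented numbers: everyone guesses the 65th percentile.
guess = 65.0
quartile_actual = {"bottom": 12.5, "2nd": 37.5, "3rd": 62.5, "top": 87.5}  # mean actual percentile per quartile

for name, actual in quartile_actual.items():
    print(f"{name} quartile: error = {guess - actual:+.1f} percentile points")

# bottom: +52.5 (huge overestimate), 2nd: +27.5, 3rd: +2.5 (nearly spot on),
# top: -22.5 (underestimate). The people who really are slightly above average
# come out "best calibrated" by construction.
```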
Even if "people tend to slightly overrate their own ability," was the only takeaway, it would still refute the author's conclusion that DK has nothing to do with human psychology.
Have you not just summarized the Dunning-Kruger effect in other words?
That essentially follows from everyone assuming they are slightly above average. That's also the crux of the refutation and why the whole autocorrelation argument is a red herring: even if we all self-assessed completely randomly, that would actually confirm the Dunning-Kruger effect is real (because if we self-assess randomly, worse performers are more likely to overestimate).
We could argue that this is not surprising, but the "surprising" bit is that the curves show that better performers are actually more skilled at assessing their performance, which incidentally was also confirmed by the followup studies.
Is it, though? Everyone overestimating their ability a bit isn't the DK effect. The DK effect is when people with less knowledge and ability vastly overestimate their ability (because they don't know how little they know - while others do), and the opposite for those who are truly more able and knowledgeable (again, because they understand how vast the topic is, and though they know more and are more capable than the average person, they also understand how little they truly know compared to what they don't know).
There are those that don't know, and don't know that they don't know. They evaluate themselves the highest.
There are those that know, and don't know that they don't know. They evaluate themselves a bit better than those before.
There are those that know, and know that they don't know. They evaluate themselves worse than those before them. This is the d-k valley, imposter syndrome, confidence issues.
There are those that know, and know that they know. They are much better at evaluating themselves than those before them. They have the experience to know what they know and what they don't know, and they still continue to underrate themselves vs the first bunch, but they are more accurate and closer to the truth.