> The paper reports on the results of a 'logic' test administered to undergrads and uses this to define competence.
Right, but my point is that 'logic' is simply being used as an example of 'a task'. It's immaterial whether it's actually a good test of logic. As long as you agree that whatever it is counts as a good example of 'a task', it's equally probative for the purpose of their argument.
The tasks aren't arbitrary. They're meant to be a proxy for some universal concept of competence. That's why DK is a well-known effect: it claims to hold for anything, even though they can't test every possible task.
> we presented participants with tests that assessed their ability in a domain in which knowledge, wisdom, or savvy was crucial: humor (Study 1), logical reasoning (Studies 2 and 4), and English grammar (Study 3).
They picked humor because they think it reflects "competence in a domain that requires sophisticated knowledge and wisdom". They then realized the obvious objection - it's subjective - and did the logical reasoning task to try to rebut those complaints (but then why do the first experiment at all?):
> We conducted Study 2 with three goals in mind. First, we wanted to replicate the results of Study 1 in a different domain, one focusing on intellectual rather than social abilities. We chose logical reasoning, a skill central to the academic careers of the participants we tested and a skill that is called on frequently ... it may have been the tendency to define humor idiosyncratically, and in ways favorable to one's tastes and sensibilities, that produced the miscalibration we observed - not the tendency of the incompetent to miss their own failings. By examining logical reasoning skills, we could circumvent this problem by presenting students with questions for which there is a definitive right answer.
So logical reasoning was chosen because:
1. It's objective.
2. It's an important skill.
3. It's a general "intellectual" skill.
That makes it very important whether it's actually a good test of logical reasoning. If it were truly an arbitrary test, like an egg-and-spoon race or something, then there'd be no reason to believe the results would generalize to other areas of life, and nobody would care.
> The tasks aren't arbitrary. They're meant to be a proxy for some universal concept of competence.
I’ve seen absolutely nothing suggesting this. It’s explicitly about task competency; no particular task is specified nor needs to be specified.
> That's why DK is a well known effect, it claims to hold true for anything even though they can't test every possible task.
Yes, they claim it holds true for everything because it’s how human beings introspectively experience being poor at a task. It’s really not necessary to have some Platonic ideal of Task Competency … which is then specifically restricted to logical tasks for reasons known only to you.
> Logical reasoning was chosen because: It’s objective.
I think there’s a kernel of truth in this, albeit assuming by ‘objective’ you instead mean (as people often do) something like “people almost always agree in their evaluations of this quality”. You need that for a good experiment. I’m still not sure how it relates at all to your point here. Personally I would find it easier to just say “I was wrong, it’s not explicitly about logic, I just associated it with that because it’s commonly adduced in silly arguments about logic/intelligence on the internet” - but ah well, it’s an interesting theory so I’m happy to discuss it.
The reason no particular task needs to be specified to invoke DK is exactly that they argue their initial selection of experimental tasks is so general that the effect must apply to everything.
It feels like you and danbruc are inverting causality here. You start from the assumption that DK is a real effect and then say, because it's real and general, it doesn't matter what tasks they used to prove it. But that's backwards. We have to start from the null hypothesis of no effect existing, and then they have to present evidence that it does in fact exist. And because they claim it's both large and very general, they need to present evidence to support both these ideas.
That's why they explicitly argued that their tasks reflect general attributes like wisdom and intelligence: they wanted to be famous for discovering a general effect, not one that only applies in very specific situations.
But their tasks aren't great. The worst are ridiculous, the best are unverifiable. Thus the evidence that DK is a real and general effect must either be taken as insufficient, or the argument must be widened to include studies by other psychologists that pursue the same finding via different means.
> And because they claim it's both large and very general, they need to present evidence to support both these ideas.
To me the claims in the paper do not really seem that strong, almost to the point that I am not sure if they claim anything at all. If you read through the conclusions, they mostly report the findings of their experiments. The closest thing to any claims about generality I can find is that they discuss in which scenarios their findings will not apply. You could maybe read into this that they claim that in all other scenarios their findings apply, but that is not what they actually do.
But I guess the better way to discuss this is for you to quote the claims from the paper that you consider too strong and unjustified, instead of me trying to anticipate what you are referring to or going over each claim in the paper.
> The tasks aren't arbitrary. They're meant to be a proxy for some universal concept of competence.
This seems at least somewhat wrong to me - the competence is not universal but task-specific. They compare how your competence at performing task X relates to your ability to assess your performance of task X, both in absolute terms and relative to the other participants. They repeat this for different tasks and find that for all tested tasks the same pattern emerges - roughly, the better your performance, the better your ability to accurately assess your own performance and the performance of others.
So you can be competent at task X and provide accurate assessments of task X performances while at the same time being incompetent at task Y and less accurate in assessing task Y performances. This essentially means that you cannot be universally good at assessing performances of arbitrary tasks; you can only do this well for tasks at which you are yourself competent.
For completeness I would add that a good task must allow objectively rating the performance of participants without much room for debate. But given that, the whole setup is self-contained and task-independent. Let participants perform the task and establish their competence by rating their performance. Then let participants perform the meta-tasks of rating their performance in absolute and relative terms, and finally check how task and meta-task performances are related.
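To make that setup concrete, here is a minimal toy simulation in Python - my own sketch, not the paper's code or data. The noise model tying self-assessment error to competence is an assumption standing in for the finding, so the output illustrates the shape of the claimed pattern rather than proving it:

```python
import numpy as np

# Toy simulation of the setup described above - not the paper's data or code.
# Assumption: self-assessment noise shrinks as competence grows; this bakes in
# the pattern the studies report rather than demonstrating it.
rng = np.random.default_rng(0)
n = 1000

# Task: each participant's true score on task X (0-100).
score = rng.uniform(0, 100, n)

# True percentile rank among participants.
true_pct = 100 * np.argsort(np.argsort(score)) / (n - 1)

# Meta-task: self-assessed percentile, noisier for the less competent.
noise_sd = 40 * (1 - score / 100)
self_pct = np.clip(true_pct + rng.normal(0, noise_sd, n), 0, 100)

# Compare task and meta-task performance by quartile, as in the paper's figures.
quartile = np.digitize(score, np.percentile(score, [25, 50, 75]))
for q in range(4):
    m = quartile == q
    print(f"Q{q + 1}: actual percentile {true_pct[m].mean():5.1f}, "
          f"self-estimate {self_pct[m].mean():5.1f}")
```

Even in this crude model the bottom quartile's mean self-estimate lands well above its actual percentile, partly just because symmetric noise gets clipped at the 0-100 bounds; the sketch only shows the mechanics of the comparison, not that the effect is real.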