This post explains, in a very simple way, how p-values are used to decide whether a result is significant, and the problems with that approach.

It's crazy that by using a p-value of 0.05, it means that 5% of all scientific results might be false.




This is not true — it’s actually worse.

The 5% figure means that, when there is no signal to detect, we have a 5% chance of falsely claiming there is one. It does not say anything about the case when there is a signal and we do not detect it, which is known as the type II error rate.

With reasonable assumptions about sample size and the fraction of times there really is a signal, you can find that the majority of published results are false:

http://journals.plos.org/plosmedicine/article?id=10.1371/jou...

I have written an intuitive explanation of this, and a bunch more, in my book: https://www.statisticsdonewrong.com/p-value.html
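
A back-of-the-envelope version of that argument, with made-up but not unreasonable numbers (the 10% base rate and 50% power below are illustrative assumptions, not figures from the paper):

    # Rough false-discovery arithmetic under assumed numbers.
    prior_true = 0.10   # assumed: fraction of tested hypotheses that are actually true
    alpha      = 0.05   # significance threshold (type I error rate)
    power      = 0.50   # assumed: chance of detecting a real effect when it exists

    true_positives  = prior_true * power        # real effects that reach p < 0.05
    false_positives = (1 - prior_true) * alpha  # null effects that reach p < 0.05

    fdr = false_positives / (true_positives + false_positives)
    print(fdr)  # ~0.47: nearly half of the "significant" findings are false

Push the base rate or the power down a bit further and the false findings become the majority.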


Great book, thanks for writing it!


> It's crazy that by using a p-value of 0.05, it means that 5% of all scientific results might be false.

That would only be the case if scientists were robots who immediately published anything with a p-value up to 0.05. They're not, though. If they get clearly nonsensical results, they will obviously re-evaluate them. In other words, the p-value doesn't incorporate the fact that the experiment passed sanity checks in your own head (and the reviewers') before it was published. (And yes, there are bad actors in every field who game the system, but my point still stands.)


From what I've heard on HN, scientists are actually robots who massage their data until they get a p-value <0.05 and then immediately publish.


No, the publishing process takes a long time. Sometimes it could be years.


Yeah, but in Russia they use nine women to produce a baby in only one month.


This sounds like some kind of comment about the divisibility of the work needed to publish something. I don't get it, though.

After the bulk of the paper is written, I can easily do the proofreading, typesetting, etc. myself in less than a week. Then get someone else to double-check it; let's say that's another week.

After that, the only thing left is to get someone worthwhile to spend some time on your paper and point out anything confusing or erroneous. Granted, that could take a month or so of study. However, I never really saw that happen in practice; in reality you would be lucky to get people to glance over it for one evening.

So what is taking so long?


In my experience a significant fraction of the time it takes to publish a paper is spent waiting for the journal. During that time you can do other useful research. The long delay between submitting, getting through the reviewers, and actual publication is one of the reasons why, in CS for example, a lot of the interesting work happens in conference publications with fast turnarounds, while the journal versions of the same papers appear a year or two later.


>"waiting for the journal"

Yes, what are they doing?


I suspect this factor is balanced, or completely overruled, by the scientists who get p-values greater than 0.05, decide that the result doesn't pass their sanity check (it clearly should be significant!), and collect more data or tweak the methods until it's significant.
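
To put a rough number on that: here is a quick simulation sketch of the "collect more data until it's significant" strategy, peeking at the data every 10 samples when the null is actually true (the peek schedule and sample sizes are arbitrary choices for illustration):

    # Optional stopping inflates the false positive rate well above the nominal 5%.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments, max_n, peek_every = 2000, 200, 10

    false_positives = 0
    for _ in range(n_experiments):
        data = rng.normal(0, 1, max_n)  # the null is true: there is no effect at all
        for n in range(peek_every, max_n + 1, peek_every):
            _, p = stats.ttest_1samp(data[:n], 0)
            if p < 0.05:                # stop and "publish" at the first significant peek
                false_positives += 1
                break

    print(false_positives / n_experiments)  # well above 0.05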


I think that if scientists were robots, results would be much better than we have today - robots don't care about careers and grants.


Depends on the field. Some fields are very careful about this, while others are not.


A 5% false discovery rate only holds if the a priori probability of each result is around 50%. If a journal wants to publish only surprising results, and accepts p = 0.05 with a good methodology as true, it will have a higher rate of false claims, because the most interesting things are more surprising than a coin toss.

And if you slice the data from a single experiment in 40 independent ways, your chances of getting something with purely random significance at p < 0.05 are better than 50% for a single study…
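
The arithmetic for the 40-slices case, for anyone who wants to check it:

    # Chance of at least one spurious p < 0.05 across 40 independent tests
    # when every null hypothesis is true.
    print(1 - 0.95 ** 40)  # ~0.87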


It doesn't "mean" that the a priori probability is 50%; I guess what you want to say is that if we take the a priori probabilities of the two hypotheses to be equal, then we can use the 5% p-value.

But if the hypothesis is much less likely a priori, we might need a much lower p-value to be sure.

Edit: my comment doesn't mean much since you edited your first sentence.


>"It's crazy that by using a p-value of 0.05, it means that 5% of all scientific results might be false."

How so? I don't see any connection between significance level and % of false scientific results at all.

If you assume the "null hypothesis" is always true, then 5% of the results should falsely say otherwise. Of course, this is if all the assumptions behind the math hold, no p-hacking, etc.

However, that is like saying it is extremely rare for there to be a correlation between any two phenomena. We don't live in that universe. In our universe, correlations are extremely common:

>"These armchair considerations are borne out by the finding that in psychological and sociological investigations involving very large numbers of subjects, it is regularly found that almost all correlations or differences between means are statistically significant. See, for example, the papers by Bakan [1] and Nunnally [8]. Data currently being analyzed by Dr. David Lykken and myself, derived from a huge sample of over 55,000 Minnesota high school seniors, reveal statistically significant relationships in 91% of pairwise associations among a congeries of 45 miscellaneous variables such as sex, birth order, religious preference, number of siblings, vocational choice, club membership, college choice, mother's education, dancing, interest in woodworking, liking for school, and the like. The 9% of non-significant associations are heavily concentrated among a small minority of variables having dubious reliability, or involving arbitrary groupings of non-homogeneous or nonmonotonic sub-categories. The majority of variables exhibited significant relationships with all but three of the others, often at a very high confidence level"

-Theory testing in psychology and physics: A methodological paradox. http://www.fisme.science.uu.nl/staff/christianb/downloads/me...
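
To see why almost everything clears the bar at that sample size: with n around 55,000, even a correlation far too small to care about is overwhelmingly "significant". A quick sketch (the r = 0.02 value is just an illustrative assumption, not a number from the paper):

    # Significance of a tiny Pearson correlation at a Meehl-sized sample.
    import numpy as np
    from scipy import stats

    n, r = 55000, 0.02                      # assumed: a correlation nobody would care about
    t = r * np.sqrt((n - 2) / (1 - r**2))   # t statistic for a Pearson correlation
    p = 2 * stats.t.sf(abs(t), df=n - 2)    # two-sided p-value
    print(t, p)                             # t ~ 4.7, p ~ 3e-6, far below 0.05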


Obligatory xkcd: https://www.xkcd.com/882/



