Imperfect as they are, statistical significance measures like the p-value are more epistemologically sound than any arbitrary rule-of-thumb threshold on the size of the effect.
> statistical significance measures like the p-value are more epistemologically sound than any arbitrary rule-of-thumb threshold on the size of the effect.
Keep in mind that the GP isn't saying the effect doesn't exist if it's in the single digits, but that it is inconclusive and/or insignificant. Insignificant in the human sense, not the statistical sense.
A 1% increase in this behavior? Irrelevant to almost everyone.
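To put toy numbers on that point (everything below is invented, nothing from the study, and the helper is just mine): with a big enough sample, a one-point bump clears any significance bar and is still just a one-point bump.

```python
# Toy illustration: a 1-percentage-point difference with 50,000 per arm.
from math import sqrt
from statistics import NormalDist

def two_prop_z_pvalue(x1, n1, x2, n2):
    """Two-sided two-proportion z-test via the usual normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1 - p2, 2 * NormalDist().cdf(-abs((p1 - p2) / se))

# Hypothetical: 21% vs 20% of respondents showing the behavior.
effect, p = two_prop_z_pvalue(10_500, 50_000, 10_000, 50_000)
print(f"effect = {effect:+.1%}, p = {p:.1e}")  # effect = +1.0%, p ~ 9e-05
```

Highly significant, and still irrelevant to almost everyone.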
This, of course, is not even getting into the reproducibility crisis, much of which centered on findings that leaned on p-values. While I'm personally happy to run significance tests, the skepticism of small effects is well founded. Were someone to try to reproduce the effect and fail, the standard defense is that the results are sensitive to the methodology used, and that defense is much easier to invoke when your effect is 1% rather than 20%.
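A rough sketch of why the defense is so cheap for small effects, assuming a 20% baseline rate and an exact same-size rerun with 1,000 subjects per arm (both assumptions are mine):

```python
# Approximate power of an exact replication at alpha = 0.05.
from math import sqrt
from statistics import NormalDist

def replication_power(p_base, effect, n_per_arm, alpha=0.05):
    """Power of a two-proportion z-test under the normal approximation."""
    norm = NormalDist()
    p_alt = p_base + effect
    se = sqrt(p_base * (1 - p_base) / n_per_arm + p_alt * (1 - p_alt) / n_per_arm)
    return norm.cdf(abs(effect) / se - norm.inv_cdf(1 - alpha / 2))

for effect in (0.01, 0.20):
    print(f"{effect:.0%} effect -> power {replication_power(0.20, effect, 1_000):.2f}")
# 1% effect -> power 0.08   (a "failed" replication is the likely outcome)
# 20% effect -> power 1.00
```

With numbers like these, "it didn't replicate" tells you very little about a 1% effect and quite a lot about a 20% one.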
P-values here rest on a sampling distribution (typically a normal approximation) that presumes the samples were randomly drawn. It's hard to see how random sampling applies very well here.
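Quick simulation of that concern, with a made-up "clustered respondents" process standing in for non-random sampling (all parameters invented):

```python
# If respondents arrive in correlated clusters rather than as independent
# random draws, the z-test's nominal 5% false-positive rate no longer holds
# even when there is no effect at all.
import random
from math import sqrt
from statistics import NormalDist

def pvalue(x1, n1, x2, n2):
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 2 * NormalDist().cdf(-abs((x1 / n1 - x2 / n2) / se))

def sample_group(n, clustered, cluster_size=50, sd=0.05, base=0.3):
    """i.i.d. draws vs. draws that share a per-cluster shift in the rate."""
    hits = 0
    for start in range(0, n, cluster_size):
        p = base + (random.gauss(0, sd) if clustered else 0.0)
        p = min(max(p, 0.0), 1.0)
        hits += sum(random.random() < p for _ in range(min(cluster_size, n - start)))
    return hits

random.seed(0)
for clustered in (False, True):
    fp = sum(pvalue(sample_group(2000, clustered), 2000,
                    sample_group(2000, clustered), 2000) < 0.05
             for _ in range(1000))
    print("clustered" if clustered else "i.i.d.   ", fp / 1000)
# i.i.d. sampling lands near 0.05; the clustered draws come in well above it.
```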
Suppose we compared the p-values of biased researchers desperate to publish against a conservative heuristic that doesn't believe small effects. Particularly for social-science experiments like this, which would you bet on as the better predictor of repeatability?
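Here's a crude Monte Carlo of that bet, with every number invented: half the "studies" have no real effect, half have a 15-point one, and the biased researcher peeks every 100 subjects per arm and stops at the first p < 0.05.

```python
# Of the original results that pass (a) "p < 0.05" vs (b) "observed effect
# >= 10 points", how many hold up in an honest fixed-n rerun?
import random
from math import sqrt
from statistics import NormalDist

def pvalue(x1, n1, x2, n2):
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 2 * NormalDist().cdf(-abs((x1 / n1 - x2 / n2) / se))

def biased_study(p_ctrl, p_treat, step=100, max_n=1000):
    """Peek after every `step` subjects per arm; stop at the first p < 0.05."""
    c = t = 0
    for n in range(step, max_n + step, step):
        c += sum(random.random() < p_ctrl for _ in range(step))
        t += sum(random.random() < p_treat for _ in range(step))
        if pvalue(t, n, c, n) < 0.05:
            break
    return t / n - c / n, pvalue(t, n, c, n)

def honest_replication(p_ctrl, p_treat, n=1000):
    c = sum(random.random() < p_ctrl for _ in range(n))
    t = sum(random.random() < p_treat for _ in range(n))
    return pvalue(t, n, c, n) < 0.05 and t > c

random.seed(1)
passed = {"p < 0.05": [], "effect >= 0.10": []}
for _ in range(2000):
    true_effect = random.choice([0.0, 0.15])
    eff, p = biased_study(0.30, 0.30 + true_effect)
    rep = honest_replication(0.30, 0.30 + true_effect)
    if p < 0.05:
        passed["p < 0.05"].append(rep)
    if eff >= 0.10:
        passed["effect >= 0.10"].append(rep)
for rule, reps in passed.items():
    print(rule, round(sum(reps) / len(reps), 2))
# In this toy setup the crude effect-size rule tends to select the more
# replicable subset, because optional stopping mostly manufactures
# small-but-significant results.
```

None of this makes the effect-size rule "sound", but it's where the GP's intuition comes from.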
Statistical significance is a good measure of how sure we are there is a correlation, or, since this is an RCT, causation. But any layperson can look at a 3% effect and conclude, yeah, that's probably not that big of a deal. Or not, depending on your preferences! No judgement. It's not something that requires a degree to determine, just an assessment of one's own values against the effect size in that context.
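A tiny sketch of that division of labor, again with invented numbers: the test answers "is it real?", the interval answers "how big is it?", and only the second part is the value call.

```python
# Difference in proportions with a 95% Wald confidence interval.
from math import sqrt
from statistics import NormalDist

def diff_with_ci(x1, n1, x2, n2, conf=0.95):
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    d = p1 - p2
    return d, (d - z * se, d + z * se)

# Hypothetical: 33% vs 30% of the behavior, 10,000 per arm.
d, (lo, hi) = diff_with_ci(3_300, 10_000, 3_000, 10_000)
print(f"effect {d:+.1%}, 95% CI ({lo:+.1%}, {hi:+.1%})")
# effect +3.0%, 95% CI roughly (+1.7%, +4.3%): clearly "real", but whether a
# 3-point change matters is up to the reader, not the statistics.
```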