In particle physics, it was quite fashionable (and may still be) to iterate on blinded data (data deliberately altered by a secret random offset, and/or analyses developed entirely on Monte Carlo simulation until the procedure is frozen).
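For the curious, here's a minimal sketch of what offset blinding might look like; the names (`blind`, `unblind`, `secret_offset`) and the toy data are my own illustration, not a real collaboration's tooling:

```python
import numpy as np

rng = np.random.default_rng()

def blind(data: np.ndarray) -> tuple[np.ndarray, float]:
    """Shift the data by a secret random offset so analysts can tune
    cuts and fit procedures without seeing the true central value."""
    secret_offset = rng.normal(loc=0.0, scale=10.0)  # hidden until unblinding
    return data + secret_offset, secret_offset

def unblind(blinded: np.ndarray, secret_offset: float) -> np.ndarray:
    """Remove the offset only after the analysis is frozen."""
    return blinded - secret_offset

measurements = rng.normal(loc=5.0, scale=1.0, size=1000)  # toy "true" data
blinded, offset = blind(measurements)
# ... iterate freely on the blinded data ...
final = unblind(blinded, offset)  # performed exactly once, at the end
```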
Interesting, I wasn't aware of that. Another approach I've only briefly read about is preregistering studies, which quite literally prevents iteration.
Given the standard assumptions of scientific publication (predominantly p<=0.05), unrestricted iteration can easily let you find whatever proof you were looking for, which is problematic.
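A quick simulation (my own toy example, not from the thread) makes the inflation concrete: on pure noise, an analyst who tries up to 20 candidate variables and stops at the first p <= 0.05 "finds" an effect most of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_samples, n_tries, n_experiments = 100, 20, 1000
hits = 0
for _ in range(n_experiments):
    # The analyst "iterates": test candidate variables until one crosses
    # the significance threshold, then stop and report it.
    for _ in range(n_tries):
        x = rng.normal(size=n_samples)
        y = rng.normal(size=n_samples)  # no real relationship exists
        _, p = stats.pearsonr(x, y)
        if p <= 0.05:
            hits += 1
            break

print(f"null experiments with a 'significant' finding: {hits / n_experiments:.2f}")
# Expected rate with 20 tries at alpha = 0.05 is 1 - 0.95**20 ≈ 0.64,
# far above the nominal 5%.
```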
That being said, it's completely fair to use a cross-validation-style split: train models on the training set, iterate against the test set, and only at the end calculate p-values on a held-out validation set.
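Something like the following sketch, keeping the commenter's convention of iterating on "test" and reserving "validation" for the one-shot p-value (the data, model, and binomial test are illustrative assumptions on my part):

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=3000) > 0).astype(int)  # toy labels

# Three-way split: fit on train, tune on test, touch validation exactly once.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
# ... iterate on features / hyperparameters using (X_test, y_test) only ...

# Final, one-shot inference on the untouched validation set:
preds = model.predict(X_val)
acc = (preds == y_val).mean()
p_value = stats.binomtest(int((preds == y_val).sum()), n=len(y_val), p=0.5).pvalue
print(f"validation accuracy = {acc:.3f}, p = {p_value:.2e}")
```

The point is that the validation p-value stays honest only because nothing about the model was ever tuned against that split.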
The problem with that approach is that you need to collect much, much more data than people generally do. Since most statistical tests were developed for a small-data world this can often work, but in some fields (medicine in particular) collecting that much data is almost impossible, and you have to fall back on the much less useful bootstrapping or LOO-CV approaches.
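For reference, the bootstrap fallback in the small-data regime looks roughly like this (a toy sketch with made-up data; LOO-CV would instead use something like sklearn.model_selection.LeaveOneOut):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=1.0, size=30)  # small-data regime

# Nonparametric bootstrap: resample with replacement, recompute the
# statistic each time, and read a confidence interval off the percentiles.
n_boot = 10_000
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```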
I guess the core problem is that the methods of statistical testing assume no iteration, but actually understanding data requires iteration, so there's a conflict here.
If the scientific industry were OK with publishing exploratory data analyses (EDAs) to tease out questions for future confirmatory studies, we'd see more of this. But it's hard to get an EDA published, so everyone does the EDA anyway and then rewrites the paper as though they'd expected whatever they found from the start, which is the worst of both worlds.