One thing that troubles me in the paper is that the researchers appear to have gone looking for precursor patterns in an ad hoc way, with no physical theory in mind.
That seems like an extremely good thing to me. Looking for patterns and then figuring out causality later is a great way to solve real-world problems. Of course you can be led astray if your filters are too open, but if your work is rigorous and you still come out with a 6-sigma correlation then congratulations, you found a signal. What does it mean? How does it work? Who cares, there's plenty of time to figure that out. But in the meantime you can hypothesize that the connection exists, monitor it for a year or two, and if the correlation holds up and turns out to have some predictive value, then a winner is you.
If you dredge through enough random data, you will always find a six-sigma correlation (or five, or however many sigmas you want), kind of by definition. This is why experimenters like particle physicists or gravitational-wave astronomers who have petabytes of raw data have to define their criteria in advance, instead of just going to town looking for patterns.
I agree with you that if this observation proves to have predictive power, then the way it was found doesn't matter. But right now we're at the less reliable "we looked at all conceivable retroactive combinations of stuff and found this pattern" stage of the process.
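The dredging point above is easy to see in a toy simulation. This is a sketch, not anything from the paper: every series below is pure noise, the sample size and counts are made up, and "sigma" is the usual rough conversion of a correlation coefficient to a z-score under the null. The best-looking correlation keeps getting more "significant" as you test more candidates, even though there is nothing to find.

```python
import math
import random

random.seed(0)

def pearson(x, y):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

n = 50  # observations per candidate "precursor" series (arbitrary)
target = [random.gauss(0, 1) for _ in range(n)]  # pure-noise "earthquake record"

best_sigma = {}
best_r = 0.0
for k in range(1, 10001):
    series = [random.gauss(0, 1) for _ in range(n)]  # another pure-noise candidate
    best_r = max(best_r, abs(pearson(series, target)))
    if k in (10, 100, 1000, 10000):
        # Under the null hypothesis, r * sqrt(n - 1) is roughly standard
        # normal, so this approximates the "sigma" of the best correlation
        # found among the first k candidates.
        best_sigma[k] = best_r * math.sqrt(n - 1)

for k in sorted(best_sigma):
    print(f"dredged {k:>5} random series -> best correlation ~ {best_sigma[k]:.1f} sigma")
```

The significance of the winner grows with the number of candidates searched, which is exactly why pre-registered criteria (or a heavy multiple-comparisons correction) matter for this kind of search.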
The abstract makes it clear this is part of the search for a global earthquake warning system. So in terms of motivation, why the link exists is secondary; if it can forewarn of significant earthquakes, that is all that matters.
Besides, the first "scientists" looked at the sky and just stared until patterns appeared.
You can't condemn scientists for looking at data. Not everything can be magicked up from first principles in a vacuum.
No scientist would claim that falsifiable hypotheses and empirical validation aren't needed. But I don't think any scientist would criticise someone for looking for patterns on a whim and pointing them out to others.
both are valid strategies; what i think is bad is when groups of people claim that broad sweeping searches are all bogus and that you must have a priori knowledge to be relevant.
there should be people dredging through petabytes of cern data like this, exploratory analysis can give you insight into things you didn't know to look for.
even if you find things that are bogus you figure out why and learn how to avoid it given the context of the data.
to say otherwise becomes a hindrance to progress. plus it's not like we have any other way of predicting earthquakes; trying to read the magnetic field to predict events sounds really promising. i'd be willing to give it a shot even if it turns out to be bogus, that's just risk assessment.
Sincere question from a non-scientist who struggles with the idea of how to make use of existing data without accidentally P-hacking:
Is it still P-hacking if you stumble upon a correlation in the historical record (after stumbling around for a while), call it a hypothesis, and then stick with it long enough to gather a statistically significant amount of _new_ data to support it?
More broadly, are there ways to "go on a fishing expedition" that are still scientifically valid?
As long as you get new data in a way that can falsify your hypothesis that's fine. If you bias your data collection to favor your hypothesis that's still cheating.
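A sketch of that discipline, continuing the toy-noise setup (everything here is synthetic and the names are invented): fish for the best predictor in the historical record, then commit to that single predictor and score it only on data gathered afterwards. Because the whole dataset is noise, the in-sample winner looks impressive and the out-of-sample check collapses.

```python
import math
import random

random.seed(1)

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

n, k = 50, 2000
historical = [random.gauss(0, 1) for _ in range(n)]  # noise standing in for past quakes
candidates = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]

# Step 1: the fishing expedition -- keep the single best-looking predictor.
best = max(candidates, key=lambda c: abs(pearson(c, historical)))
in_sample = abs(pearson(best, historical))

# Step 2: pre-commit to that one predictor, then gather genuinely new data.
# Since everything here is noise, fresh measurements of the "winning"
# precursor are just fresh noise, and the correlation evaporates.
new_target = [random.gauss(0, 1) for _ in range(n)]
new_readings = [random.gauss(0, 1) for _ in range(n)]
out_of_sample = abs(pearson(new_readings, new_target))

print(f"in-sample |r| = {in_sample:.2f}, out-of-sample |r| = {out_of_sample:.2f}")
```

On real data the same shape applies: the confirmation only counts if the new data could have come out against you, and if you don't quietly go back to fishing when it does.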
Inarguably. The folks arguing that it's p-hacking aren't taking the next step of treating the correlation as a hypothesis, testing it, and establishing causality.
Yeah, there are valid ways to use the data. Looking at the parent again perhaps I was misreading it; I was responding to the idea that you could find the correlation and just go straight from there.
While it's not encouraged in publish-or-perish contexts (you know, because money), it's also true that some quite important things in science have come about precisely this way.
What's the old saying, "The most exciting phrase in science is not 'Eureka!', but rather 'Huh, that's weird...'"?