I have a strong feeling that, if people really put an effort into reading and replicating more papers, we would find that a lot of what's being published is simply meaningless.
In grad school I had a subletting roommate for a while who was writing code to match some experimental data with a model. He showed me his model. It was quite literally making random combinations of various trigonometric functions, absolute value, logarithms, polynomials, exponents, etc. into equations that were like a whole page long and just wiggling them around. He was convinced that he was on a path to a revolution in understanding the functional form of his (biological) data, and I believe his research PI was onboard.
I guess "overfitted" never made it into the curriculum.
> It was quite literally making random combinations of various trigonometric functions, absolute value, logarithms, polynomials, exponents, etc. into equations that were like a whole page long and just wiggling them around.
Technically, we call that a "neural network". Or "AI".
I work in computational materials science (where ML brings funding) and a funny paper of this kind is here: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.11... - they are literally trying out 100000s of possible combinations by brute force, to build a "physical model".
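For a flavor of what that looks like, here is a toy reconstruction (my own sketch, not the paper's actual method or feature set): enumerate formulas built from a fixed bag of primitives, least-squares fit each one, and keep whichever happens to match the data best.

    # Brute-force "model discovery": try every pair of primitive
    # functions and crown the best-fitting combination a "physical model".
    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.1, 3.0, 50)
    y = x ** 2 + rng.normal(0, 0.1, 50)  # stand-in "experimental" data

    primitives = {
        "x": lambda v: v,
        "x^2": lambda v: v ** 2,
        "log(x)": lambda v: np.log(v),
        "exp(x)": lambda v: np.exp(v),
        "sin(x)": lambda v: np.sin(v),
        "|x|": lambda v: np.abs(v),
    }

    best_err, best_model = np.inf, None
    for (n1, f1), (n2, f2) in itertools.combinations(primitives.items(), 2):
        # least-squares fit of y = a*f1(x) + b*f2(x) + c
        A = np.column_stack([f1(x), f2(x), np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = np.mean((A @ coef - y) ** 2)
        if err < best_err:
            best_err = err
            best_model = f"{coef[0]:.2f}*{n1} + {coef[1]:.2f}*{n2} + {coef[2]:.2f}"

    print("best 'physical model':", best_model, "  MSE:", best_err)

Scale the candidate set up from these 15 pairs to hundreds of thousands of combinations and something will always fit, whether or not it means anything.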
Then they go to conferences and brag about it, because they have to (otr they know it's bs).
Datasets are so-so (have a look at QM9...), and for more specialized things people generally don't bother benchmarking or comparing their results against a common reference. It's just something new...
And on top of all that: even before anyone applies fancy statistical methods they don't fully understand, the underlying theoretical computations might not make much sense (at least not at the sheer volume being pumped out and published)...
Oh, I thought "on the real" fit the context better, meaning they knew in their heart of hearts it was bullshit, but "off the record" is about the same.
> I have a strong feeling that, if people really put an effort into reading and replicating more papers, we would find that a lot of what's being published is simply meaningless.
People figured that out long ago [1] (I know the author of that paper has lately become somewhat controversial, but that doesn't change his findings). It's not very widely known among the general public. But if you understand basic issues like p-hacking and publication bias, and combine that with the knowledge that most scientific fields don't do anything about them, there can hardly be any doubt that a lot of research is rubbish.
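p-hacking in particular takes only a few lines to demonstrate. A minimal sketch (Python, assuming numpy and scipy are available): run many tests where there is no effect at all, and roughly 5% come out "significant" at p < 0.05; publication bias means those are disproportionately the ones that get written up.

    # Test 100 null hypotheses: both groups come from the SAME
    # distribution, so every "discovery" is a false positive.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_tests, significant = 100, 0
    for _ in range(n_tests):
        a = rng.normal(0, 1, 30)
        b = rng.normal(0, 1, 30)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            significant += 1

    print(f"{significant}/{n_tests} 'significant' results with no real effect")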
Yeah, but one would hope that science holds itself to a higher standard. 80% garbage results sounds catastrophic for our understanding of the world, especially when policy is being made based on that science.
There's the saying "science advances one funeral at a time."
'"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." This principle was famously laid out by German theoretical physicist Max Planck in 1950, and it turns out that he was right, according to a new study.'
Also the story of Ignaz Semmelweis, who discovered that having doctors wash their hands reduced deaths during childbirth - but his findings were resisted for a variety of reasons.