Science Isn’t Broken (2015) (fivethirtyeight.com)
55 points by KC8ZKF on May 24, 2017 | 54 comments



My pet idea for making science less wrong is to maintain a citation graph of all papers. Then when a problem is found with any paper, every other one downstream from it is automatically flagged as at risk of being wrong. Now all those authors (or others) have to go back and re-evaluate how that citation was used and how it affects their result. Once they decide it's OK, they update their paper and remove the flag. If it's not OK then their paper is retracted or flagged as wrong or simply keeps its at-risk flag if it's not important enough for anybody to bother re-checking it.

This way, people will be reluctant to cite too many useless papers for their friends, and will be reluctant to depend on unreliable ones. Work with good methodology would be more popular to cite or rely on.

Maybe peer review could even become optional this way. If other researchers trust your work enough to risk their own papers by citing it, that would act as a peer approval in the long term. Actual peer review would only be needed to give an immediate indication of the quality.

We would first need to solve the versioning problem where there's no way to update a paper when mistakes are discovered or simply to improve it.
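To make the mechanics concrete, here is a minimal sketch of the flag propagation described above, assuming the citation graph is available as a directed graph (edges point from a cited paper to the papers citing it). The paper names and the networkx-based representation are purely illustrative, not part of the original proposal:

    import networkx as nx

    # Edge (A, B) means "paper B cites paper A", i.e. B is downstream of A.
    citations = nx.DiGraph()
    citations.add_edges_from([
        ("original-method", "follow-up-1"),
        ("original-method", "follow-up-2"),
        ("follow-up-1", "meta-analysis"),
    ])

    at_risk = set()

    def flag_problem(paper):
        # A problem was found: mark the paper and everything downstream as at risk.
        at_risk.add(paper)
        at_risk.update(nx.descendants(citations, paper))

    def clear_flag(paper):
        # Authors re-checked how the citation was used and judged their result OK.
        at_risk.discard(paper)

    flag_problem("original-method")
    # at_risk now contains the flawed paper plus everything that (transitively)
    # cites it: follow-up-1, follow-up-2, meta-analysis.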


> My pet idea for making science less wrong is to maintain a citation graph of all papers. Then when a problem is found with any paper, every other one downstream from it is automatically flagged as at risk of being wrong. Now all those authors (or others) have to go back and re-evaluate how that citation was used and how it affects their result. Once they decide it's OK, they update their paper and remove the flag. If it's not OK then their paper is retracted or flagged as wrong or simply keeps its at-risk flag if it's not important enough for anybody to bother re-checking it.

This is an interesting idea, but I don't know what real-world problem it would actually solve. As someone who reads a lot of scientific papers, I can't think of a single one whose correctness is actually dependent on the correctness of the results of a paper it cites, as though the paper in question is "buggy" because it "calls another paper as a subroutine" and that subroutine is buggy. Could you give an example or two of this?

I could see this making sense in mathematics since mathematical proofs really do "call other papers as subroutines" (theorems), but not so much in science.


>"As someone who reads a lot of scientific papers, I can't think of a single one whose correctness is actually dependent on the correctness of the results of a paper it cites"

What field are you reading where each paper is independent of all the others? As the most obvious example, in biomed pretty much every paper says something like "as done previously[ref x, y]" in the methods section. It is an extremely annoying, but very common, practice (usually refs x and y contain somewhat different methods descriptions and/or also contain the "as done previously" phrase). There are many other ways the papers depend on each other too ("we used these cells because they don't express this protein[ref z]", etc)


I more or less exclusively read CS papers, which sometimes contain similar statements. I don't see how in your example, "ref x" being invalid would make the paper citing it invalid. Can you give an actual example of a paper that does this that could be invalidated in this way?

Basically, to me, an empirical result arrived at using the scientific method is, well, an empirical result arrived at using the scientific method. And it seems to me that the only thing that would invalidate such a result is if it were found that the researchers did not follow the protocol they described in their paper. The fact that the researchers might have merely cited another paper, or used the method described in another paper, in which the researchers were found to have "cheated" in this way, doesn't seem to invalidate the paper in question. It's possible that the researchers in question could have happened to also cheat, but I don't see any reason to assume it's any more likely because it happened with a paper they cited (unless of course they're the same researchers building on their own previous work).


For the methods citation example, the citation chain goes (or is supposed to go) all the way back to the paper originally devoted to describing the method. In that paper there are (supposed to be) a bunch of checks (comparisons to other methods, etc.) that the method is working as claimed. These checks will not be repeated every time.


    > > whose correctness is actually dependent on the correctness of the results of *a* paper it cites

    > where each paper is independent of *all* the others

These are not describing the same thing.


If I understand correctly, the author is suggesting something like PageRank for axiomatic correctness, where a paper's score can be interpreted on its own (in terms of some binary promise of "correctness") even if the score is computed from the weighted correctness of the papers it references.

Really, the information should be abstracted away from the paper, à la Freebase or Wikidata, and there should be strict rules (e.g. the protocols Stack Overflow enforces) such that no cycles can occur, such that propositions can be resolved to axioms, and such that it is practical to compute scores in a finite period of time with reasonable resource constraints...

See also gitxiv.org (an attempt at literate research for computer science and physics papers that can be paired with code). It doesn't address the problem described, but if the statements of a paper are codified in a strict way (e.g. via the Curry–Howard isomorphism), it makes it easier for "linked statements" to have a quantifiable value.
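One way to read that scoring idea (my interpretation, not a spec from the comment above): give each paper a prior "correctness" and propagate it through the acyclic citation graph, so a paper's score is discounted by the papers it depends on. The discounting rule below (take the weakest dependency) is just one arbitrary choice:

    from graphlib import TopologicalSorter

    # paper -> papers it cites; must be acyclic, as required above
    cites = {"A": [], "B": ["A"], "C": ["A", "B"]}
    prior = {"A": 0.9, "B": 0.95, "C": 0.99}

    score = {}
    for paper in TopologicalSorter(cites).static_order():
        # Discount by the weakest cited paper; a weighted average would also work.
        dep_score = min((score[d] for d in cites[paper]), default=1.0)
        score[paper] = prior[paper] * dep_score

    # score == {'A': 0.9, 'B': 0.855, 'C': 0.84645}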


I think they are, unless you are selectively citing (irrelevant) papers to avoid depending on any...


I mainly read papers on second language acquisition, which is an empirical science that performs language-learning experiments. Often the "inspiration" for the research in newer papers stems from the results of earlier ones. The empirical results of the newer ones are, of course, not directly invalidated if there was a problem with an earlier one.

However, some of the papers tend to discuss the results in the light of the current empirical results AND the earlier empirical results ("from this result, we now see that A → B, and from the earlier results we already know that B → C, so there's some evidence that A → C"); this of course biases the discussion and may lead to wrong conclusions.

By the way, the largest problems I see in the papers are not directly "wrong" results, but something like the "Chinese whispers" or "broken telephone" game. Paper A makes some assumptions and uses a certain test protocol, measurements etc., and then paper B refers to the result but misrepresents it slightly, which may cause overgeneralisation. A concrete example: paper A tests the learning of some phoneme with a training period and a before-after test and publishes the result. Then paper B later references paper A, mentioning that A has shown that the learners "learn" the phoneme. However, represented this way – "unqualified" – this is an overgeneralisation: the students haven't actually been shown to have learned the phoneme in any other way than performing on the test. Indeed, other studies have shown that certain kinds of tests generally correlate very poorly with what people usually mean by "having learnt" something – being able to use it in spontaneous communication.


What would be your plan for papers that cite another one because they are doing a literature review or something akin to "Paper X is wrong"?

There's also a trickier problem in your simple binary flag: papers can have incorrect proofs of correct statements.

I'd love to see a Semantic Web approach where you don't cite papers but results in papers. So while a result in a paper might be wrong, you don't have to throw out all the results in all the papers citing this paper.
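A toy sketch of what result-level citation records (as opposed to paper-level ones) might look like; the field names are illustrative only, not an existing schema:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Result:
        paper_doi: str      # the paper the result appears in
        result_id: str      # e.g. "Theorem 2" or "Table 3, row 4"
        retracted: bool = False

    # A new paper cites individual results, not whole papers.
    cited_results = [
        Result("10.xxxx/paper-a", "Theorem 2"),
        Result("10.xxxx/paper-b", "Table 3, row 4"),
    ]

    # If one cited result is later retracted, only claims built on that result
    # need re-evaluation, not everything that cites either paper.
    needs_recheck = any(r.retracted for r in cited_results)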


@averagewall, check out openknowledgemaps.org

Also, CiteSeerX and Crossref are your friends.

PubMed Central similarly has the ability to return much of this citation data programmatically (for certain scientific subdomains).

If you're passionate about prototyping such a service, let me know and we can get you an invite to the Archive Labs slack (we're a non-profit, volunteer-run community which incubates open access and public good projects). Literate, reproducible research and knowledge + citation graphs are two of our emphases.

Good luck


P.S. One way you might describe this project is "an axiomatic Euclid's Elements for all scholarly knowledge".

You might also like metacademy.org (one of my favourite projects -- it's a package manager for knowledge built on DAGs)


You could prototype this with the data in crossref at the moment. They don't have full citation coverage (basically nobody does, it's a bugger of a problem), but there should be enough there to try out some ideas with real world data (and free data with an API).

> We would first need to solve the versioning problem where there's no way to update a paper when mistakes are discovered or simply to improve it.

You can certainly update what a DOI points to.
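For anyone who wants to try it, here is a rough sketch of pulling a paper's outbound references from the free Crossref REST API mentioned above (field names follow the public works endpoint; reference coverage varies by publisher, and the DOI below is only a placeholder):

    import requests

    def get_cited_dois(doi):
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
        resp.raise_for_status()
        work = resp.json()["message"]
        # Not every deposited reference carries a resolvable DOI.
        return [ref["DOI"] for ref in work.get("reference", []) if "DOI" in ref]

    seed = "10.xxxx/placeholder"  # replace with a real DOI
    edges = [(cited, seed) for cited in get_cited_dois(seed)]  # cited -> citing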


Surely this exists.


This is a nice summary of FiveThirtyEight's science-crisis work, but (as with most treatments of the subject) it leaves me fundamentally unsatisfied.

Apparently we've identified the problem enough to say "there's a problem". We can say "look at all the ways you can manipulate this analysis to achieve the desired outcome". We can say "gosh, science is harder than we thought". But it seems we're still far from a convincing solution.

The fact that statistical analysis is so liable to manipulation seems to call the entire thing into question. In the article they take comfort from the fact that many of the labs analyzing the red card/race data arrived at similar conclusions. One would assume this is because they made similar choices in the analysis. But what says those were the right sorts of choices? Could it not simply be that the labs shared the same biases and errors, making that outcome more common? Is a proper analysis really determined by (essentially) democratic vote? If that's what we've arrived at, it gives me less rather than more confidence in the robustness of the scientific process.

It feels like something fundamental has to be reimagined. It's difficult to prove things about the world---but maybe it's actually near-impossible? Or maybe we need to get real about the cost of actually demonstrating anything reliably. Instead of individual labs running one-off experiments it becomes researchers collaborating openly to build the perfect experiments, which are then run by many different labs, then analyzed collectively in the open for strengths and weaknesses, then reformulated, sent out again, and so on iteratively until at the end of years of research one little bit of probably-truth drips out the bottom of the system.

But that bit would be something we could build on.


Maybe the thing that needs to be reimagined is the naive image of science as an infallible way of always getting true answers. Science works, just not in as simple a way as one might naively think. Feyerabend's "Against Method" is my favourite place to start :-).


> It's difficult to prove things about the world---but maybe it's actually near-impossible?

Scientists always hesitate to talk about "proof," since we hold our beliefs on a tentative basis. However, some scientific theories have been corroborated to such a degree that they can be regarded as proven for all practical purposes, within broad domains. These include classical mechanics, relativity, gravitation, quantum mechanics, thermodynamics, electromagnetism, and Darwinian evolution. This level of "proof" is trusted on a daily basis in regular lab work, e.g., if an electrical circuit violates Ohm's Law, you don't throw out Ohm's Law. You look for a bad solder joint.

Granted, science is presently dominated by the life sciences. I'm sure the physical sciences have their own problems -- we shouldn't be overconfident -- but they may be qualitatively different problems. In my view, a journalist shouldn't generalize about "science" without showing that they've considered whether the generalization is justified.

Disclosure: My background is in physics, and my day job is in scientific instrumentation. It's a business where people are obsessive about replication. When an instrument goes out the door, it's been put through a stringent set of tests that could be described as replications, and the results of these tests are compared across different designs, models, batches of components, and even competing brands, all around the world.


Analysis being determined by an (essentially) democratic vote is actually just replication of results.

While the system is highly flawed now, I would argue that collaborating to build perfect experiments carried out by multiple labs would not be a good idea. You point out that if people made the same sorts of choices, then perhaps everyone just has the same biases and errors. But if all the people on one experiment collaborate, then they'll definitely all have the same biases. Instead, independent verification, in which the nitty-gritty details are abstracted away into a more general methods section, would allow for more alternative takes on approaching a problem.


I like your idea about collaborating on bigger experiments. One big experiment can be both more convincing and more accessible than 1000 little ones all measuring different incomparable things that take 1000s of man-hours just to identify if they're what you want or not.

The trouble is everyone wants to be the special snowflake discovering the special effect in their own special experiment, so now we have too much messy, unreliable, and inaccessible data to be of much use in the real world.


Science isn't broken... here's a bunch of ways that people cheat?

Science itself can never be broken, but when people cheat the system that makes it effective at actually figuring out what is true and what isn't, for personal gain, then that system is broken.


Science is a people-driven system of acquiring knowledge. If people can cheat the system, then the system (science) is broken.


The answer then is "propose a better one".

Unconstructive criticism is worse than non-constructive mathematical proofs, as those tend to further actual results later.


I know tons of scientists, post-docs, lab heads.

It's WAY too hard to be a scientist. The kind of salary and other sacrifices scientists are asked to make are unfair and surely drive many to leave science.

Science is the only thing moving everything forward, and if the funding strategy is to look for irrational people who will work insanely hard for next-to-nothing salaries... it doesn't sound like a great strategy.


Science has a huge PR problem, one so bad you might as well say science is broken.

The average person is losing faith in science because of high profile failures, and the distrust is only increasing.

I fear Nassim Taleb is right.

https://medium.com/incerto/the-intellectual-yet-idiot-13211e...


> Science has a huge PR problem, one so bad you might as well say science is broken.

Science has a huge PR problem in that it is FAR less funded than the people with money and power whom it sometimes comes into conflict with.


The author has some interesting things to say but appears to be an apologist for anti-intellectual rednecks who are somehow some unappreciated source of cultural wisdom.

He believes Trump supporters and Brexit fans are voting in their own interests as opposed to being duped to serve others, describes intellectual pursuits as not proper jobs, and further states:

"people are perfectly entitled to rely on their own ancestral instinct and listen to their grandmothers (or Montaigne and such filtered classical knowledge) with a better track record than these policymaking goons."


I think the most damning thing to come out of the replication crisis was when they asked a bunch of scientists to place bets on whether a given paper (with p < 0.05) would replicate, and it turned out these bets were right quite often (https://fivethirtyeight.com/features/how-to-tell-good-studie...).

That shouldn't be possible! Science is supposed to be the best possible epistemological methodology, and here it is being beaten in "success rate of determining true from false" by guessing. What's immensely frustrating is that it's not a question of whether we're just not smart enough to tell true from false: we clearly have the power (since the guesses were often right), but we're not using it. Whatever "truth compass" the guessers were using should be part of the scientific process somehow. That's what is "broken".


That "truth compass" they used was reading the method section, which is far from guessing and very close to the scientific process.


It shouldn't be, though. It may be easy for a scientist to look at the method sections of two papers that had p<0.05 results and say of the first "that looks like a solid study with a reasonable conclusion and robust controls" and of the other "that's bogus".

The problem is that without this expert panel making these calls, both studies got published, were given the same credibility, and got consumed by non-experts. Peer review and the publishing process are supposed to weed out the obviously bad studies, but they're not doing so.


Scientists are familiar with the literature and are taking into account information from other papers when judging a particular one.

Individual papers are not a source of truth, they're bits of cumulative evidence hopefully leading to the truth.


Right, I'm not chalking it up to mysterious psychic powers. What I mean is, it's frustrating in how it demonstrates that our truth-sensing capabilities, and the "pure" scientific method, are actually working just fine and we don't need to have any existential panic over whether truth is unknowable even in noisy data. But those capabilities aren't being used when it comes time to actually write the paper, which is exactly when you should be applying them.


I don't understand; isn't that really encouraging, that peer review might work to weed out the bad studies after all?


Peer review is supposed to be done before publishing, not in a separate meta-analysis.


If a person's mental state can impact their decisions and the quality of their work, why aren't we tracking the subjective states of those conducting research? And does http://eqradio.csail.mit.edu provide a tool for doing so?


This sounds very scary and totalitarian. We have the law and the police; that's the maximum level of intrusion I am willing to tolerate in terms of monitoring my personal state and behavior. Anything more than that and I am gone.


I agree. Thankfully, as human beings, we're capable of coming up with systems using this technology to solve the subjectivity problem in a responsible, ethical manner. It doesn't have to be one involving the law.


Not sure if you are being sarcastic or not. I hope you are being sarcastic because we know that more or less any such technology not only has the possibility of being abused but has been shown to be abused in practice over and over again.


And yet the technology exists.

Who would you rather use it: those who seek to develop a responsible framework for using it, or those who seek to abuse it?

This tech is only going to become more accurate and more widespread over time. Ignoring it isn't an option. We've got little to no idea how to responsibly use human emotion as an input and decades worth of collective experience on how to exploit it.

The potential for abuse is high, which means extreme transparency is needed, at the very least.

Operating at an emotional level requires a whole new set of metrics and a fundamental shift in how we do anything. Better to figure it out now than to wait until the tech becomes ubiquitous. Those looking to exploit the tech most definitely won't wait for that day. A few people I've spoken to about it were eager to explore the tech purely for the sake of improving sales of various things. I changed the subject with them after realizing they had little regard for human life in this context.

People concerned about this need to band together to find ways to productively use it and defend against its misuse.

I've been working on such a framework, if anyone's interested in exploring it with me.


This would end up being counterproductive, as the small minority of papers removed this way for a good reason would not make up for the vast number of scholars alienated and distracted from their work by the intrusive investigations. Even without that loss of productivity, it's still unclear whether allocating resources to this monitoring would be cost-effective.


The purpose isn't to remove papers. It's to help guide and reproduce experiments. What if we can only reproduce certain desirable results when researchers feel a certain way?

This is too important to simply dismiss out of fear. Science needs more courage, not less.


How do you ensure the validity of such a questionnaire? Sociology and psychology have enough problems without tossing a recursive one on top.

Moreover, make sure to verify the hypothesis that mental state is important to scientific reproducibility.


Science will necessarily become slightly more subjective than it is. Things have become way too objective, to the point of denying emotions play a vital part in all we do.

The validity is up to the researcher to provide, and it will need to be done in as emotionally "safe" a way and place as possible if we don't want to risk training scientists to deny their emotions to the point of separating their biological responses from their emotions. That'd be a disaster.

This tech introduces all kinds of dangerous potential effects, so anyone working on or with it will need to take extra care.


Science isn't broken... the academic system is


> It’s no accident that every good paper includes the phrase “more study[1] is needed” ...

[1] read: FUNDING

Let's distinguish "science" -- the scientific method, the general advancement of human understanding over the centuries, etc. -- from "pork": institutionalized government funding, the establishment pursuing that, and all of the mundane, bureaucratic processes that ensue, and then the resulting hype, pettiness, recriminations, and sacrificing of ideals that it inculcates.

There is nothing wrong with science, although it may be harder these days to recognize it.

Pork, on the other hand, is approaching a singularity.


Nobody is claiming that the basic principle is not working. What is warped and bent is the pipeline that would allow science to proceed faster, the results to be transferred to companies faster, the companies to actually apply the results in buyable products, and that revenue to feed back into scientific endeavors.

That machine is broken, leaking, and in parts actually moving contrary to the scientific interests of humanity.

The quality issue of science itself could easily be remedied by replacing plain citations with citations that include partial, repeatable experimental coverage of the cited result, which would also end citation inflation.


This article is from August 2015. Previous discussion: https://news.ycombinator.com/item?id=10085698


This should have "(2015)" in the title.


Added to title. Thanks!


The answer is clear.

Hypothesis-driven experiment.

Drive it into the brains of everyone who might enter the scientific profession, and then, when someone is caught with an experiment-driven hypothesis, we don't have to speculate about whether or not it was fraud, because they will have known better, by simple virtue of having their credentials, and we can then safely revoke those credentials.


> Scientists who fiddle around like this — just about all of them do, Simonsohn told me — aren’t usually committing fraud

Yes, they are.


Glad to see journals finally omitting p-value cases for more substantiated findings.


Social science isn't broken. Math and physics are like what's p-hacking.


There are actually ways in which science can come up with the wrong answer. For example, if we lived much later, the universe would be expanding so rapidly that we could not see the galaxies beyond our own; that part of the night sky would be dark, and we would wrongly conclude that nothing else is out there.

Sadly, what people call science these days has nothing to do with the scientific method; it's just a bunch of idiots doing correlation and thinking it's causation.


If the sky is dark, it is because there is nothing out there. That's what "nothing" means: that which provides no evidence or experience from which to demonstrate its existence.




