I was going to give a lecture across several departments about my PhD research in bioinformatics. The night before the talk I was generating some new figures and saw something weird, which led to me digging through source code and discovering a bug that invalidated the last 18 months of research and all my conclusions. I went to my advisor with this problem and he told me to present it anyway. I refused. I suffered consequences for that, including getting stonewalled by my advisor whenever I tried to publish a paper. I wish I had actually been in a position where refusing was a real option.
This is par for the course with computational research. I discovered a bug in my code the last week before submitting my PhD dissertation. Luckily, all of my code was organized in a pipeline that could automatically regenerate everything on the fly, but I needed a supercomputer. The queue was too long for Titan, so I set up an impromptu cluster on Azure (it was the only cloud provider with Infiniband at the time) and paid $200 out of pocket to regenerate the correct figures.
I wouldn’t be surprised if a significant portion of published computational research has bugs that totally invalidate the conclusions. I think we need to push hard to require all taxpayer-funded research to make any code that results in a journal article publicly available.
> We also noticed significant improvements in performance of RND every time we discovered and fixed a bug [...]. Getting such details right was a significant part of achieving high performance even with algorithms conceptually similar to prior work.
They call bugs 'details' which, I find, is a frightening state of mind for someone publishing an algorithm.
> They call bugs 'details' which, I find, is a frightening state of mind for someone publishing an algorithm.
It really depends on the algorithm. For example, a bug in the random number generator of a stochastic search algorithm that affects, say, the variance of a distribution won't have a relevant impact on the outcome.
That could have a huge impact. For example, if it affected random draws of a hyperparameter in a Bayesian model, it could lead to incorrect credible intervals in the posterior distribution. Or worse, the RNG bug could affect the variance of random samples in some deep component of an MCMC algorithm like NUTS, or even simple Metropolis. Depending on the exact nature of the bug, it could even cause the sampler to violate detailed balance, and entirely invalidate sampling-based inferences or conclusions.
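To make that concrete, here's a minimal random-walk Metropolis sketch (plain NumPy; the target density is just a stand-in) marking the two places an RNG bug would enter. A wrong variance in the symmetric proposal draw mostly hurts efficiency, but a bug in the acceptance draw silently biases every downstream estimate:

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # stand-in target: standard normal log-density (up to a constant)
        return -0.5 * x * x

    def metropolis(n_steps, step_sd=1.0):
        x = 0.0
        samples = np.empty(n_steps)
        for i in range(n_steps):
            # (1) proposal draw: a wrong step_sd here only hurts mixing,
            #     because the proposal stays symmetric
            prop = x + rng.normal(0.0, step_sd)
            # (2) acceptance draw: a bug here (say, uniform over [0, 0.5])
            #     breaks detailed balance and biases everything downstream
            if np.log(rng.uniform()) < log_target(prop) - log_target(x):
                x = prop
            samples[i] = x
        return samples

    draws = metropolis(10_000)
    print(draws.mean(), draws.std())  # should land near 0 and 1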
Well that is why you would always run statistical validity tests on the RNG and other intermediate values whenever using a Monte Carlo model in production. Ideally the tests should run as part of a Continuous Integration workflow.
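Something like this, as a pytest-style sketch (numpy/scipy; the sample sizes and thresholds are arbitrary):

    import numpy as np
    from scipy import stats

    def test_uniform_rng_ks():
        # Kolmogorov-Smirnov test of the generator against uniform(0, 1)
        rng = np.random.default_rng(12345)
        sample = rng.uniform(size=100_000)
        statistic, p_value = stats.kstest(sample, "uniform")
        # fixed seed => deterministic, so the test won't flake in CI
        assert p_value > 0.01

    def test_normal_rng_variance():
        # sanity-check the second moment, exactly the kind of thing a
        # variance bug would break
        rng = np.random.default_rng(12345)
        sample = rng.normal(0.0, 2.0, size=100_000)
        assert abs(sample.var() - 4.0) < 0.1  # tolerance sized for n = 1e5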
If you (dubiously) write your own RNG, sure. But you should never do that. And for library RNGs, you should execute the library's own unit tests. Frankly, running them as part of CI is at best overkill and at worst adds complexity that costs you. If you pin your dependency versions and isolate the artifact into an in-house artifact repository, so that the library code literally never changes unless you explicitly bump the version, then you should test it up front, then again only occasionally if you actually have evidence of a bug, and as part of any code review of a change that bumps the version.
Computational research is incredibly difficult because it's usually hard to see the effect of a bug. A triangle drawn in the wrong place on a screen can be easy to see, but a typo in your integration subroutine? Hard to spot if you don't catch it when it is born.
I also had a few do-overs at the end of my thesis, but fortunately had a cluster standing by...
>A triangle drawn in the wrong place on a screen can be easy to see, but a typo in your integration subroutine? Hard to spot if you don't catch it when it is born.
Well, there's also this notion of testing and regression. As I said in another comment a few days ago:
>A few weeks ago I had a conversation with a friend of mine who is wrapping up his PhD. He pointed out that not one of his colleagues is concerned whether anyone can reproduce their work. They use a home grown simulation suite which only they have access to, and is constantly being updated with the worst software practices you can think of. No one in their team believes that the tool will give the same results they did 4 years ago. The troubling part is, no one sees that as being a problem. They got their papers published, and so the SW did its job.
It's not really much different from a physical experiment, where the expectation is that you'll have to rebuild the experimental apparatus yourself to reproduce the experiment.
An independent implementation of the experiment is necessary for a full reproduction anyway. If you just run their code again, you'll end up with all their bugs again.
(But, don't get me wrong. I like when researchers release their code. It's still very useful.)
This can't be upvoted enough. Makes me think that using published source code and reproducing/validating results are completely orthogonal. Maybe it's a good thing the source code for science doesn't get published.
I like the concept of rewriting all the code by an unbiased third party to see if they can reproduce the results, but in practice what this leads to is:
1. People will not bother. It took a lot of minds to come up with the software used (in my friend's case, several PhDs' worth of work). No one is going to invest that much effort inventing their own software libraries to get it to work.
2. Even when you do write your own version of the software, there are a lot of subtleties involved in, say, computational physics. Choices you make (inadvertently) affect the convergence and accuracy. My producing a software that gives different results could mean I had a bug. It could mean they did. It could mean we both did. Until both our codes are in the open, no one can know.
It is very unlikely that you'll have a case of one group's software giving one result and everyone else's giving another. More like everyone else giving different results.
This issue comes up often on HN, and it used to come up a lot in /r/science (maybe it still does - I left that subreddit years ago).
If there's one thing I could convey to the world from what I learned from my time in academia, it is this: Most scientists at universities do not care about reproducibility.[1] Not only that, many people intentionally omit details from papers so that it is hard for rivals to reproduce their work - they want the edge so they can publish without competition. This isn't a shadowy conspiracy theory - this is what advisors openly tell their students. Search around on HN and reddit and you'll see people saying it.
[1] My experience is in condensed matter physics - it may not apply to all of academia.
People doing science programming are the worst programmers in the world. The reason is that they are focused on a calculation and result, not the program. I helped a guy speed up his program once. He was sorting around 10^4 to 10^5 elements using a bubble sort (which he had reinvented).
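For scale: a hand-rolled bubble sort on 10^4 elements does on the order of 10^8 comparisons, versus roughly n*log2(n) ≈ 1.3x10^5 for the built-in sort. A toy benchmark (illustrative only):

    import random
    import time

    def bubble_sort(a):
        # O(n^2) -- roughly what the reinvented version amounted to
        a = list(a)
        n = len(a)
        for i in range(n):
            for j in range(n - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    data = [random.random() for _ in range(10_000)]  # low end of 10^4..10^5

    t0 = time.perf_counter()
    bubble_sort(data)
    t1 = time.perf_counter()
    sorted(data)  # built-in Timsort, O(n log n)
    t2 = time.perf_counter()
    print(f"bubble: {t1 - t0:.1f}s, built-in: {t2 - t1:.4f}s")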
True, but it does provide at least some measure of reproducibility. Quality of implementation and reproducibility are orthogonal and both very valuable in their own right.
Yes, but closed source helps ensure that low quality code is hidden from sight. It also means that people who distrust or doubt the conclusions have no chance to identify any bug(s) and disprove the results or conclusions.
We stop publishing in papers, and instead adopt smaller chunks of our work as the core publishing units. Each figure should be an individually published entity which contains the entire computational pipeline. Figures are our observations, on which we apply logic/philosophy/whatyouwannacallit. Publishing them alongside their relevant code makes the process transparent, reproducible and individually reviewable, as it should be.

We can then "publish" comments, observations, conclusions etc. on those figures as a separate thing. Now the logic of the conclusions can be reviewed separately from the statistics and code of the figure.
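As a rough sketch of what a figure as a publishing unit could look like (Python/matplotlib assumed; the data here is simulated for illustration): one self-contained script that goes from raw inputs to the final image, with no manual steps in between.

    # figure_3.py -- the entire pipeline behind one published figure
    # (a real unit would load archived measurements shipped alongside
    # this script instead of simulating data)
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")            # render straight to file, no display
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)  # fixed seed: same figure every run

    # 1. data stage
    x = np.linspace(0, 10, 200)
    y = 2.0 * x + rng.normal(0, 1.5, size=x.size)

    # 2. analysis stage: least-squares fit
    slope, intercept = np.polyfit(x, y, 1)

    # 3. figure stage
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=8, label="observations")
    ax.plot(x, slope * x + intercept,
            label=f"fit: y = {slope:.2f}x + {intercept:.2f}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig("figure_3.png", dpi=300)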
A comparable solution would be for all involved to value all research, not just the ground breaking, earth shattering type.
As it is, research that yields a "failure" is buried. That means wheels are being reinvented and re-failed. That means there's no opportunity to compare similar "failures", be inspired, and come up with the magic that others overlooked.
Unfortunately, I would imagine, even if you can get researchers to agree to this the lawyers are going to have a shit fit. Imagine Google using an IBM "failure" for something truly innovative.
> Each figure should be an individually published entity which contains the entire computational pipeline.
I agree in principle. But, for the experimental sciences, we need better publication infrastructure to make this practically possible.
For example, consider a figure that compares, between several groups, the mechanical strain of tensile test specimens under a given load. Strain is measured by digital image correlation of video of the test. Some pain points:
1. There is a few hundred GB of test video underlying the figure. Where should the author put this where it will remain publicly accessible for the useful lifetime of the paper? How long should it remain accessible, anyway? The scientific record is ostensibly permanent, but relying on authors to personally maintain cloud hosting accounts for data distribution will seldom provide more than a couple years' of data availability.
2. Open data hosts that aim for permanent archival of scientific data do exist (e.g., the Open Science Framework), but their infrastructure is a poor match for reproducible practices. I haven't found an open data host that both accepts uploads via git + git annex or git + git LFS and has permissive repository size limits. Often the provided file upload tool can't even handle folders, requiring all files to be uploaded individually. Publishing open data usually requires reorganizing it according to the data host's worldview, or publishing a subset of the data, which breaks the existing computational analysis pipeline.
3. Proprietary software was used in the analysis pipeline. The particular version of the software that was used is no longer sold. It's unclear how someone without the software license would reproduce the analysis.
Finally, there's the issue of computational literacy of scientists. In most cases, the "computational pipeline" is a grad student clicking through a GUI a couple hundred times, and occasionally copying the results into an MS Office document for publication. No version control. Generally, an interactive analysis session cannot be stored and reproduced later. How do we change this? Can we make version control (including of large binary files) user-friendly enough that non-programmers will use it? And make it easy to update Word / PowerPoint documents from the data analysis pipeline instead of relying on copy & paste?
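The copy & paste step, at least, is automatable today. A minimal sketch using the python-docx library (file names here are invented):

    # rebuild_report.py -- regenerate the Word document from pipeline
    # outputs instead of pasting figures in by hand
    from docx import Document
    from docx.shared import Inches

    doc = Document()
    doc.add_heading("Tensile test results", level=1)
    doc.add_paragraph(
        "Strain vs. load for all specimen groups. "
        "Regenerated automatically; do not edit the figure by hand."
    )
    doc.add_picture("figure_3.png", width=Inches(5))
    doc.save("report.docx")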
If any of these pain points are in fact solved and my information is out of date, I would be thrilled to hear it.
I can’t speak for GP, but Nelson invented hypermedia/hyperlinks and had a vision for the future that included documents including other documents. All of that seems pretty compatible.
Of course it won’t ensure anything, but currently being completely unable to reproduce results, even as the author but just a year from now, is par for the course.
science can never prove anything as a matter of principle. it can only disprove the alternatives. math and logic can prove, but only within the model they build up from axioms one must simply accept (and Gödel showed such systems contain true statements they cannot prove).
Little of what I do, even with the most rigorous methods available and the best practices from both software development and computational science, proves anything.
I more mean there are whole aspects of science that aren't provable without being able to actually obtain counterfactuals, and that means time machines
This is why I'm a bit skeptical about the global warming predictions. AFAIK they're all based on models comprising millions of lines of code. Not only is it monumentally hard to keep such a large code base free of bugs, but in software implementing scientific models, finding bugs is extra hard (compared to software which fails in visible ways and has millions of users, such as the Linux kernel, games, a car's embedded software etc.). An effort to verify such a model as bug-free (let's say to a NASA standard) would probably cost billions of dollars. And that's only bugs; let's not forget that all the models are approximations, plus the numerical methods used all have their quirks and limitations etc. All in all, the problem seems too hard for humanity to tackle right now.
While reasonable skepticism is healthy, global warming is such a well studied phenomenon by now that an unreasonable number of independent codebases must have identical bugs in order for it to be false.
There's also the fact that we have had a pretty solid grasp of the radiative properties of greenhouse gases since long before computers, both theoretically and empirically, and we know roughly how much is put out into the atmosphere.
Where the models diverge is on far finer points than what is needed to make the basic policy changes that seems to be where we are stuck right now.
> an unreasonable number of independent codebases must have identical bugs
Entertaining[0] badpun’s skepticism, it is not necessary they have identical bugs, only that their bugs yield similarly biased results.
For example, if a significant number of bugs are identified by their effect on the results, then bugs contributing to “wrong” results might be more likely to be identified and fixed.
Not if the bugs are actually just bad/corrupted data. For example, the main dataset the IPCC is based on appears to contain all sorts of bad data that definitely make me skeptical of the conclusions the IPCC comes to (https://researchonline.jcu.edu.au/52041/).
If global warming is actually wrong, it's most likely because of bad/corrupt data in the datasets used.
The counterpoint is that we have already seen a steady temperature rise. So even though the specifics of the various simulations might not play out as predicted we can expect temperatures to rise.
Knowing that the temperature will rise is not enough to make policy decisions though. You need to predict the increase's magnitude, as well as practical consequences, such as climate change, how much the sea level will rise etc. And for that you need reasonable and bug-free models.
this will be the least of our problems. we understand too little about nature to have any reasonable prediction model. we don't know the inflection point which will cause massive collapse of major ecosystems we depend on.
all we know is that things are changing fast. faster than many non-human organisms are able to adapt.
How would the global warming predictions all be biased in the same way? _All_ the studies are measuring the same tendency: the temperature is rising. The model does not need to be _absolutely_ precise to be right.
> _All_ the studies are measuring the same tendency: the temperature is rising.
Are you talking about predictions (and not measurements, you don't need a model to measure temperature)? Assuming you are, there's unfortunately a huge problem with modeling (and heavy math and stats-based science in general), in that researchers tend to stop looking for bugs in the model when it returns the results that they expect. In other words, if a bug in the model tells the researcher that Earth's temperature will decrease by 4 C by 2100, he will look over the model until he finds the bug, but if the model tells him that the temperature will increase by 2 C, thus confirming his inner bias, he'll declare it correct and move on to writing a paper based on the "finding".
Alternatively, as a thought experiment, imagine if math research were done in the way climate science is done. We would have proofs that are millions of pages long and were never verified by anyone. We would trust in them only because the author says that they are correct. Is this science?
A given prediction can be wrong, an experiment may be biased; my point is that you choose to ignore that the vast majority of the experiments and measurements point in the same direction.
> Researchers tend to stop looking for bugs in the model when it returns the results that they expect.
CO2 produces a greenhouse effect by absorbing IR.
This alone doesn't prove that industrially produced CO2 is, this time, mainly responsible for climate change. All the other times, science believes the climate changed because of sun intensity.
> All the other times, science believes the climate changed because of sun intensity.
False. Lots of past climate changes were due to changes on Earth and its atmosphere (and sometimes, specifically, life on Earth), not changes in solar output (e.g., notably, the Huronian glaciation believed to have resulted from the Great Oxygenation Event, which resulted from the exponential growth of photosynthetic life.)
Solar intensity has increased slightly over the last few billion years, but previous changes in climate have been driven primarily by Milankovitch cycles, volcanic emissions, and plate tectonics.
That the post-industrial rise in atmospheric CO2 is of anthropogenic origin is hopefully not a point of dispute, but it is demonstrable if necessary. Thus it remains to show that this must raise the equilibrium temperature. So, as you say, CO2 selectively absorbs outgoing IR. In the lower atmosphere, this actually does not have as much of an effect as you might think. Water vapor blocks quite a bit of the absorption spectrum, and the effect of CO2 is more-or-less saturated already.
The mean free path of an outgoing IR photon in the lower atmosphere is quite short. Absorbed photons are re-emitted in a random direction, but take an overall upward course, the mean free path rising with altitude. At the (radiative) top-of-atmosphere, the mean free path is infinite: the photon is more likely to leave Earth. At the edge of space, there is essentially no water vapor, so the action of CO2 is greater. The effect of increasing the amount of CO2 in the atmosphere is to push the CO2-dense region of the atmosphere further out into space. Photons must take a longer path out of the atmosphere, and this must raise the overall temperature of the Earth proportionally, specifically by 3.7 W/m^2 per doubling of CO2, which is commonly held to be equivalent to 1 degree C of global temperature. This must be the case unless our understanding of thermodynamics is very wrong (and if you have an issue with thermodynamics then you have some pretty serious issues).
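For reference, the 3.7 W/m^2 figure comes from the standard logarithmic forcing approximation (Myhre et al., 1998), with C0 the reference CO2 concentration:

    dF = 5.35 * ln(C / C0)  [W/m^2]
    dF(2x CO2) = 5.35 * ln(2) ≈ 3.7 W/m^2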
So, one degree C ain't so bad, right? Well, it wouldn't be if that were it. However, there are several problematic feedbacks. One is that melting a lot of ice lowers the Earth's albedo, which causes it to absorb more heat. Another issue is that there is a lot of this "water" stuff around, which is very readily absorbed by the atmosphere, in a manner that increases very sharply with temperature. Water vapor is a much better greenhouse gas than CO2 by all accounts.
Climate science is not an extrapolation from the temperature record. There is a solid minimum bound on the temperature effects of doubling atmospheric CO2, and a variety of amplifying positive feedback effects. So far, in the last twelve decades, we have not managed to find anything which would reduce those effects to something manageable. At this point, the effect would need to be both very large, in order to offset the strong H2O feedbacks, and very small, to not have been noticed. The most plausible option would be "something poorly understood about the H2O feedbacks". I believe the most successful of such theories would be Dr. Richard Lindzen's Iris Hypothesis, which has generally failed to find support. At this point, there are no particularly plausible mechanisms which would transfer this extra energy to space, and if those did exist, then they would not necessarily be a non-issue: even if thermodynamics and optics are entirely wrong, the planet is warming, and we will have to deal with that even if it can't be prevented.
If you have any other questions, or would like citations for any of the above, do feel free to ask.
Plate tectonics and volcanic activity have also influenced climate in the past, e.g. the closing of the Panamanian isthmus, or the formation of the Deccan Traps.
Interestingly, the original paper proposing AGW (in 1896) was actually intended to explain Ice Ages.
"_All_ the studies are measuring the same tendency : the temperature is rising"
There could be lots of reasons for that. Anyway, the temperatures were not always rising, clear temperatures rise was observed in 1930-40s and in 1980-90s. Cooling in 1960-1970s. And yes, prediction has to be precise. If a model predicts rise of 3K in 100years, and you measure 1.3K - your model is wrong. It's even more wrong if you don't take into account any of the natural cycles, even if prediction is accidentally correct.
He said he's skeptical, not that he thinks it's wrong. I've got to say that this kind of rabid response from climate change proponents kinda prods my inner contrarian.
I think at this point the groupthink is so strong that anyone who came out with evidence against global warming would be committing career suicide by publishing it.
Automatically reducing an argument to saying someone is equivalent to an anti-vaxxer kind of sidesteps the issue and is a logical fallacy. If there is a problem with the argument, elucidate the issues, but "Reductio ad X" arguments are not valid.

This is a large and nuanced issue, which requires more thought and argumentation than can usually be contained in a HN comment, but I think the reduced thing we have here is whether computer code predicting a certain outcome should be trusted and, more importantly, whether major decisions should be made based on that prediction, given the nature of bugs and problems we see in normal code. I believe we all agree that the potential consequences are far worse; we simply disagree on how to treat those consequences and where they come from.

Because there is so much at stake, it is best to be more sure of what is happening and to take the right choice, instead of the first one we thought was correct based on a limited, and potentially flawed, computer model. If the computer model predicts dire catastrophe, then we should take it seriously and do another one with even greater resources, in order to ensure the prediction is correct and to determine what course of action should be taken. Perhaps it is right in predicting catastrophe, but not in predicting the -right- catastrophe; in that case we could spend a large amount of resources on the wrong solution and miss the right one, which could end up being even more catastrophic.
If he has some actual basis for an opinion that contradicts the large majority of scientific consensus, then he should have at it. But to reject science out of hand with no reasoning other than "Well, there could be bugs" is just insane.
> So you'd rather just ignore all scientific results until somebody does a NASA-level code review because there might be bugs?
The bugs, if serious enough, can make it not scientific, in the same way that a paper that has grave errors in it is wrong and thus obviously not scientific. So, before we do a thorough review of the code, we should treat it with due scepticism.
EDIT: this can also be expressed in terms of risk analysis. For typical software, the consequence of a bug is low - most software is commercial, so the impact of the bug will be limited to that company's bottom line (with exceptions of software that can kill, but these people already are serious about bugs). Also, most bugs in most software are either highly evident (the button does not work, you get random segfaults etc.) or have limited impact.
On the other hand, bugs in climate change models, given their "pipeline" nature (a wrong result from one module is propagated downstream all the way to the final prediction of expected temperature change), can quite often have severe impact. They can also be not evident at all - they could, for example, shift the final outcome prediction by 1 C.
Compound that with the fact these predictions are used to make trillion dollar decisions on global policies, and you can see that the actual damage done by bugs is not unlikely to be in the trillions. And that's why I say it's probably wise to subject the models to extreme scrutiny.
There's enough proof that we're destroying our environment on a grand scale - literally the only place that we can survive as a species. All of what you said is just rationalization for the kind of behavior that has gotten us to this point and will continue far into the future. It's easier to deny anything is wrong than it is to do anything about the problem. And I'm not at all surprised by how many climate skeptics there are here - intellectuals and really fucking smart people who are incredibly good at rationalizing their behaviors and beliefs to avoid the feeling of having to do anything and who think they're able to think about the subject matter more clearly and productively than the scientists who work on it day in and day out.
I spent a couple months implementing a promising optimization technique that was published in Nature. I assumed that I was missing something or had a bug in my code because I could not reproduce any of the results. It turns out, one guy in my research group knew the author and requested, on my behalf, the source code of the program used to generate the data for publication. That program did not even come close to doing what the paper claimed, due to some very serious bugs. When I brought up these issues, nobody really seemed to care much. It wouldn't be worth anyone's time to try to correct the issue or write to the journal.
> It wouldn't be worth anyone's time to try to correct the issue or write to the journal.
I don't see how a research scientist could say this; something seems culturally wrong there. Nature is a pretty serious publication. You spent months on it, maybe someone else will. How can it not be worth your time to 'try to correct the issue or write to the journal'?
-Nature is full of bogus papers, like all journals, if not more so. The reason for its high impact factor isn't consistently solid papers all around; it's a few very high impact, foundational papers and a crowd of minor-impact/shaky ones that were sneaked in by big shots and friends of the editors. Look up retraction rates by journal and prepare to be surprised.
-I don't think you realize what it takes to actually write to Nature and say: "Excuse me, that paper you published is wrong and you should retract it." You're liable to alienate all of the authors as well as piss off the editorial board for pointing out that they let a mistake slip through. You should be prepared to face vigorous backlash, and be completely confident in your own results if you don't want the repercussions to overwhelm you. Many communities related to specific fields are small and niche, so making enemies with one team often means that you've actually cut yourself off from a good part of that community. You should be prepared for awkward moments at conferences when you meet each other, incendiary questions when giving talks and scathing "anonymous" peer-reviews of your future submitted work. It's far easier to assume good faith, give the authors the benefit of the doubt ("yeah, the implementation is buggy, but what software isn't? The main idea is probably sound") and move on.
-I don't know about other fields, but in mine, there's an unofficial accepted consensus on a set of very high-profile papers that happen to be either complete bunk or utterly useless. Again, we're talking about Science, Nature, etc. These papers made a lot of noise when they came out years ago, and now no one in their right mind would base their work on them. Again, you can't just knock on Science's editorial board and go "Excuse me, that hyped paper from a very big shot is useless and you shouldn't have published it", so it's just something people in the community know and whisper among themselves. I guess it may seem strange to outsiders though.
> I guess it may seem strange to outsiders though.
Of course it seems strange. The output of the scientific/academic community is often presented to people as being a source of truth and scientists do little to dissuade this. To learn that it's all rotten to the core is upsetting.
I don't think it is all rotten to the core. I think it is a human establishment trying to do something very complicated with varying results. Sometimes they get things right and sometimes they make mistakes and sometimes they are infiltrated by bad actors.
What I try to do when reading academic papers is, if there are multiple on the same topic, read them and try to understand the differences. If different papers come to the same conclusion, that should make you a lot more confident in the result. If there aren't many papers on the subject then you just need to understand that this is kind of a "best effort" thing that is more likely to be true than a random guess, but not certainly accurate.
It's not rotten to the core. It's a complex system, full of human beings, with egos and incentives and other reasons for doing things.
Different labs also do different things - mine is currently reworking a paper that was basically done because something we did might matter, and my students have developing tests as part of their workload for developing code.
It's not rotten to the core. There are good people in there, but good people are not rewarded, and advancement selects for some bad traits. Maybe that's scarier than it being rotten to the core.
It's not rotten to the core, it's just imperfect, like all human communities are. Nepotism, conflicts of interest, human error, unconscious biases, etc. don't magically disappear just because we're in academia. As I said below: "All in all, I'd say we're doing fine. We're just not the ethereal source of truth that some people hold us to be, the very same people who, after claiming that 'God is dead', are very quick to replace Him with His silicon-based equivalent around which we would act like priests, except with lab coats in lieu of clerical garments."
It's all about trust, really. Many people choose to trust state-of-the-art medical research when they do vaccinations on their kids; some do not. You can indeed choose not to trust us as a 'source of truth', and it seems a fair chunk of the US political landscape is being led down that path; we'll see where it goes.
If the most prominent repositories of research not only do nothing about but actively resist and retaliate against substantiated challenges to published results, that sounds to me like something pretty rotten, at or near the core.
The core principles of science are in conflict with natural human behavior (which is one reason it took so long to invent science), so saying “they’re only human” is really no excuse.
I think it's one of those things you just have to learn to live with :/
And believe me I hate being defeatist, it made me a lot angrier when I was younger and I consider it to be one of the things that stopped me trying to go down the academic route so far.
While coders (who aren't just being boosters for particular technologies) have to live with the fact that the world runs on mountains of bad code, scientists or anyone engaging with academia or journal papers have to sleep at night realising how much bad science is being done and how much the publication process is skewed toward publishing and politics over quality or fact checking.
I can no more fix it than I can fix all software bugs or politics in big corporations...
We never claimed to be completely free of bias, or immune to nepotism. We're human.
I think you're making it out to be a bigger deal than it actually is. Most of these things are just noise. We just ignore it. If someone rises to prominence with outrageously fake results, rest assured that they will get shot down quickly by competitors (all the incentive to publish fake results lies within highly competitive fields). If you know what you're doing, and work in a lab where people know what they're doing too, you can still do pretty good science. I would guess that in all communities, academic or not, there's a nonzero amount of bogus, over-hype, nepotism, politics and what not, and a subset of people who actually get things done and make the whole field advance. After a while, the test of time truly determines what was actually useful from what is bunk. That's how it's always been.
The role of science in society is to increase the sum of knowledge on which progress can be made. If a scientist knowingly lets false information be published he is causing a disservice to society, harming progress.
Scientific papers are made public for a reason. The correct information should not be accessible only to a small circle of people in the know. If incorrect papers are out there, those who can correct them have a duty to do so.
>If a scientist knowingly lets false information be published he is causing a disservice to society, harming progress.
Often it's not so much false as empty or useless. Nevertheless, since everyone in a community knows the value of individual papers, and these are the same people 'making progress' in that field, I don't see how science is so much harmed. I agree the situation is kinda ridiculous, but it's not a "sky is falling" situation either.
>If incorrect papers are out there, those who can correct them have a duty to do so.
Sure. How much are you willing to pay me for me to prioritize this 'duty' over the literal hundreds of duties I already have at my lab?
> You're liable to alienate all of the authors as well as piss off the editorial board for pointing out that they let a mistake slip through.
Better rewards for uncovering bad science would keep it from devolving into a political contest. If you're not seeing the right incentives in place, you're either missing something or you're seeing an opportunity to capture some value by implementing those incentives.
This article is full of opaque generalizations about human behavior that more accurately describe the baser impulses of individuals attempting to game systems they don't understand (participants in psychological experiments), and don't so much reflect the actions of self-conscious professionals with any semblance of dedication to their fields.
Curiosity killed the cat and all that, but what is your field? I had a look at your profile, comments and submissions (hopefully that doesn't sound creepy!) and I can't tell.
It's Saturday morning (and I got students' papers to mark) :)
>I don't see how a research scientist could say this; something seems culturally wrong there.
I can tell you haven't spent time as a grad student at a top university in the US. There is almost no incentive for a research scientist to pursue this. They're very busy and stressed out, and this will not help them in any way. Your argument, that one person already wasted a lot of time and so others could be spared that pointless effort, is a sound one, but you have to realize that in some (sub)disciplines, scientists view a good bulk of the research as problematic and a waste of time. Why go through the trouble for this particular case?
Also, I doubt a simple email to Nature will change much. There would likely be a somewhat lengthy process, which will suck more of your time. And to be brutally honest, the chances are higher that the code did produce those results, but the grad students/post docs have been modifying the code for their next batch of research. Even something as basic as version control is unheard of in much of scientific research.
Definitely a cultural problem, as you describe it.
Grad students / postdocs / human lab rats aren't scum, the incentives just aren't in place to promote good behavior (such as calling other researchers out on their bullshit). If you're trying to acquire a vaunted tenure track job, you can't afford to piss off $senior_tenured_researcher_at_prestigious_institution, since $senior could blacklist you so that you won't get hired at the incredibly small set of universities out there. Sometimes things work out despite pissing off major powers (Carl Sagan technically had to "settle" for Cornell due to being denied tenure at Harvard, in no small part because of a bad recommendation letter from Harold Urey [0]), but not often.
Even if you do manage to get a tenure track job, you pretty much have to keep your head down for 7 years in order to secure your position.
And once you have tenure, you still get attacked vociferously. Look at what happened when Andrew Gelman rightly pointed out that Susan Fiske (and other social psychologists) have been abusing statistics for years. Rather than a hearty "congratulations", he was called a "methodological terrorist" and a great humdrum came about [1].
When framed against these circumstances, it should be evident that there is literally nothing to gain and everything to lose from sending out a short e-mail pointing out that someone's model doesn't work.
I'm a researcher myself and I guess this is one of those "does the end justify the means?" scenarios... Call out bad research and its perpetrators, and science loses out on a scientist who actually wants to do good work. Or don't, and then watch yourself rationalize worse decisions later on for the sake of your research, slowly becoming as corrupt as they were and realizing that a lot of your cited work could potentially be as bad (or worse) as the work you helped get published.
I really believe we need a better way. Privately funded / bootstrapped OPEN research comes to mind as a potential solution to bring some healthy competition to this potentially corrupt system. Mathematicians are starting to do this, I think computational researchers have the potential to be next.
> Grad students / postdocs / human lab rats aren't scum, the incentives just aren't in place to promote good behavior
The question is, would additional incentives promote good behavior or just lead to more measurement dysfunction. Some people think that just giving the "right" incentives is needed, but actual research shows otherwise.
I haven't read through that very long text, but claiming that incentives don't influence human behavior is a wildly exotic claim.
There is near infinite evidence to the contrary. That said, constructing a system with "the right incentives" can of course be devilishly hard or even impossible.
The claim is that it does change behavior, but only temporarily and it doesn't change the culture in a positive way / doesn't motivate people. It ends up feeling like a way of manipulating. That being said, according to this article, the entire incentive system would need to be dismantled. Simply adding more incentives wouldn't necessarily produce higher quality, at least not in the long run. So essentially the process of incentivizing new amazing research for funding is the primary issue and adding incentives for pointing out issues would just be a bandaid.
This sounds like a good critique of naive incentive schemes.
I don't think there is any doubt that humans follow incentives.
But working out what the core incentive problems are, and actually changing them might be both (1) intellectually difficult, and (2) challenge some sacred beliefs and strong power structures, thus making it practically impossible.
The HBR article's discussion of incentives is not really quite what I was thinking of when I wrote my comment. Specifically, the article you cite refers to the well-known phenomenon of how introducing extrinsic rewards via positive reinforcement is counterproductive in the long run. I've often noticed this form of "incentive" / reward being offered in the gamification of open science, such as via the Mozilla Open Science Badges [0], which in my opinion are a waste of time, effort, and money that do little to address systemic problems with scientific publishing.
With regard to the issue of grad students being unwilling to come forward and report mistakes, incentives wouldn't be added, but rather positive punishment [1] would be removed, which would then allow rewards for intrinsically motivated [2] actions.
It's not at all uncommon that implementations provided with papers do not actually do what the paper says. You're often lucky when there is an implementation at all. But most of the time it's just that running the exact same implementation under the same conditions requires setting up a very specific environment, installing specific versions of libraries, using some niche software and converting data from one byzantine format to another. Each deviation from the original paper is liable to subtly affect the results in nondeterministic ways. That's why no one really gets surprised or even cares; life is too short to call out all the bad software written in academia.
I was more referring to changes in the API where the input and output suddenly have to be in different formats in the middle of a pipeline, causing a crash. What can also happen is that somehow the old format is still valid and gets processed all the same, thus yielding nonsensical results. Sometimes a lab devises their own format which no one else uses, and the specification may be updated without notice between the moment they publish and the moment you try things out. Most people have no idea about things like 'backwards compatibility', 'unit tests', 'containers', etc. Code is just a tool to them and the fact that they had to write some is annoying them in itself.
> requires setting up a very specific environment, installing specific versions of libraries, using some niche software and converting data from one byzantine format to another
Containerisation is fairly mature and simple to use. Many in other fields struggle with these exact same issues and are able to create reproducible environments just fine.
I find it amazing that those publishing don't include their implementation, all that work locked away on a rusty hdd.
Do you think that they would use a VCS that was less invasive and more transparent to their workflow, like dropbox?
I'm thinking of making a VCS (rough sketch after the list) that simply runs in the background and
- Automatically records every file save (effectively a git commit without a message)
- Allows adding messages through tagging (like git tag)
- Handles 'branching' just by asking you to make a copy of the directory with a different name, and properly understands how to diff/merge/etc. copied files/directories that have since diverged.
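A minimal sketch of that always-on recorder, built on top of an ordinary git repository with the watchdog library (both assumed installed; a proof of concept, not the real tool):

    import subprocess
    import time
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    REPO = "."  # watched directory; assumed to already be a git repo

    class AutoCommit(FileSystemEventHandler):
        def on_modified(self, event):
            if event.is_directory or ".git" in event.src_path:
                return  # skip directory events and git's own bookkeeping
            subprocess.run(["git", "add", "-A"], cwd=REPO)
            # every save becomes a commit; meaningful labels come later,
            # as tags
            subprocess.run(
                ["git", "commit", "-q", "-m", f"autosave: {event.src_path}"],
                cwd=REPO,
            )

    observer = Observer()
    observer.schedule(AutoCommit(), REPO, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()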
In the software dev world--but outside of BSD--I would argue that containerization is extremely immature. Outside the software dev world, it is not easy to use.
Based on my experience on Academia Stack Exchange, in a parallel universe there is a student complaining that their advisor made them rework everything the night before a presentation.
Candidly, I probably would have told you to present it as is, and add a caveat to the last slide that this was work in progress (most internal presentations are assumed to be so) and you're still chasing down some problems in your code. The reason for this is two-fold:
1) It's the night before. Many students I have known and mentored aren't at the point in their career that they can "wing" a major presentation. It would be setting them up to fail in a way I couldn't shield them from (I can deal with changing results, but a bad presentation is largely on the student).
2) The quality of your checking is likely to be poor the night before. There are a number of times I've found an error as something was being prepped for presentation/poster printing/etc. and been convinced it changed everything, only to discover after 48 hours of thought and more checking that the difference in results was pretty negligible - especially in the sense of the qualitative takeaways from a presentation.
This, of course, may not have been your exchange with your PI. But I thought it was worthwhile that there are reasons not to have you change everything the night before that aren't the result of villainous fraud.
This seems like a great neutral path forward that shouldn't upset anyone and doesn't paint the research as 100% factual and set in stone yet. It leaves the door open to continue the work while rectifying the issues. However, it doesn't excuse the way the advisor asked them to just go with it anyway, unless there was some nuance (akin to what you've stated) left out of that part of the conversation that we're not hearing.
Was it an option to present the paper as normal, but then have a sudden surprise twist at the end which left the audience both surprised and feeling the pathos of imagining how they would feel about discovering the last 18 months of their research was invalid?
Or am I mistaken in reading "present" as "stand up in front of a crowd of people"?
Just thought the same. There was a way to "follow orders" and satisfy your principles in a way that serves science. And if anything happened to him after that, it would be clear that it was a political firing.
This has me really curious about the state of software engineering principles in academia if you can find bugs that invalidate 18 months of work.
I'm not saying it was necessarily preventable. It just looks like there may be opportunity to improve practices.
Admittedly I'm completely naive to the domain. Are there no forms of validation checkpoints you can reach where your foundations are rock solid and well backed with tests and such?
I had a Prof in grad school who lost years of cryospheric research data because an external hard drive was stolen. This was in 2010. It was a head scratcher, especially being faculty-adjacent to some of the best CS and engineering faculties in Canada.
I've seldom seen a PhD student in CS here in Germany with a proper software engineering background. I know several who use Dropbox for version control on their code and don't know how to use git. Some at least know and use svn.
That stuff just isn't taught at universities, and it's assumed, like oh so many things, that you pick it up along the way.
This might sound harsh, but that is something you have to deal with in life (academia or non-academia). What you were tasked to do is present your work; other people made room in their calendars to listen to your talk. Finding an error is no reason/excuse not to present your work. BUT you have to make clear upfront that there is a problem which might invalidate the results, and maybe already give an estimate of how far-reaching this error is.
If you have been working 18 months on a topic you will have substantial knowledge you can communicate to your peers and often some work inspires other work even if they are not using your findings.
You could almost say we mostly learn from failures. If you had blind success with a complex method the only way to test the limits would be tweaking it to failure.
That's an incredibly eloquent way to put it. Even though it's how I live most of my life I was never smart enough to express it like you did. Thank you.
Why did you not try to publish those problematic results without your adviser's name on? There are venues focused just on negative/non-reproducible/problematic results.
First of all, a lot of the people writing it are inexperienced in SW best practices. Then, turning formulas into code is hard, not to mention some articles' handwaving, like showing pseudocode that's almost OK except for a "do hard thing in this line" step, where that line expands into a lot of code.
I've also had some weird bugs in code like that (but nothing that would invalidate 18 months of results - btw, did you know RAND_MAX can be as low as 32767 in some compilers, like older versions of VS?)
(And computation times, though the cloud helps a lot with this)
Do grad students have any leverage over their advisors? Surely there must be some incentive to root out professors that degrade the reputation of the school. Public shame? Accusations of fraud? There are more options here than students admit, and some other school would have loved to have you.
Part of the thing to keep in mind is that things like accusations of fraud are potentially career ending.
People, including professors, understandably react poorly to other people trying to end their careers. It's important to recognize that something like that is coming out swinging.
Also, in my experience, there's a tendency with many graduate students to conflate "unethical" with "I don't like this". Not saying that's true in this case, but an "incentive to root out professors" is likely going to result in some pretty strong undesirable outcomes as well.
My mentees failing to progress will show up on my annual evaluation, and will certainly be a factor in my tenure case (especially since my position has no other aspect for teaching).
It's also something the project officers on my grants watch.
It also means I lose whatever I've invested in that student.
One does not idly destroy their graduate students, regardless of what HN occasionally thinks.
No, graduate students in general have no leverage over their advisors. A single word from your advisor and you're out of the field.
There is little incentive to root out professors for any reason. The process of becoming a professor (grad school -> postdoc (N times) -> tenure track faculty -> tenured professor) is generally believed (by tenured professors, of course) to root out anyone unworthy of the position. You can believe what you want about the efficiency of such a process.
Public shame requires public understanding of scientific (mal)practice, so, good luck communicating that. Most of the time, the bad actors in question have already gotten papers past referees; what makes you think the public is capable of more thorough review?
Fraud is considered a serious allegation and as a result accusing someone of it requires going through a thorough process involving a host of university administrators, whose incentives are aligned with the profit motives of the university system.
Transferring graduate schools is essentially impossible, and even in the exceptionally rare circumstances that it happens, it always involves burnt bridges and often has to do with bigger fish (i.e. your advisor being offered a position elsewhere, and you're lucky enough they take you with them.) Without external funding to support you, you are usually replaceable. All graduate departments receive applications far in excess of the number of students they can support. They certainly will not consider taking on another from a school at which you've proven to be a problem. Academia has already established a quite successful leaky pipeline; the beginning (graduate school) is no different.
In academia, hierarchy is the rule, flat organizations the exception. You must purchase your influence, usually at significant cost (and luck is a significant component). As an undergraduate, the system is designed to cater to your interests; as a graduate student, you cater to the university's interests. Scientific integrity is a noble notion, and in some corners of academia, it survives, but it does so in spite of bad actors who thrive in a system designed to produce ten times the number of qualified applicants for each job, all of whom are judged according to easily gamed metrics. It would be nice if things weren't this way.
But the problem is, typically... if you decide to get a PhD in science, it is possible that you're already too obsessed with the subject to ever, truly, give it up, especially if it's "just" over working conditions. I can't speak for everyone, but most people leave because they were forced to.
There really needs to be repercussions for this type of thing. Naming and shaming is a good first step towards that utopia. If those responsible for these types of unethical actions never get punished they'll just keep doing it to the detriment of society.