> The calls for research to be transparent and reproducible have never been louder. But today's tools for reproducible research can be intimidating - especially if you're not a coder.
As someone (a software engineer) who has been trying (struggling) to reproduce biology research lately, I say amen. Hallelujah.
But. It's time to accept coding as a core skill. Science has more to learn from software engineering than it realizes. Software engineering (aka coding) eats reproducibility for breakfast, even when hundreds or thousands of "collaborators" are involved. These days, it's rare for a single biology researcher to produce (publish) code that is easily reproducible by an external researcher.
It's a bit more complex than that, as it's not just having access to the software, but to the whole environment. The move towards reproducible builds and configurations (Nix and the like) is a good thing for this, as well.
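To make the "whole environment" point concrete, here is a minimal sketch (in Python rather than Nix, with a made-up file name and fields) of recording the exact runtime environment next to your results so a later re-analysis can at least detect drift; a tool like Nix goes further and pins the environment up front:

    # Minimal sketch: snapshot the runtime environment alongside results so a
    # later re-analysis can detect drift. File name and fields are illustrative;
    # reproducible-build tools like Nix pin the environment up front instead.
    import json
    import platform
    import sys
    from importlib import metadata

    def snapshot_environment(path="environment_snapshot.json"):
        snapshot = {
            "python": sys.version,
            "platform": platform.platform(),
            # Every installed package with its exact version.
            "packages": {
                dist.metadata["Name"]: dist.version
                for dist in metadata.distributions()
            },
        }
        with open(path, "w") as fh:
            json.dump(snapshot, fh, indent=2, sort_keys=True)
        return snapshot

    if __name__ == "__main__":
        snapshot_environment()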
I've come across this problem before (as a third party - I don't do research) and I've considered writing software to help solve it. Maybe something like a Jupyter notebook with an attached mini-filesystem that can be easily shared with colleagues? What do you think would be required to solve this?
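For what it's worth, a crude approximation of that "notebook plus mini-filesystem" idea can be put together today: bundle the notebook, its data directory, and an environment file into a single archive that colleagues can unpack and run. A rough sketch, with all file names hypothetical:

    # Rough sketch of a shareable "notebook + mini-filesystem" bundle.
    # All paths are hypothetical placeholders.
    import zipfile
    from pathlib import Path

    def bundle_analysis(notebook="analysis.ipynb", data_dir="data",
                        env_file="environment_snapshot.json",
                        out="analysis_bundle.zip"):
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as bundle:
            bundle.write(notebook)
            bundle.write(env_file)
            # Walk the data directory so the archive carries its own mini-filesystem.
            for path in Path(data_dir).rglob("*"):
                if path.is_file():
                    bundle.write(path)
        return out

    if __name__ == "__main__":
        print(bundle_analysis())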
If you're really interested in this, you should contact Ivo Jimenez and Felix Z. Hoffmann. Although there's not much information available, under the heading of "Reproducible Computational Research – a case study" at [0], they tentatively explored such an idea at a hackathon in Cambridge in May.
It depends a lot on the field. In my own work (I'm out of academia now, but it still applies), there are often very complex network setups involved, so a Jupyter notebook won't quite cut it. Many things in this particular domain can nowadays be solved with SDN, but it's still a complicated issue. I'm sure people in other fields have analogous problems, too. And this is in CS; I suspect that anything which involves a more "physical" setup is orders of magnitude more complex.
I'm baffled by how "coding" gets equated with software engineering. If anything, scientists probably need to listen less to current Research Software Engineering dogma, and RSE types need to learn more about science. (Few will even take measurement seriously, in my experience.)
The "reproducibility" mantra is at odds with a lot of real-world science and serious computing. In general you don't, and can't, reproduce the sort of physics experiments and large-scale calculations I'm most familiar with in the way people are suggesting, and software engineering can't address bad science or a lack of information. Revision control and "notebook" interfaces seem to have become the equivalent of waving XML metadata at every problem, as in the days when e-science was getting in the way of useful work and research. Experience from a non-trivial research record and some decades of doing and supporting research computing will be ignored, though.
The issue is not to repeat the experiments, but rather to avoid putting obstacles in the way of people who want to re-analyze your existing data (including yourself a couple of years later), and to avoid losing thousands of working hours doing forensic data analysis to point out shitty science. Keith Baggerly's 2010 talk on the hoops he had to jump through to get to Anil Potti (https://en.wikipedia.org/wiki/Anil_Potti) is a great demonstration of the use case: https://youtu.be/7gYIs7uYbMo.
And as for "doing real science" vs. trying to make it more reproducible, there's an excellent analogy with "doing real programming" (aka adding features) vs. refactoring and architectural adjustments. Saying that you consider the latter a waste of time says more about you than about the subject.