Whilst the experimental subjects and data collection are inherently fraught with difficulty, there's still a LOT of low-hanging fruit regarding things like automation. Many scientists use computers to write up results, to store data and perform calculations, but there's often a lot of manual, undocumented work which could easily be scripted to help those re-running the experiment. For example, running some program to produce a figure, without documenting what options were used; providing a CSV of data, without the formulas used for the aggregate statistics; relying on a human to know that the data for "fig1.png" comes from "out.old-Restored_Backup_2015~"; etc.
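To make that concrete, here's a rough sketch of what scripting one such step might look like (Python; the file names, column name, options and aggregate statistics are purely illustrative, not taken from any real workflow). The point is just that the figure, the options used to produce it, and the provenance of its input all get written down by the script instead of living in someone's head:

    # Hypothetical sketch: regenerate fig1.png from the raw data and record
    # exactly which input file, options and formulas were used.
    import argparse
    import csv
    import hashlib
    import json
    import statistics
    import sys

    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    def main():
        parser = argparse.ArgumentParser(description="Regenerate fig1.png")
        parser.add_argument("--data", default="measurements.csv")
        parser.add_argument("--column", default="weight_g")
        parser.add_argument("--bins", type=int, default=20)
        args = parser.parse_args()

        with open(args.data, newline="") as f:
            values = [float(row[args.column]) for row in csv.DictReader(f)]

        plt.hist(values, bins=args.bins)
        plt.xlabel(args.column)
        plt.savefig("fig1.png")

        with open(args.data, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()

        # Provenance: which file, which options, which aggregate formulas.
        provenance = {
            "command": " ".join(sys.argv),
            "input": args.data,
            "input_sha256": digest,
            "options": vars(args),
            "aggregates": {
                "mean": statistics.mean(values),
                "stdev": statistics.stdev(values),
            },
        }
        with open("fig1.provenance.json", "w") as f:
            json.dump(provenance, f, indent=2)

    if __name__ == "__main__":
        main()

Anyone re-running the experiment can then see at a glance which data produced which figure, and can re-run the same command on their own data.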
Such scripting is a step on the path to formalising methods. The scripts would help those who just want to see the same results; those who want to perform the same analysis on different data; those who want to investigate the methods used, whether looking for errors or surveying the use of statistics in the field; those who want a baseline from which to reproduce the experiment/effect more thoroughly; etc.
The parent's list of mouse-frighteners reminds me of the push for checklists in surgery, to prevent things like equipment being left inside patients. Whilst such lists are too verbose for a methods section (it would suffice to say e.g. "Care was taken to ensure the animals were relaxed."), there's no reason the analysis scripts can't prompt the user about such ad hoc conditions, e.g. "Measurements should be taken from relaxed animals. Did any alarms sound in the previous 24 hours? y/N/?", "Were the enclosures relatively clean? Y/n/?", "Were the enclosures cleaned out in the previous 24 hours? y/N/?", etc., with output messages like "Warning: Your answers indicate that the animals may not have been relaxed during measurements. If the following results aren't satisfactory, consider ..." or "Based on your answers, the animals appear to be in a relaxed state. If you discover this was not the case, we would appreciate it if you updated the file 'checklist.json' and sent your changes to 'experimentABC@some-curator.org'. More detailed instructions can be found in the file 'CONTRIBUTING.txt'."
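As a very rough illustration (again Python; the questions, defaults, 'checklist.json' and the messages are just the ones from the example above, and a real script would prompt for whatever the lab actually cares about), the prompting could be as simple as:

    # Hypothetical checklist prompts; '?' always means "don't know".
    import json

    # (question, expected answer) pairs; the expected answer becomes the default.
    QUESTIONS = [
        ("Did any alarms sound in the previous 24 hours?", "n"),
        ("Were the enclosures relatively clean?", "y"),
        ("Were the enclosures cleaned out in the previous 24 hours?", "n"),
    ]

    def ask(question, default):
        hint = "Y/n/?" if default == "y" else "y/N/?"
        while True:
            answer = input(f"{question} {hint} ").strip().lower() or default
            if answer in ("y", "n", "?"):
                return answer
            print("Please answer y, n or ? (don't know).")

    def main():
        answers = {q: ask(q, d) for q, d in QUESTIONS}
        with open("checklist.json", "w") as f:
            json.dump(answers, f, indent=2)

        # Any answer that differs from the expected one (and isn't "don't know")
        # suggests the animals may not have been relaxed.
        suspicious = [q for q, d in QUESTIONS if answers[q] not in (d, "?")]
        if suspicious:
            print("Warning: your answers indicate the animals may not have been "
                  "relaxed during measurements. If the following results aren't "
                  "satisfactory, consider the conditions above.")
        else:
            print("Based on your answers, the animals appear to have been relaxed. "
                  "If you discover this was not the case, please update "
                  "'checklist.json'; see 'CONTRIBUTING.txt' for how to send your "
                  "changes to the curator.")

    if __name__ == "__main__":
        main()

The answers end up alongside the data in version control, so a later reader (or a reproduction attempt) can see exactly which ad hoc conditions were checked.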
I like the idea, but who the hell is ever going to go through all of that? Yes, you made some checklist, great. But no other lab is going to go through all of that. And in your field, if you are very lucky, you may have just 1 other lab doing anything like what you are doing. It would be a checklist just for yourself/lab, so why bother recording any of it? Yes, do it, fine, but how long should you store those records that will never be seen, even by yourself? Why in god's name would you waste those hours/days just going over recordings of you watching a mouse/cell/thingy to make sure some uncountable number of little things did or did not happen? If you need that level of detail, then you designed your experiment wrong and the results are just going to be swamped in noise anyway. You are then trying to fish significant results out of your data, which is exactly the wrong way to run an experiment. Just design a better trial; there is no need to generate even more confusing data that has a 1/20 chance of being significant.
The checklist doesn't have to be at that level of detail; it just has to exist and be generic enough. The fire-alarm example is interesting: to me, the existence of such factors is a smoking gun pointing at potential improvements to the experimental environment. Why not exclude ALL stress factors by designing something like a sound-proof cage? Does that need extra budget? Probably, but what about some other unaccounted-for noise that will ruin the experiment? This suggests a better checklist item: ensure that the experiment provides a stress-free environment by eliminating sound, vibration, smells, etc.
> who the hell is ever going to go through all of that?
It's not particularly onerous, considering the sorts of things many scientists already go through, e.g. regarding contamination, safety, reducing error, etc.
> Yes, you made some checklist, great. But no other lab is going to go through all of that. And in your field, if you are very lucky, you may have just 1 other lab doing anything like what you are doing. It would be a checklist just for yourself/lab, so why bother recording any of it?
Why bother writing any methods section? Why bother writing in lab books? I wasn't suggesting "do all of these things"; rather "these are factors which could influence the result; try controlling them if possible".
> Yes, do it, fine, but how long should you store those records that will never be seen, even by yourself?
They would be part of the published scientific record, with a DOI cited by the subsequent papers; presumably stored in the same archive as the data, and hence subject to the same storage practices. That's assuming your data is already being published to repositories for long-term archive; if not, that's a more glaring problem to fix first, not least because some funding agencies are starting to require it.
> Why in god's name would you waste those hours/days just going over recordings of you watching a mouse/cell/thingy to make sure some uncountable number of little things did or did not happen?
I don't know what you mean by this. A checklist is something to follow as you're performing the steps. If it's being filled in afterwards, there should be a "don't know" option (which I indicated with "?") for when the answers aren't to hand.
I imagine it would be easy to have a git-like storage system for this information, where reproduction experiments would be a branch without the actual measurement data.
> a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. CWL is designed to meet the needs of data-intensive science, such as Bioinformatics, Medical Imaging, Astronomy, Physics, and Chemistry.
While this is interesting to speculate about, perhaps it would be best to start with something like the machine learning literature, where everything is already run computationally, and those in the field have the skills to easily scratch their own itch to improve the system so that it works for them.
Even in machine learning, how difficult would it be to get the field to adopt a unified experiment-running system? It sounds like a huge engineering project that would have to adapt to all sorts of computational setups: all sorts of batch systems, all sorts of Hadoop or Hadoop-like systems. And even that would be far easier than handling wet-lab work.
I think the lack of something like this in ML shows that there's enough overhead that it would get in the way of day-to-day work. Or maybe it just hasn't been invented yet in the right form. There are loads and loads of workflow systems for batch computation, but I've never encountered one that I like.
In genomics, one of the more popular tools for that is called Galaxy. But even here, I would argue that the ML community is much better situated to develop and enforce use of such a system than genomics.
I agree that computational fields are better suited to spearhead such approaches, but I don't think machine learning is a good example. ML researchers are constantly pushing at the frontiers of what our current technology can do; consider that a big factor in neural networks coming back into fashion was the ability to throw GPUs at them. The choice of hardware can make a huge difference in outcomes; some researchers are even using their own hardware (the work being done on half-precision floats comes to mind); any slight overhead will get amplified by the massive amount of work to be computed; and so on.
Maybe a field that's less dependent on resources would be a better fit. An example I'm familiar with is work on programming languages: typechecking a new logic on some tricky examples is something that should work on basically any machine; benchmarking a compiler optimisation may be trickier to reproduce in a portable way, but as long as it's spitting out comparison charts it doesn't really matter if the speedups differ across different hardware architectures.
When the use of computers is purely an administrative thing, e.g. filling out spreadsheets, drawing figures and rendering LaTeX (e.g. for some medical study), there's no compelling reason to avoid scripting the whole thing and keeping it in git.