
What if we separate collecting data from analyzing data?



This sounds like the "waterfall" model of project management. I could see it working in some areas and not others. In fact, a friend of mine wrote a thesis on a re-analysis of an existing data set from one of the big physics experiments, looking for a new effect that he had proposed based on theoretical work.

But I come from a background in small-lab experimental physics, and my spouse from synthetic chemistry. In both cases, a model of planning, followed by execution, followed by analysis, doesn't work, for a number of reasons. Often, an experiment fails, over and over again, until the design and operating conditions are refined to the point where the data begin to make sense. In my experiment, the equipment didn't even exist until I built it. And in experiments such as mine and my spouse's, the researchers (grad students) are also developing and refining their own abilities as they progress. I was my own electrical, mechanical, and software engineer. Sometimes, preliminary results change the direction of the project.

An additional issue is that experiments are rarely documented well enough to hand the data off to another team to analyze, without significant back-and-forth.

From what little I understand of the "agile" model, it seems more applicable to this kind of science.


Very different operating modes are required depending on how many people you're trying to coordinate. The analogy to software development is pretty clear - sometimes you're working on a personal project and zero documentation is tolerable, but sometimes your work is going to be translated into 200 languages and distributed to a hundred million users the day after it ships, and it needs ten times as many lines of tests as lines of code.

It seems to me that our model of "collect the data and analyze it yourself" is a sort of "ten engineer startup" scale process. Now that many fields of science have four or five digits of PhDs collaborating across countries, there's an increasing need for specialization, specifically in creating reusable data. It'll make us less efficient on a small scale, but creating any artifact at all that can be reliably used by ten thousand people is much higher leverage than a fully packaged data+conclusion that's even odds to be nothing but noise.


This is true in some disciplines, but largely inapplicable in others. It's a great idea, but I'm not sure it would work outside of physics or chemistry; imagine trying to apply it to medical research. Still, it's a novel idea, so thanks for sharing.


and the collected data should always be available



