Thanks for this generous comment. The author of TFA has articulated a genuine problem that is central to many large-scale investigations these days, across many domains. We rely a lot on complex computer simulations, or complex physics-based models, that have a lot of fiddly details that are understood by only a limited set of people.
Yet, we want to learn from these models, and we want to reach conclusions from them. This has turned into a key problem for the scientific enterprise.
There are so many linked issues, some technical, some philosophical: Mere Monte Carlo state exploration is wasteful and doesn't provide much insight. Often we don't have error bars on model outputs to even know if an "improvement" in a metric is significant. There can be unknown unknowns that keep us from trusting our models completely.
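To make the error-bar point concrete, here is a minimal sketch, assuming you can rerun each model configuration with several random seeds and treat each run's metric as an independent replicate. It uses a plain percentile bootstrap (one choice among many UQ techniques) on hypothetical metric values to ask whether a candidate's "improvement" over a baseline is distinguishable from run-to-run noise. The function name `bootstrap_diff_ci` and all numbers are made up for illustration.

```python
import numpy as np

def bootstrap_diff_ci(baseline, candidate, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(candidate) - mean(baseline).

    Each array holds one metric value per independent run (e.g. per random seed).
    """
    rng = np.random.default_rng(seed)
    baseline = np.asarray(baseline, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample runs with replacement, independently for each configuration.
        b = rng.choice(baseline, size=baseline.size, replace=True)
        c = rng.choice(candidate, size=candidate.size, replace=True)
        diffs[i] = c.mean() - b.mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return candidate.mean() - baseline.mean(), (float(lo), float(hi))

# Hypothetical metric values from 8 seeds of each model configuration.
baseline = [0.71, 0.74, 0.69, 0.73, 0.70, 0.75, 0.72, 0.68]
candidate = [0.72, 0.76, 0.70, 0.71, 0.74, 0.73, 0.69, 0.75]
diff, (lo, hi) = bootstrap_diff_ci(baseline, candidate)
print(f"mean improvement = {diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With these made-up numbers the mean gain is 0.01, but the interval straddles zero, so the "improvement" can't be separated from seed-to-seed variation. Without repeated runs you wouldn't even know that much.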
It's a very rich and challenging problem space.
In my understanding, the Dept. of Energy was the first community to engage with these problems, because of the nuclear test ban treaty. They have the mandate to ensure the nuclear stockpile works despite not being able to fully test it, so they need models and they need to know how far to trust them.
> Mere Monte Carlo state exploration is wasteful and doesn't provide much insight. Often we don't have error bars on model outputs to even know if an "improvement" in a metric is significant.
The funny thing is, I didn't check the author's name until just now. Ed Dougherty, whom people below have derided as a "mere engineer", has been working on these problems forever. I'm honestly surprised he's still active, or even alive: he was already a graybeard when I heard him speak a decade ago. He is a bona fide systems biologist, one of the oldest ones.
At that time, his group was doing gene regulatory network (GRN) inference on expression data for ~600 genes, using the kind of Monte Carlo approach you mention to infer a small subset of the overall network.
The main thing I took away from their results (at the time) is that you can get multiple, drastically different network topologies that all score about the same on the objective function. This implies GRN inference was not recovering some underlying reality. It also suggests you cannot accurately infer subnetworks, which in turn suggests cellular networks aren't all that modular.
So a distinction really should be drawn between models that are simply predictive and those that also model the underlying reality, which is even harder.
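A toy way to see how different topologies can score the same: the sketch below is not Dougherty's Monte Carlo method, just an assumed linear-regression stand-in on hypothetical data. Regulator B is nearly co-expressed with A, the true edge is A to target, and every one- or two-parent topology is scored by residual sum of squares. The regulator names, noise levels, and the `sse` helper are all invented for illustration.

```python
import itertools
import numpy as np

# Hypothetical data: 50 expression samples of three regulators and one target.
rng = np.random.default_rng(1)
n = 50
reg_a = rng.standard_normal(n)
reg_b = reg_a + 0.05 * rng.standard_normal(n)   # B is almost co-expressed with A
reg_c = rng.standard_normal(n)
regulators = {"A": reg_a, "B": reg_b, "C": reg_c}
target = 1.5 * reg_a + 0.3 * rng.standard_normal(n)  # ground truth: only A -> target

def sse(parents):
    """Residual sum of squares of a linear fit of target on the given parents."""
    X = np.column_stack([regulators[p] for p in parents])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid)

# Score every candidate topology with one or two incoming edges.
scores = {
    parents: sse(parents)
    for k in (1, 2)
    for parents in itertools.combinations(regulators, k)
}
for parents, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(" + ".join(parents), "-> target:", round(s, 2))
```

Because B is almost a copy of A, "A -> target", "B -> target", and the two-parent variants containing either one all land close together, while "C -> target" is far worse. The objective function alone can't tell you which of the near-tied topologies is the real one.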
> We rely a lot on complex computer simulations, or complex physics-based models...we want to learn from these models, and we want to reach conclusions from them.
Not in molecular biology. There genuinely are no models like that except in very limited subfields like protein folding, and 99% of biologists would see them as mathematical mumbo-jumbo.
I see from your bio you're also in engineering research. You wouldn't believe how mathematically illiterate the average PhD biologist is. My PhD alma mater added a statistics course for the first time last year: a two-week summer course. Calculus I is merely "recommended" for admission. This is not unusual.
It isn't seen as necessary, because state-of-the-art research is basically all qualitative, with a quantitative veneer of t-tests laid on top. So I'm glad to hear other fields at least recognize the problem. Biology hasn't even gotten that far.
I also didn't care for the coarse characterizations elsewhere in the thread.
I take your point about the distinction between models that reproduce behavior ("simply predictive") vs. models of underlying components, and what you can learn from both.
This comes up in my own field with machine learning models vs. physics-based models. For example, an ML model might take a field of wind vectors at time t and predict the wind at time t+1, whereas a physical model implements the flow equations. You can fit the parameters of both kinds of model to match observations, but we certainly have more confidence in the robustness of the physics-based models.
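A caricature of that contrast, as a sketch under loud assumptions: the "flow" here is 1D periodic advection with an upwind step (nothing like a real wind model), the "physics-based model" has one free parameter (the advection speed), and the "ML model" is stood in for by an unconstrained linear map from u(t) to u(t+1). Both are fit to the same 20 snapshots and then tested on an initial condition unlike the training data. All constants are hypothetical.

```python
import numpy as np

# Toy stand-in (assumption): 1D periodic advection u_t + c*u_x = 0 on a grid,
# advanced with a first-order upwind step. C_TRUE plays the role of "reality".
N, DX, DT, C_TRUE = 64, 1.0, 0.5, 0.8

def physics_step(u, c):
    """One upwind step of the advection equation along the last axis (c >= 0)."""
    return u - c * DT / DX * (u - np.roll(u, 1, axis=-1))

def simulate(u0, steps, c=C_TRUE):
    traj = [u0]
    for _ in range(steps):
        traj.append(physics_step(traj[-1], c))
    return np.array(traj)

# Training data: 20 consecutive snapshot pairs from one smooth initial condition.
x = np.arange(N)
train = simulate(np.sin(2 * np.pi * x / N), steps=20)
X, Y = train[:-1], train[1:]  # rows are (u_t, u_{t+1}) pairs

# Physics-based model: one free parameter c. The update is linear in c,
# Y - X = c * D, so c has a closed-form least-squares estimate.
D = -(DT / DX) * (X - np.roll(X, 1, axis=-1))
c_fit = np.sum(D * (Y - X)) / np.sum(D * D)

# "ML" stand-in: an unconstrained linear map A (N*N = 4096 weights) with
# u_{t+1} ~ u_t @ A, fit by minimum-norm least squares on the same 20 pairs.
A, *_ = np.linalg.lstsq(X, Y, rcond=1e-10)

# Evaluate both on an initial condition unlike anything in the training set.
rng = np.random.default_rng(0)
u_test = rng.standard_normal(N)
truth = physics_step(u_test, C_TRUE)
err_phys = np.linalg.norm(physics_step(u_test, c_fit) - truth)
err_ml = np.linalg.norm(u_test @ A - truth)
print(f"fitted c = {c_fit:.3f}")
print(f"one-step error: physics = {err_phys:.1e}, linear ML stand-in = {err_ml:.1e}")
```

On this toy the physics model recovers c = 0.8 and its error is at round-off level, while the generic map fits its training pairs yet fails on the new input, because it only learned the few directions its training snapshots spanned. Real ML weather models are far richer than this, but it illustrates one source of the extra confidence: a model constrained by the governing equation has far fewer ways to fit the data and still be wrong.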
About mathematically challenged biologists, here's a hypothesis. I'll bet that if you started scanning conference abstracts in your domain for "uncertainty quantification," then some more carefully-posed modeling activities would crop up. (As you suggest, probably in the domains where more quantitative work is done.)
> we certainly have more confidence in the robustness of the physics-based models.
That is interesting. I don't know to what extent wind fields are considered chaotic in the technical sense, but I would have guessed that chaotic systems would be more robustly modeled by ML than by a physics approach. I have a vague sense that ML would somehow compensate for the sensitivity to initial conditions in a way physics modeling would not. ML models also tend to have more parameters with smaller coefficients, which I would identify with robustness (up to a point). I'm not gainsaying you, just saying that I find this counterintuitive.
Of course the physics models would provide more insight into the nature of the problem.
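For anyone unfamiliar with what "initial condition dependence" means in the chaotic sense, here is a minimal sketch using the logistic map at r = 4 as an assumed stand-in for a chaotic system (it is not a wind model). Two trajectories start one part in ten billion apart; the helper name and the 0.1 threshold are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r*x*(1-x), which is chaotic at r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two initial conditions that differ by one part in ten billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)
gaps = [abs(p - q) for p, q in zip(a, b)]

# First step at which the trajectories differ by more than 0.1.
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)
print("gap exceeds 0.1 after", first_big, "steps")
```

At r = 4 the gap roughly doubles per step on average (the Lyapunov exponent is ln 2), so a 1e-10 discrepancy grows to order one within a few dozen steps.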
And more generally, my understanding is that one way to distinguish a "complex system" from an ordinary "system" is that a complex system is not predictable by physics simulations because of emergent properties and so forth.
For this reason, I interpreted OP's call for a "mathematical epistemology" not so much as a call for more physics-based modeling, or for opaque ML models, but as an expression of the need for a (currently undefined) new type of mathematical language to model, describe, and predict complex emergent systems.
> I'll bet that if you started scanning conference abstracts in your domain for "uncertainty quantification," then some more carefully-posed modeling activities would crop up.
I'm sure you're right. I let my wistful longing for more of this type of thinking in biology drag me into the hyperbole that there is none of it.
I appreciate the pointers to terms and books that could get me up to speed on modeling. It's not really relevant to my primary area, but I do wish these approaches well from afar. And who knows, if I learn more, maybe I can apply more of this type of approach in my work. Getting audiences to understand it would be another task entirely...
One landmark reference on these problems is the NAS report on uncertainty quantification and complex models: https://www.nap.edu/catalog/13395/assessing-the-reliability-...