Monte-Carlo can and should be deterministic and repeatable. It’s a matter of correctly initializing your random number generators and providing the same known random seed from run to run. If you aren’t doing that, you aren’t running your Monte-Carlo correctly. That’s a huge red flag.
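To make that concrete, here is a minimal sketch of a seeded run. The toy problem (estimating pi) and the name `estimate_pi` are illustrative assumptions, not anything from the code under discussion:

```python
import random


def estimate_pi(n_samples, seed):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


first = estimate_pi(100_000, seed=12345)
second = estimate_pi(100_000, seed=12345)
print(first == second)  # same seed, same sample stream, same estimate
```

This only covers the single-threaded case; passing the generator state explicitly is what makes the rerun bit-identical.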
Scientists need to get over this fear about their code. They need to produce better code and need to actually start educating their students on how to write and produce code. For too long many in the physics community have trivialized programming and seen it as assumed knowledge.
Having open code will allow you to become better and you’ll produce better results.
Side note: 25 years ago I worked in accelerator science too.
Yes, I understand how seeding PRNGs works, and I do that in my own code for debugging purposes. My point was that not using a fixed seed doesn't invalidate their result. It's just a cheap shot and, to me, demonstrates that the lockdownskeptics author doesn't have a real understanding of the methods being used.
Also, to be clear, I support open science and have some of my own open-source projects out in the wild (which is not the norm in my own field yet). I'm not arguing against releasing code, I'm arguing against OP arguing against this particular piece of code.
The main issue is whether it used sensible inputs, but that's entirely different from code quality and requires subject matter expertise, so programmers don't bother with such details -_-
I write M-H samplers for a living. While I agree that being able to rerun a chain using the same seed as before is crucial for debugging, and while I'm very strongly in favour of publishing the code used for a production analysis, I'm generally opposed to publishing the corresponding RNG seeds. If you need the seeds to reproduce my results, then the results aren't worth the PDF they're printed on. [edit: typo]
> Monte-Carlo can and should be deterministic and repeatable
I guess it can be made so, but not necessarily easily or cheaply (if it's parallel, and sensitive to floating point rounding). And it sounds like the kind of engineering effort GP is saying isn't worth it. Re-running exactly the same Monte-Carlo chain does tell you something, but is perhaps the wrong level to be checking. Re-running from a different seed, and getting results that are within error, might be much more useful.
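A rough sketch of that kind of check, again on a toy pi estimate (the estimator, the seed list and the sample count are all made up for illustration):

```python
import math
import random
import statistics


def estimate_pi(n_samples, seed):
    """Toy Monte-Carlo estimator used only to illustrate the check."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


n_samples = 50_000
seeds = [1, 2, 3, 4, 5, 6, 7, 8]  # arbitrary, just has to be more than one
estimates = [estimate_pi(n_samples, s) for s in seeds]

# One-sigma error expected for this estimator (binomial hit probability pi/4).
p = math.pi / 4.0
expected_sigma = 4.0 * math.sqrt(p * (1.0 - p) / n_samples)

spread = statistics.stdev(estimates)
print(f"mean={statistics.mean(estimates):.4f} "
      f"spread={spread:.4f} expected={expected_sigma:.4f}")
```

If the seed-to-seed spread is far larger than the expected error, the result is sensitive to the seed and something is off; if it is comparable, the conclusion holds regardless of which seed was used.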
I guess the best thing would be that it uses a different random seed every time it's run (so that, when re-running the code you'll see similar results which verifies that the result is not sensitive to the seed), but the particular seed that produced the particular results published in a paper is noted.
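Something along these lines, as a sketch only. The use of `secrets.randbits` to draw the seed and the toy estimator are my own assumptions:

```python
import math
import random
import secrets


def estimate_pi(n_samples, seed):
    """Toy estimator returning (estimate, one-sigma standard error)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    p = inside / n_samples
    return 4.0 * p, 4.0 * math.sqrt(p * (1.0 - p) / n_samples)


seed = secrets.randbits(64)  # fresh seed on every run
estimate, std_err = estimate_pi(200_000, seed)

# Log the seed next to the result so this exact run can be replayed later.
print(f"seed={seed} estimate={estimate:.4f} +/- {std_err:.4f}")

replay, _ = estimate_pi(200_000, seed)  # replaying the noted seed
print(replay == estimate)
```

Each run gives a slightly different number, which exercises the seed sensitivity, while the logged seed still allows the published run to be reproduced on the same setup.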
But still, for code running on different machines, especially for numeric-heavy code that might be running on a particular GPU setup, distributed big data source (where you pull the first available data rather than read in a fixed order), or even on some special supercomputer, it's hard to ask that it be totally reproducible down to the smallest rounding error.
Then you need to re-imagine the system in such a way that junior scientific programmers (i.e. Grad Students) can at least imagine having enough job security for code maintainability to matter, and for PIs to invest in their students' knowledge with a horizon longer than a couple person-years.
> Monte-Carlo can and should be deterministic and repeatable.
That's a nitpick, but if the computation is executed in parallel threads (e.g. on multicore, or on a multicomputer), and individual terms are, for example, summed in a random order, caused by the non-determinism introduced by the parallel computation, then the result is not strictly deterministic. This is a property of floating-point computation, more specifically, the finite accuracy of real floating-point implementations.
So, it is not deterministic, but that should not cause large qualitative differences.
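A tiny illustration of the effect, with the summation order changed by hand rather than by actual threads (so it only shows the floating-point part, not the scheduling part):

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, as a simple reduction would do."""
    total = 0.0
    for v in values:
        total += v
    return total


terms = [0.1, 0.2, 0.3]
forward = naive_sum(terms)                   # (0.1 + 0.2) + 0.3
backward = naive_sum(list(reversed(terms)))  # (0.3 + 0.2) + 0.1

print(forward == backward)              # False: addition is not associative
print(math.isclose(forward, backward))  # True: they differ only in the last bits

# math.fsum returns the correctly rounded sum, so it does not depend on order.
print(math.fsum(terms) == math.fsum(list(reversed(terms))))
```

The two naive sums differ by about one unit in the last place, which is exactly the "not strictly deterministic, but no large qualitative difference" situation.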
> Monte-Carlo can and should be deterministic and repeatable. It’s a matter of correctly initializing your random number generators and providing a known/same random seed from run to run.
Perhaps, if you use only single-threaded computation, you are interested only in averages, and the processes you are studying are well behaved and mostly linear.
But
- running code in parallel easily introduces non-determinism, even if your result computation is as simple as summing up results from different threads
- the processes one is examining might be highly non-linear - like lightning, weather forecasts, simulation of wildfires, and also epidemic simulations
- especially for all kinds of safety research, you might actually be interested not only in averages, but in freak events, like "what is the likelihood that you have two or three hurricanes at the same time in the Gulf of Mexico", or "what happens if your nuclear plant gets struck by freak lightning in the first second of a power failure".
What should be reproducible are the conclusions you come to, not the hashed bits of program output.
> If you aren’t doing that, you aren’t running your Monte-Carlo correctly. That’s a huge red flag.
Since I have a bit of experience in this area, quasi-Monte Carlo methods also work quite well and ensure deterministic results. They're not applicable for all situations though.
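As a rough sketch of the idea, here is a hand-rolled Halton sequence (bases 2 and 3) applied to a toy pi estimate. The function names are mine, and real work would normally use a library implementation:

```python
def van_der_corput(index, base):
    """Return element `index` (1-based) of the van der Corput sequence."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton_pi(n_points):
    """Estimate pi from a 2D Halton point set; no random numbers involved."""
    inside = 0
    for i in range(1, n_points + 1):
        x = van_der_corput(i, 2)
        y = van_der_corput(i, 3)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points


estimate = halton_pi(10_000)
print(estimate == halton_pi(10_000))  # deterministic: there is no seed at all
```

The points are fixed by construction, so every run on the same platform gives the same answer without any seed bookkeeping.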