Hacker News

This is par for the course with computational research. I discovered a bug in my code the last week before submitting my PhD dissertation. Luckily, all of my code was organized in a pipeline that could automatically regenerate everything on the fly, but I needed a supercomputer. The queue was too long for Titan, so I set up an impromptu cluster on Azure (it was the only cloud provider with Infiniband at the time) and paid $200 out of pocket to regenerate the correct figures.

I wouldn’t be surprised if a significant portion of published computational research has bugs that totally invalidate the conclusions. I think we need to push hard to require all taxpayer-funded research to make any code that results in a journal article publicly available.



A few days ago, in the description of a paper posted to Hacker News (https://news.ycombinator.com/item?id=18346943):

> We also noticed significant improvements in performance of RND every time we discovered and fixed a bug [...]. Getting such details right was a significant part of achieving high performance even with algorithms conceptually similar to prior work.

They call bugs 'details' which, I find, is a frightening state of mind for someone publishing an algorithm.


> They call bugs 'details' which, I find, is a frightening state of mind for someone publishing an algorithm.

It really depends on the algorithm. For example, a bug in the random number generator of a stochastic search algorithm that affects, say, the variance of a distribution won't have a relevant impact on the outcome.


That could have a huge impact. For example, if it affected random draws of a hyperparameter in a Bayesian model, it could ultimately lead to incorrect credible intervals in the posterior distribution. Or worse, if the RNG bug affected the variance of random samples in some deep component of an MCMC algorithm like NUTS, or even simple Metropolis. Depending on the exact nature of the bug, it could even cause the sampler to violate detailed balance and entirely invalidate sampling-based inferences or conclusions.
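To make that concrete with a toy sketch (illustrative only, not from any real codebase): a classic parameterization mix-up, where a variance is passed to an RNG that expects a standard deviation, silently doubles the width of a 95% interval:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sd = 2.0  # intended standard deviation of the posterior draws

correct = rng.normal(0.0, sd, size=n)     # scale = sd, as intended
buggy = rng.normal(0.0, sd ** 2, size=n)  # bug: variance passed as scale

ci_correct = np.percentile(correct, [2.5, 97.5])
ci_buggy = np.percentile(buggy, [2.5, 97.5])
print(ci_correct)  # roughly [-3.92, 3.92]
print(ci_buggy)    # roughly [-7.84, 7.84]: interval twice as wide
```

Every sampling-based quantity downstream inherits the error, and nothing crashes.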


Well that is why you would always run statistical validity tests on the RNG and other intermediate values whenever using a Monte Carlo model in production. Ideally the tests should run as part of a Continuous Integration workflow.
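As a sketch of what such a check might look like (assuming NumPy and SciPy; this is a cheap sanity gate, not a certification of the RNG):

```python
import numpy as np
from scipy import stats

def rng_uniformity_ok(sample, alpha=1e-6):
    """Kolmogorov-Smirnov check that a sample looks U(0, 1).

    Intended as a CI gate: it won't certify an RNG, but it catches
    gross wiring or parameterization bugs before they reach results."""
    _, p_value = stats.kstest(sample, "uniform")
    return p_value > alpha

rng = np.random.default_rng(42)
assert rng_uniformity_ok(rng.random(10_000))           # healthy stream
assert not rng_uniformity_ok(rng.random(10_000) ** 2)  # simulated bug
```

The same idea extends to intermediate values: moments, acceptance rates, energy conservation, whatever invariants the model is supposed to satisfy.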


If you (dubiously) write your own RNG, sure. But you should never do that. And for library RNGs, you should execute the unit tests of the library. Frankly, running it as part of CI is at best overkill and at worst adds complexity that costs you. If you pin versions of your dependency and isolate the artifact into an in-house artifact repository, so that the library code is literally never changing unless you explicitly modify the version, then you should test it up front, then again only occasionally if you actually have evidence of a bug. And as part of any code review of the code change that introduces a version change.


Computational research is incredibly difficult because it's usually hard to see the effect of a bug. A triangle drawn in the wrong place on the screen is easy to see, but a typo in your integration subroutine? Hard to spot if you don't catch it when it is born.
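For example, here is a toy sketch (not from any real project) of how quietly such a typo can sit: two trapezoid-rule integrators, one with an off-by-one in the grid, and both produce plausible-looking numbers:

```python
import numpy as np

def trapezoid(f, a, b, n):
    x = np.linspace(a, b, n + 1)  # n panels need n + 1 points
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

def trapezoid_typo(f, a, b, n):
    x = np.linspace(a, b, n)      # off-by-one: n points, not n + 1
    y = f(x)
    h = (b - a) / n               # spacing no longer matches the grid
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

exact = 2.0  # integral of sin(x) over [0, pi]
print(abs(trapezoid(np.sin, 0, np.pi, 100) - exact))       # ~1.6e-4
print(abs(trapezoid_typo(np.sin, 0, np.pi, 100) - exact))  # ~2e-2
```

A 1% error in an intermediate quantity is invisible on most plots, yet can dominate the conclusions.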

I also had a few do-overs at the end of my thesis, but fortunately had a cluster standing by...


> A triangle drawn in the wrong place on the screen is easy to see, but a typo in your integration subroutine? Hard to spot if you don't catch it when it is born.

Well, there's also this notion of testing and regression. As I said in another comment a few days ago:

>A few weeks ago I had a conversation with a friend of mine who is wrapping up his PhD. He pointed out that not one of his colleagues is concerned whether anyone can reproduce their work. They use a home grown simulation suite which only they have access to, and is constantly being updated with the worst software practices you can think of. No one in their team believes that the tool will give the same results they did 4 years ago. The troubling part is, no one sees that as being a problem. They got their papers published, and so the SW did its job.
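One cheap defense against that drift is a golden-value regression test: pin the seed, record a trusted output, and let CI complain the moment the tool stops reproducing itself. A sketch, with a hypothetical stand-in for the simulation:

```python
import numpy as np

def simulate(seed):
    """Hypothetical stand-in for a research simulation."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=1000)
    return float(np.mean(np.exp(-x ** 2)))

# In practice GOLDEN is a hard-coded literal, recorded once from a
# trusted run; any later change that shifts it means the suite no
# longer reproduces its own past results.
GOLDEN = simulate(seed=123)

def test_reproduces_golden_value():
    assert abs(simulate(seed=123) - GOLDEN) < 1e-12

test_reproduces_golden_value()
```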


Wow. If it's not reproducible, can it even be called science?


It's not really much different from a physical experiment, where the expectation is that you'll have to rebuild the experimental apparatus yourself to reproduce the experiment.

An independent implementation of the experiment is necessary for a full reproduction anyway. If you just run their code again, you'll end up with all their bugs again.

(But, don't get me wrong. I like when researchers release their code. It's still very useful.)


This can't be upvoted enough. It makes me think that using published source code and reproducing/validating results are completely orthogonal. Maybe it's a good thing when the source code for science is not published.


I like the concept of having an unbiased third party rewrite all the code to see if they can reproduce the results, but in practice what this leads to is:

1. People will not bother. It took a lot of minds to come up with the software used (in my friend's case, several PhDs' worth of work). No one is going to invest that much effort to reinvent their own software libraries to get it to work.

2. Even when you do write your own version of the software, there are a lot of subtleties involved in, say, computational physics. Choices you make (inadvertently) affect the convergence and accuracy. My producing software that gives different results could mean I had a bug. It could mean they did. It could mean we both did. Until both our codes are in the open, no one can know.

It is very unlikely that you'll have a case of one group's software giving one result and everyone else's giving another. More like everyone else giving different results.

Case in point:

https://physicstoday.scitation.org/do/10.1063/PT.6.1.2018082...

HN discussion:

https://news.ycombinator.com/item?id=17819420


This issue comes up often on HN, and it used to come up a lot in /r/science (maybe it still does; I left that subreddit years ago).

If there's one thing I could convey to the world from what I learned from my time in academia, it is this: Most scientists at universities do not care about reproducibility.[1] Not only that, many people intentionally omit details from papers so that it is hard for rivals to reproduce their work - they want the edge so they can publish without competition. This isn't a shadowy conspiracy theory - this is what advisors openly tell their students. Search around on HN and reddit and you'll see people saying it.

[1] My experience is in condensed matter physics - it may not apply to all of academia.


People doing science programming are the worst programmers in the world. The reason is that they are focused on a calculation and result, not the program. I helped a guy speed up his program once. He was sorting around 10^4 to 10^5 elements using a bubble sort (which he had reinvented).


Perhaps it's a reproducible way to build careers?


>> being updated with the worst software practices you can think of.

I can believe that. I have seen code in academia, and it was all one- or two-letter variables without even line breaks between statements.


Open source doesn't ensure quality code.

Ideally the code would be part of the peer review process, but code review is really expensive, so who knows how that would play out.


True, but it does provide at least some measure of reproducibility. Quality of implementation and reproducibility are orthogonal and both very valuable in their own right.


> Open source doesn't ensure quality code.

Yes, but closed source helps ensure that low quality code is hidden from sight. It also means that people who distrust or doubt the conclusions have no chance to identify any bug(s) and disprove the results or conclusions.


It's simple:

We stop publishing in papers, and instead adopt smaller chunks of our work as the core publishing units.

Each figure should be an individually published entity which contains the entire computational pipeline.

Figures are our observations on which we apply logic/philosophy/whatyouwannacallit. Publishing them alongside their relevant code makes the process transparent, reproducible and individually reviewable, as it should be.

We can then "publish" comments, observations, conclusions etc on those Figures as a separate thing. Now the logic of the conclusions can be reviewed separately from the statistics and code of the figure.
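A minimal sketch of what "figure plus pipeline" might look like on disk, with the data and provenance written next to the figure (the file names and JSON layout here are hypothetical, not an existing standard):

```python
import json
import platform

import numpy as np

# Generate the data behind "Figure 1" and record, alongside it,
# what is needed to regenerate the figure from scratch.
SEED = 7
rng = np.random.default_rng(SEED)
data = rng.normal(size=100)

np.savetxt("figure1_data.csv", data, delimiter=",")
with open("figure1_provenance.json", "w") as f:
    json.dump({
        "generator": "make_figure1.py",  # hypothetical script name
        "seed": SEED,
        "numpy": np.__version__,
        "python": platform.python_version(),
    }, f, indent=2)
```

A reviewer can then re-run the generator and diff the data, instead of trusting a pixel image.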


A comparable solution would be for all involved to value all research, not just the ground breaking, earth shattering type.

As it is, research that yields a "failure" is buried. That means wheels are being reinvented and re-failed. That means there's no opportunity to compare similar "failures", be inspired, and come up with the magic that others overlooked.

Unfortunately, I would imagine, even if you can get researchers to agree to this the lawyers are going to have a shit fit. Imagine Google using an IBM "failure" for something truly innovative.


What you are proposing sounds a lot like the concept of the least publishable unit.

https://en.wikipedia.org/wiki/Least_publishable_unit


> Each figure should be an individually published entity which contains the entire computational pipeline.

I agree in principle. But, for the experimental sciences, we need better publication infrastructure to make this practically possible.

For example, consider a figure that compares, between several groups, the mechanical strain of tensile test specimens for a given load. Strain is measured by digital image correlation of video of the test. Some pain points:

1. There are a few hundred GB of test video underlying the figure. Where should the author put this so it will remain publicly accessible for the useful lifetime of the paper? How long should it remain accessible, anyway? The scientific record is ostensibly permanent, but relying on authors to personally maintain cloud hosting accounts for data distribution will seldom provide more than a couple of years of data availability.

2. Open data hosts that aim for permanent archival of scientific data do exist (e.g., the Open Science Framework), but their infrastructure is a poor match for reproducible practices. I haven't found an open data host that both accepts uploads via git + git annex or git + git LFS and has permissive repository size limits. Often the provided file upload tool can't even handle folders, requiring all files to be uploaded individually. Publishing open data usually requires reorganizing it according to the data host's worldview, or publishing a subset of the data, which breaks the existing computational analysis pipeline.

3. Proprietary software was used in the analysis pipeline. The particular version of the software that was used is no longer sold. It's unclear how someone without the software license would reproduce the analysis.

Finally, there's the issue of computational literacy of scientists. In most cases, the "computational pipeline" is a grad student clicking through a GUI a couple hundred times, and occasionally copying the results into an MS Office document for publication. No version control. Generally, an interactive analysis session cannot be stored and reproduced later. How do we change this? Can we make version control (including of large binary files) user-friendly enough that non-programmers will use it? And make it easy to update Word / PowerPoint documents from the data analysis pipeline instead of relying on copy & paste?

If any of these pain points are in fact solved and my information is out of date, I would be thrilled to hear it.


1 and 2: I like IPFS for this, check it out

3: analysis that uses proprietary software is marked appropriately as second class

> computational literacy of scientists

Welp...


I have two words for you: Ted. Nelson.


Can you expand on this?


I can’t speak for GP, but Nelson invented hypermedia/hyperlinks and had a vision for the future that included documents including other documents. All of that seems pretty compatible.


similar to reproducible builds or Nix

research has jumped onto Jupyter notebooks already; that's halfway there, someone just needs to help with the remaining step


The WWW was created to publish information at CERN, but we can use it in other contexts too ;) http://info.cern.ch/Proposal.html


Of course it won’t ensure anything, but currently being completely unable to reproduce results, even as the author just a year later, is par for the course.


It's not about code quality, it's about transparency and ease of reproduction.


Code review is cheap. I do it for fun. But it doesn't prove anything.

Science should prove things...


science can never prove anything as a matter of principle. it can only disprove the alternatives. math and logic can prove, but only within the model they have built up, and that model ultimately rests on axioms one must simply accept.


Yeah, I'm aware of the strict theory.


Little of what I do, even with the most rigorous methods available and the best practices from both software development and computational science, proves anything.


I know. And I think it's a problem for science...

Logical proofs will never happen for software development, but surely standards for scientific programming can be tightened up a few levels!

I think I heard of some reform proposals from the Reproducibility Crisis reformers.


I mean more that there are whole aspects of science that aren't provable without being able to actually obtain counterfactuals, and that means time machines.


This is why I'm a bit skeptical about the global warming predictions. AFAIK they're all based on models with millions of lines of code. Not only is it monumentally hard to keep such a large code base free of bugs, but also, in software implementing scientific models, finding bugs is extra hard (compared to software which works in a visible way and has millions of users, such as the Linux kernel, games, a car's embedded software, etc.). An effort to have such a model verified (say, to a NASA standard) to be bug-free would probably cost billions of dollars. And that's only the bugs; let's not forget that all the models are approximations, plus the numerical methods used all have their quirks and limitations, etc. All in all, the problem seems too hard for humanity to tackle right now.


While reasonable skepticism is healthy, global warming is such a well-studied phenomenon by now that an unreasonable number of independent codebases would have to share identical bugs for it to be false.

There's also the fact that we have had a pretty solid grasp of the radiative behavior of greenhouse gases since long before computers, both theoretically and empirically, and we know roughly how much is put out into the atmosphere.

Where the models diverge is on far finer points than what is needed to make the basic policy changes that seems to be where we are stuck right now.


> an unreasonable number of independent codebases must have identical bugs

Entertaining[0] badpun’s skepticism, it is not necessary that they have identical bugs, only that their bugs yield similarly biased results.

For example, if a significant number of bugs are identified by their effect on the results, then bugs contributing to “wrong” results might be more likely to be identified and fixed.

[0] In the Aristotelian sense.


Not if the bugs are actually just bad/corrupted data. For example, the main dataset the IPCC relies on appears to contain all sorts of bad data, which definitely makes me skeptical of the conclusions the IPCC comes to (https://researchonline.jcu.edu.au/52041/).

If global warming is actually wrong, it's most likely because of bad/corrupt data in the datasets used.


Quite right. In fact, those statisticians in the posted article are surely mistaken. We know that using tricks to hide things is sound science.

http://www.realclimate.org/index.php/archives/2009/11/the-cr... http://www.realclimate.org/index.php/archives/2009/11/the-cr...

Since it was sound science in 2009, why would it not be now?


The counterpoint is that we have already seen a steady temperature rise. So even though the specifics of the various simulations might not play out as predicted we can expect temperatures to rise.


Knowing that the temperature will rise is not enough to make policy decisions, though. You need to predict the increase's magnitude, as well as practical consequences, such as climate change, how much the sea level will rise, etc. And for that you need reasonable and bug-free models.


> how much the sea level will rise etc

this will be the least of our problems. we understand too little about nature to have any reasonable prediction model. we don't know the inflection point that will cause massive collapse of major ecosystems we depend on.

all we know is that things are changing fast. faster than many non-human organisms are able to adapt.


Steady? We have seen a 20-year rise after a colder period. Nothing out of order.


How would the global warming predictions all be biased in the same way? _All_ the studies are measuring the same tendency: the temperature is rising. The model does not need to be _absolutely_ precise to be right.


> _All_ the studies are measuring the same tendency: the temperature is rising.

Are you talking about predictions (and not measurements, you don't need a model to measure temperature)? Assuming you are, there's unfortunately a huge problem with modeling (and heavy math and stats-based science in general), in that researchers tend to stop looking for bugs in the model when it returns the results that they expect. In other words, if a bug in the model tells the researcher that Earth's temperature will decrease by 4 C by 2100, he will look over the model until he finds the bug, but if the model tells him that the temperature will increase by 2 C, thus confirming his inner bias, he'll declare it correct and move on to writing a paper based on the "finding".

Alternatively, as a thought experiment, imagine if math research were done in the way climate science is done. We would have proofs that are millions of pages long and were never verified by anyone. We would trust in them only because the author says that they are correct. Is this science?


A given prediction can be wrong, an experiment may be biased; my point is that you choose to ignore that the vast majority of the experiments and measurements point in the same direction.

>>Researchers tend to stop looking for bugs in the model when it returns the results that they expect.

Again... they are _ALL_ biased?


It's not impossible - groupthink (https://en.wikipedia.org/wiki/Groupthink) has happened many times in the past among supposedly most brilliant minds.

Also, how many independent, comprehensive models (with codebases) for global climate change are there in the science world?


Could you perhaps indulge me by describing the action of CO2 in the atmosphere, as you understand it?


Can you explain why 300 ppm CO2 in the atmosphere is totally normal and fine, but 400 ppm basically means the world is doomed?


I'd be more than happy, but it would help if you answered my question, so that I can more effectively address your concerns.


CO2 produces a greenhouse effect by absorbing IR. This alone doesn't prove that industrially produced CO2 is, this time, mainly responsible for climate change. All the other times, science believes, the climate changed because of sun intensity.


> All the other times, science believes, the climate changed because of sun intensity.

False. Lots of past climate changes were due to changes on Earth and its atmosphere (and sometimes, specifically, life on Earth), not changes in solar output (e.g., notably, the Huronian glaciation believed to have resulted from the Great Oxygenation Event, which resulted from the exponential growth of photosynthetic life.)


Solar intensity has increased slightly over the last few billion years, but previous changes in climate have been driven primarily by Milankovitch cycles, volcanic emissions, and plate tectonics.

That the post-industrial rise in atmospheric CO2 is of anthropogenic origin is hopefully not a point of dispute, but it is demonstrable if necessary. Thus it remains to show that this must raise the equilibrium temperature. So, as you say, CO2 selectively absorbs outgoing IR. In the lower atmosphere, this actually does not have as much of an effect as you might think. Water vapor blocks quite a bit of the absorption spectrum, and the effect of CO2 is more-or-less saturated already.

The mean free path of an outgoing IR photon in the lower atmosphere is quite short. Absorbed photons are re-emitted in a random direction, but take an overall upward course, the mean free path rising with altitude. At the (radiative) top-of-atmosphere, the mean free path is infinite: the photon is more likely to leave Earth. At the edge of space, there is essentially no water vapor, so the action of CO2 is greater. The effect of increasing the amount of CO2 in the atmosphere is to push the CO2-dense region of the atmosphere further out into space. Photons must take a longer path out of the atmosphere, and this must increase the radiative forcing, by about 3.7 W/m^2 per doubling of CO2, which is commonly held to be equivalent to about 1 degree C of global temperature rise. This must be the case unless our understanding of thermodynamics is very wrong (and if you have an issue with thermodynamics then you have some pretty serious issues).
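For what it's worth, the 3.7 W/m^2 figure can be reproduced from the standard simplified forcing expression dF = 5.35 ln(C/C0) W/m^2 (Myhre et al., 1998):

```python
import math

def co2_forcing(c, c0):
    """Radiative forcing (W/m^2) from raising atmospheric CO2 from
    c0 to c ppm, using the simplified dF = 5.35 * ln(c / c0)."""
    return 5.35 * math.log(c / c0)

print(round(co2_forcing(560, 280), 2))  # doubling: 3.71 W/m^2
print(round(co2_forcing(400, 280), 2))  # 280 -> 400 ppm: 1.91 W/m^2
```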

So, one degree C ain't so bad, right? Well, it wouldn't be if that were it. However, there are several problematic feedbacks. One is that melting a lot of ice lowers the Earth's albedo, which causes it to absorb more heat. Another issue is that there is a lot of this "water" stuff around, which is very readily absorbed by the atmosphere, in a manner that increases very sharply with temperature. Water vapor is a much better greenhouse gas than CO2 by all accounts.

Climate science is not an extrapolation from the temperature record. There is a solid minimum bound on the temperature effects of doubling atmospheric CO2, and a variety of amplifying positive feedback effects. So far, in the last twelve decades, we have not managed to find anything which would reduce those effects to something manageable. At this point, the effect would need to be both very large, in order to offset the strong H2O feedbacks, and very small, to not have been noticed. The most plausible option would be "something poorly understood about the H2O feedbacks". I believe the most successful of such theories would be Dr. Richard Lindzen's Iris Hypothesis, which has generally failed to find support. At this point, there are no particularly plausible mechanisms which would transfer this extra energy to space, and if those did exist, then they would not necessarily be a non-issue: even if thermodynamics and optics are entirely wrong, the planet is warming, and we will have to deal with that even if it can't be prevented.

If you have any other questions, or would like citations for any of the above, do feel free to ask.


Can you explain why a healthy body temperature is about 98 F, and why you can die if it gets too far from that?


Can you explain what caused ice ages? 10 K rises and falls in a few years?


Glaciation is most strongly dependent on Milankovitch cycles:

  - https://en.wikipedia.org/wiki/Milankovitch_cycles
  - http://www.indiana.edu/~geol105/images/gaia_chapter_4/milankovitch.htm
Plate tectonics and volcanic activity have also influenced climate in the past, e.g. the closing of the Panamanian isthmus, or the formation of the Deccan Traps.

Interestingly, the original paper proposing AGW (in 1896) was actually intended to explain Ice Ages:

  - http://www.rsc.org/images/Arrhenius1896_tcm18-173546.pdf


"_All_ the studies are measuring the same tendency : the temperature is rising"

There could be lots of reasons for that. Anyway, the temperatures were not always rising, clear temperatures rise was observed in 1930-40s and in 1980-90s. Cooling in 1960-1970s. And yes, prediction has to be precise. If a model predicts rise of 3K in 100years, and you measure 1.3K - your model is wrong. It's even more wrong if you don't take into account any of the natural cycles, even if prediction is accidentally correct.


[flagged]


He said he's skeptical, not that he thinks it's wrong. I've got to say that this kind of rabid response from climate change proponents kinda prods my inner contrarian.

I think at this point the groupthink is so strong that anyone who came out with evidence against global warming would be committing career suicide by publishing it.


> I've got to say that this kind of rabid response from climate change proponents kinda prods my inner contrarian

Oh good. So you'd rather see millions of people be displaced. We're fucked as a species when "I'm a contrarian" is a valid reason to disagree.


I agree with your point, but note..

> they would commit career suicide by publishing it

Meanwhile, some researchers build their careers by publishing this sort of stuff, even when it isn't well researched.


Automatically reducing an argument to saying someone is equivalent to an anti-vaxxer kind of sidesteps the issue and is a logical fallacy. If there is a problem with the argument, elucidate the issues, but "Reductio ad X" arguments are not valid.

This is a large and nuanced issue, which requires more thought and argumentation than can usually be contained in a HN comment, but I think the reduced thing we have here is whether computer code predicting a certain outcome should be trusted -- and, more importantly, whether major decisions should be made based on that prediction, given the nature of bugs and problems we see in normal code. I believe we all agree that the potential consequences are far worse; we simply disagree on how to treat those consequences and where they come from.

Because there is so much at stake, it is best to be more sure of what is happening and to take the right choice, instead of the first one we thought was correct based on a limited, and potentially flawed, computer model. If the computer model predicts dire catastrophe, then we should take it seriously and run another one with even greater resources to ensure the prediction is correct and to determine what course of action should be taken. Perhaps it is right in predicting catastrophe, but not in predicting the -right- catastrophe; in that case we could spend a large amount of resources on the wrong solution and miss the right one, which could end up being even more catastrophic.


If he has some actual basis for an opinion that contradicts the large majority of scientific consensus, then he should have at it. But to reject science out of hand with no reasoning other than "Well, there could be bugs" is just insane.


> So you'd rather just ignore all scientific results until somebody does a NASA-level code review because there might be bugs?

The bugs, if serious enough, can make it not scientific, in the same way that a paper with grave errors in it is wrong and thus obviously not scientific. So, before we do a thorough review of the code, we should treat it with due skepticism.

EDIT: this can also be expressed in terms of risk analysis. For typical software, the consequence of a bug is low: most software is commercial, so the impact of a bug is limited to that company's bottom line (with the exception of software that can kill, but those people are already serious about bugs). Also, most bugs in most software are either highly evident (the button does not work, you get random segfaults, etc.) or have limited impact.

On the other hand, bugs in climate change models, given their "pipeline" nature (a wrong result from one module is propagated downstream all the way to the final prediction of expected temperature change), can quite often have severe impact. They can also be not evident at all: they could, for example, change the final outcome prediction by 1 C.

Compound that with the fact that these predictions are used to make trillion-dollar decisions on global policies, and you can see that the actual damage done by bugs could plausibly be in the trillions. And that's why I say it's probably wise to subject the models to extreme scrutiny.


There's enough proof that we're destroying our environment on a grand scale - literally the only place that we can survive as a species. All of what you said is just rationalization for the kind of behavior that has gotten us to this point and will continue far into the future. It's easier to deny anything is wrong than it is to do anything about the problem. And I'm not at all surprised by how many climate skeptics there are here - intellectuals and really fucking smart people who are incredibly good at rationalizing their behaviors and beliefs to avoid the feeling of having to do anything and who think they're able to think about the subject matter more clearly and productively than the scientists who work on it day in and day out.


There is enough proof that the additional CO2 caused massive greening of the Earth.


> I think we need to push hard to require all taxpayer-funded research to make any code that results in a journal article publicly available.

"Already 18824 SIGNATURES – sign the open letter now!" https://publiccode.eu


Was there a large, I guess “significant” difference between your results pre- and post-bug?



