
I like to put "Cosmic ray flipped a bit" into bug reports.



I used to work as a contractor for John Deere, and one of the engineers there would capriciously insist on adding runtime consistency checks in code review because "you never know if lightning might strike or a cosmic ray might flip a bit in RAM". This was not for any life-or-death software; it was for the infotainment stuff, which was already insanely buggy. The bugs began with the Bosch radio we had sourced, carried into the ambiguous protocol John Deere designed to speak with the radio, and ran through the application code the lowest-bidder contractors wrote before I was hired to fix it. Lightning strikes and cosmic rays were the least of their worries.

So the radio would fail to connect to many Bluetooth devices at the time (because Bosch), the application had no way of telling what state the radio it was meant to speak to was in (because of the faulty protocol), and the application was riddled with other bugs (because lowest-bidder contractors), but by god it was safe from lightning and cosmic rays (except not really, because they could just as easily alter the program as the state the program was operating on).
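For anyone wondering what a check like that tends to look like, here is a minimal sketch. It is not the actual John Deere code; it is in Python for brevity, and the name GuardedU32 is made up for illustration. The idea is to keep a value next to its bitwise complement and verify the pair on every read:

```python
MASK32 = 0xFFFFFFFF  # treat values as unsigned 32-bit words


class GuardedU32:
    """Hypothetical wrapper: a 32-bit value stored with its bitwise complement."""

    def __init__(self, value):
        self.store(value)

    def store(self, value):
        # Keep two copies: the value itself and its complement.
        self.value = value & MASK32
        self.shadow = ~self.value & MASK32

    def load(self):
        # A healthy pair XORs to all ones; any single flipped bit in
        # either copy breaks that invariant.
        if (self.value ^ self.shadow) != MASK32:
            raise RuntimeError("consistency check failed: possible bit flip")
        return self.value
```

Flipping one bit by hand (say, `g.value ^= 1 << 3`) makes the next `load()` raise. Note that this only guards the stored data: a flip in the checking code itself, or in anything not wrapped this way, goes undetected.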


I don't get the example; it sounds flat-out better than doing nothing otherwise.


Well, the alternative was to invest the time into the many glaring concrete bugs rather than hypothetical 'cosmic ray' bit flips. I don't have a fundamental problem with runtime consistency checks if there's some compelling concern and a clear up-front policy for when/where to add them (as opposed to dealing with the whims of a capricious code reviewer).


We all have those anal coworkers.


I remember hearing about this (not sure if there's a better article; this is literally the first one I found): https://www.thegamer.com/how-ionizing-particle-outer-space-h....

The story goes that a Mario 64 speedrunner accidentally triggered a glitch that was thought to require a particular value to be "true" (Mario touching a ceiling), which was not the case here. In the glitch hunt that followed, many people concluded that the most likely explanation was a bit flip from cosmic radiation, which caused the exact change in distance that a different but similar glitch could otherwise have caused.


There was a famous story about how Google couldn't produce a working index for months because they used non-ECC memory in their servers. This was like 20 years ago.



