The code worked differently when the moon was full (hanselman.com)
447 points by shanselman on Sept 28, 2021 | 162 comments



We once had a customer who would call us in a panic a couple of times a year saying our inspection equipment was experiencing unusually high false rejects and they were generating a very high scrap rate. By the time we got a technician on site the next day, everything was working flawlessly and the customer couldn't reproduce the problem either. This went on for almost three years with various levels of escalation to the current management. Finally, one day a technician was on site for another project when the customer came up to him and said "It's happening right now! Come fix it!" The technician rushed over to the equipment and discovered that the sun was shining at exactly the right angle to cause a lens flare in one of our cameras. This happened twice a year as the sun moved along its trajectory. A strategically placed piece of opaque plastic fixed it permanently.


I knew a guy who was a computer tech early in his career. One of his rounds was on a military base, and they had just moved their computer room up one floor. They started having problems with their tape drive: it would just randomly pop up an error while they were using it. He tried mightily to diagnose the problem but couldn't figure it out. Finally he took a break and walked to the nearest window and looked out. He saw a radar antenna making a sweep - and realized the error came when the antenna was pointed in their direction.


I sent my Commodore 64 for repairs more than once, only for them to find nothing wrong, until we realised it was the buildup of static electricity from being stored under our big TV: it "broke" when we didn't use it for a while but watched lots of TV. It'd be "fixed" by the time the repair people got around to it.

The symptom was that it started "typing" automatically.


That’s one downside of static typing.


Well done


It truly is well done to make a cheap joke on HN that hits just well enough not to get downvoted to oblivion.


boooooo


what do you know about true humour? :P


I had a weird 'vibration' in the monitor for my first PC (this was after I discolored it by passing a speaker too close to it); took it to the shop, no issue. Back home, issue. I believe it turned out to be a power brick that was too close to the screen or its cables.


Reminds me of the sun outage that affects Indian stock exchanges (BSE and NSE) at certain times of the year. VSATs used by traders experience loss of connectivity to geostationary satellites while they transit the sun.

https://en.wikipedia.org/wiki/Sun_outage


I had a similar issue in my industrial automation class. We were sorting cylinders by diameter and height as they went down the conveyor belt. PLC controlling motors, sensors, etc.

My group got everything set up, built our program, and everything worked fine. Waited a few minutes for the TA to verify, but it failed. We changed a few things, it worked, but failed when he came over.

Another group looked over our code, no issues noticed.

Finally I realized I was standing when we were testing things. I sat down waiting for the TA to verify. My shadow blocked the sun from the photo eye. Wasted half the lab on an issue that was entirely dependent on our position in the room, but found the root cause.


>Wasted half the lab

I don't think it was entirely wasted time, though. You can't plan to teach that kind of lesson ("Look outside of your usual blinkered problem-solving-space"), it happens when it happens.

As this whole thread shows, most of us learn it during our careers at some point but you were lucky enough to learn it before you even started.


>Finally I realized I was standing when we were testing things.

What clued you in to this possibility?


You get the same problem with geostationary satellites, e.g. for satellite TV.

There's a period of about 10 days every spring and fall where, for up to 30 minutes every day, the sun transits 'behind' a satellite within the beamwidth of the dish and totally overwhelms the signal at the LNB.


Once upon a time I had a problem with a remote control. It would stop working from certain positions, but only from time to time, not always. It took a while for me to realize there was a heating radiator behind my back in that direction, and it got hotter or colder depending on the thermostat. I guess at some point its IR output would overwhelm the IR output from the remote control.


When I first read this comment I was picturing a hot water radiator behind you, so it sounded unlikely based on the blackbody radiation curve for ~320 K.

But now I realize you're probably talking about an electric heater. Given that their heating elements get so hot that they're putting out visible red/orange light, it seems very plausible that an electric heater could produce enough ~940 nm IR to drown out the signal from the remote.
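
The blackbody argument above can be checked with a rough sketch. It uses Wien's displacement law and Planck's law to compare emission at 940 nm from a 320 K surface with emission from a glowing element. The 1000 K element temperature is an assumption chosen for illustration (it is not from the thread), and emissivity, surface area and receiver filtering are all ignored.

```python
import math

H = 6.62607015e-34        # Planck constant, J*s
C = 2.99792458e8          # speed of light, m/s
K_B = 1.380649e-23        # Boltzmann constant, J/K
WIEN_B = 2.897771955e-3   # Wien displacement constant, m*K

REMOTE_WAVELENGTH_M = 940e-9   # ~940 nm, as mentioned in the comment
WATER_RADIATOR_K = 320.0       # ~320 K, as mentioned in the comment
GLOWING_ELEMENT_K = 1000.0     # ASSUMPTION: illustrative temperature for a red-hot element


def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Ideal blackbody spectral radiance in W / (m^2 * sr * m)."""
    exponent = H * C / (wavelength_m * K_B * temp_k)
    return (2.0 * H * C**2 / wavelength_m**5) / math.expm1(exponent)


def wien_peak_um(temp_k: float) -> float:
    """Wavelength of peak blackbody emission, in micrometres."""
    return WIEN_B / temp_k * 1e6


ratio = planck_radiance(REMOTE_WAVELENGTH_M, GLOWING_ELEMENT_K) / planck_radiance(
    REMOTE_WAVELENGTH_M, WATER_RADIATOR_K
)
print(f"peak emission at 320 K: {wien_peak_um(WATER_RADIATOR_K):.2f} um")
print(f"radiance ratio at 940 nm (1000 K vs 320 K): {ratio:.2e}")
```

Under these idealized assumptions the 320 K surface peaks far out in the thermal infrared and emits many orders of magnitude less at 940 nm than the hot element, which is consistent with the reasoning in the comment.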

Thanks for sharing! I will try to keep this in mind when I'm troubleshooting IR remotes in the future.


As I try to recall, it was not a typical remote; the problem was between a Nokia 6310i and an IrDA transmitter, since that's how you connected a phone to the PC back then. Looking at Wikipedia, IrDA was short range, probably low powered, so maybe even a very weak IR source in the background could create problems? IIRC the problem was with the water radiator, which surprised me a lot back then. But we did use electric heaters as well at the time and my memory could be wrong.

Btw. that Nokia 6310i from 2002/3 is still used today by mother-in-law, and original battery still holds for 4-5 days!


I experienced this when I was working at a cable company, but sometimes they give out notices when it's going to happen. Random stuff still happens because of the sun. The sun is a very scary thing if you are studying it every day, but most people don't know it.


So an eclipse…of sorts


In this case, transit is the right jargon.

A transit is when the foreground object doesn’t completely hide the background object.

An occultation is when the foreground object hides the background object.

These two situations are collectively called occlusions.

An occlusion is an eclipse if the observer falls in a shadow.

https://en.wikipedia.org/wiki/Occultation


This is like a real-world equivalent of the "cleaning lady unplugged the machine" urban legend.


I was going to say it was the real world equivalent of, like, Redwall, or one of those other fantasy books where an event happens once a year when the sun shines in exactly the right spot to illuminate some secret writing.


There is the Anthem Veterans Memorial that projects an image on the ground on November 11th at 11:11 and at no time else.

https://en.wikipedia.org/wiki/Anthem_Veterans_Memorial


Except for when it is visible at other times, as that link explains?


Isambard Kingdom Brunel built a rail tunnel in England which has the sun shining straight through it on his birthday


Indiana Jones, Raiders of the Lost Ark.


The hobbit


Witcher


I know someone who had that happen to a server he was directly responsible for (well, someone unplugged it anyway, might well have been someone else in the office - it wasn't exactly a proper server room)

It was not just that the server crashed, but the ventilation proved to be bad enough that one of the SCSI drives (which should have been in a RAID, but wasn't...) wouldn't start.

They ended up opening it to try to kickstart it manually (been there, done that myself; had a drive survive 6 months with me "helping" the motor spin it up every morning; yes I backed everything up very regularly during that period), finding the drive head had gotten stuck to whatever material covered the plate. They ended up putting the drive in an oven while connected and heating it until it spun up, at which point they dumped what data they could.


I do have a memory of having to bump an old heavy drive to get it to start spinning. :-)


I opened mine up, so I'd start rotating the platter by using my finger to start spinning the centre.

Of course this was with a 20MB hard drive - the sensitivity to everything from dust to alignment changes was orders of magnitude different from the internals of modern drives...


I thought it was an urban legend and then somewhere in 2005/2006 I first spoke to one guy who claimed it was a bank in Tønsberg, Norway.

Later I worked with an IT manager who also confirmed it was such-and-such bank in Tønsberg. I have forgotten the name of the bank but I am still on friendly terms with him so I could ask next time I see him.

(My first draft of this post said that the first bloke had claimed to be in the room, but this is 15 - 16 years ago and the next story is also close to a decade ago so I might have mixed up who said what.)


I knew someone who had it happen at his office too, ca. 1995, in Oslo.

It was very much not a proper server room, which made it more understandable that they'd not realise there was stuff there that shouldn't be turned off.

I suspect there have been plenty of real-world incidents of stuff like this, and that many have never been noticed.

I've certainly pulled the wrong cable myself a couple of times over the years, even after thinking I'd been very careful, and being aware that I was dealing with servers that should stay up.


Maybe she overheard they need to sort out bugs and she was like "bugs? in a computer? not on my watch! grabs hoover".


Something similar actually happened somewhere I worked.

There was an outlet in the hallway, right outside the glass window looking into the server room... you guessed it -- plug in a cleaning machine, blow the fuse, take down servers...


It is reported in the book Absolute Zero Gravity as having happened in a military context, while others here mention a bank. The only reason to believe it may have happened so infrequently that some call it an urban legend is that in the past that infrastructure was rare. The rest - failures in the workflow of instructing and monitoring "innocent" personnel and contractors - is an "overly" normal factor.


I had a root cause once that was a cable draped across the corner of a ventilation duct.

The duct would vibrate when the air was on, and the corner was pretty sharp, which caused the duct corner to 'saw' its way through the cable's insulation over time.

Took a while to isolate the problem to 'it's between this box and this box' but was a pretty quick find after that :)


Our garage door opener has this problem. At very specific times of the year the sun confuses the infrared blockage sensor. The cause occurred to me when I lined up my eye to see what the sensor was seeing when it was failing, and I noticed the morning sun right next to the other end of the sensor stream. I moved a trash can to shade the sensor and it worked fine.


I had the same problem with my garage door opener. Tried shading the sensor but it still didn't work reliably. Finally replaced both the sensor and transmitter and it has been good ever since. My theory is that the lens got dirty or scratched over the years and was picking up stray light from odd angles.


+1, I had this issue as well and went through a bunch of futile efforts to block the sun and the angle and wish I had replaced the transmitter a lot sooner instead of trying to move trash cans because that worked 50% of the time. I think this is the kit I bought: https://www.chamberlain.com/safety-sensor-kit/p/041A5034


On sunny days our garage door won’t open until the remote is very close. Either the solar panels or inverter generate enough noise to interfere with the signal.

A few months ago our garage door openers started working like normal again. That was great until we realized it was because the inverter had failed. When the inverter was replaced, our door problems started again.


Damn, that's one electrically noisy inverter. I wonder how it passed EMC testing in the first place.


Just switch the sensors. Move the light source to the other side of the door, and the receiver to the shade. :tapstemplegif:

(That's how I fixed mine.)


Then the evening sun could potentially cause the same thing. There's a gap in houses across the street that may allow sun in at the right time of year.


If I remember rightly, this incident was the motivating factor for adding "save failing images" capability to our software.


So much time and money wasted by not recording the camera. Could've simply reviewed the footage from the right timestamp and immediately discovered what was wrong. All you had to do was take a still picture every time the system makes a rejection.
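
A minimal sketch of the "take a still picture on every rejection" idea, assuming the inspection software can hand over the raw frame bytes at the moment it rejects a part. The function name `save_rejected_frame`, the `reason` label and the file naming scheme are all hypothetical; nothing here reflects how the actual product works.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_rejected_frame(frame_bytes: bytes, reason: str, out_dir: Path) -> Path:
    """Write the frame that triggered a reject to disk, named by UTC timestamp.

    Hypothetical helper: `reason` is assumed to be a short, filename-safe label.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    path = out_dir / f"reject_{stamp}_{reason}.bin"
    path.write_bytes(frame_bytes)
    return path


if __name__ == "__main__":
    # Example: pretend a 4-byte frame was rejected for a diameter check.
    saved = save_rejected_frame(b"\x00\x01\x02\x03", "diameter", Path("rejects"))
    print(f"saved {saved}")
```

With timestamps in the file names, a cluster of rejects at the same time of day on consecutive days would stand out even before anyone opened an image.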


Looks like a cool solution, when the issue is known :)


In hindsight everything is easier, sure.

But if you have a problem that only happens sometimes, then you surely want all the data you can get from all sensors recorded, so looking at the saved video from the time of the error seems like a no-brainer. But maybe it was not so easy to make them record something. We do not know the setup.


> The technician rushed over to the equipment and discovered that the sun was shining at exactly the right angle to cause a lens flare in one of our cameras.

Somewhat infamously, "a rare alignment of sunlight on high-altitude clouds above North Dakota and the Molniya orbits of the satellites" the Soviets used for their nuclear attack early warning system triggered a false alarm, which, had it been treated as a real situation, could have led to nuclear war in the early 80s.

https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alar...


This is something that I've also heard about with hot axle box detectors for trains. Their solution: plant a tree.


Smort, trees are great for managing heat. I do hope they documented somewhere that the detector (or the thing it observes?) needs to be in shadow though.


Bughenge.


My exact thought. Please tell me this was filed as the Stonehenge Bug.


Being inside a building you can use Abu Simbel bug too.


Preemptively storing camera images of rejected samples would have saved everyone time, as it would have been enough to review the images of failed samples and notice the flare. Of course, only if that was an option.


If I remember rightly, this incident was the motivating factor for adding "save failing images" capability to our software.


like stonehenge, but with industrial sensors


Industryhenge?

Amazing story!


Indiana Jones discovering the location of the Ark.


The phase of the moon really can affect performance. A friend of mine worked on wireless links in Scotland and was struggling with loss at certain times of day, but not exactly the same time every day. When they graphed loss against time, the pattern was really periodic over many days. The periodicity turned out to be 12 hours 25 minutes, which they eventually realized is exactly the time between low tides. The problem was at low tide the reflected path off the water interfered with the line-of-sight path causing signal fading, whereas at high tide it interfered much less. In particular, see figure 2 of their paper for the correlation between tide height and SNR: https://homepages.inf.ed.ac.uk/mmarina/papers/EDI-INF-RR-136... As tide height really does depend on the phase of the moon, presumably their loss did too, if they measured for long enough.
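
One way to pull a repeat interval like "12 hours 25 minutes" out of logged loss data is autocorrelation. The sketch below is not the method from the linked paper. It runs on synthetic data (a clean sinusoid at the 745-minute period quoted above, sampled every 5 minutes for two weeks, both assumptions) and reports the lag at which the series best matches a shifted copy of itself.

```python
import math

SAMPLE_MINUTES = 5                           # ASSUMPTION: logging interval
TIDAL_PERIOD_MINUTES = 12 * 60 + 25          # 745 min, the period quoted above
N_SAMPLES = 14 * 24 * 60 // SAMPLE_MINUTES   # two weeks of samples


def synthetic_loss(n_samples):
    """Fake loss series: constant offset plus a sinusoid at the tidal period."""
    return [
        5.0 + 3.0 * math.sin(2 * math.pi * i * SAMPLE_MINUTES / TIDAL_PERIOD_MINUTES)
        for i in range(n_samples)
    ]


def autocorrelation(centered, lag):
    """Mean product of a zero-mean series with itself shifted by `lag` samples."""
    n = len(centered) - lag
    return sum(centered[i] * centered[i + lag] for i in range(n)) / n


def dominant_period_minutes(series, min_minutes, max_minutes):
    """Return the lag (in minutes) with the highest autocorrelation in the window."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    lags = range(min_minutes // SAMPLE_MINUTES, max_minutes // SAMPLE_MINUTES + 1)
    best_lag = max(lags, key=lambda lag: autocorrelation(centered, lag))
    return best_lag * SAMPLE_MINUTES


# Search between 6 h and 18 h so multiples of the period are excluded.
estimated = dominant_period_minutes(synthetic_loss(N_SAMPLES), 6 * 60, 18 * 60)
print(f"strongest repeat interval: {estimated // 60} h {estimated % 60} min")
```

Real measurements would be noisier and the resolution is limited by the sampling interval, but a peak that sits near 12 h 25 min rather than at 12 h or 24 h is the hint that the cause is tidal rather than daily.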


I heard a story about an astronomer losing the chance to be the first to report a comet one cold winter night: just as he wanted to send the email to report it, the Internet connection was dead! He ran from the observatory to the nearest place with Internet connectivity, but by the time he sent the email from there, there was already a report from another astronomer elsewhere, sent a few minutes earlier.

The reason for the mysterious network outage? Thermal contraction! The observatory was connected to the Internet via an optical link to a high-rise building in the city that contracted ever so slightly due to the very low temperature, moving the laser beam of the optical link out of alignment and shutting down the connection.


When I started dating my wife she lived in a shared student residence. She would tell me that her internet didn't work when her room was cold. She had to wait for her space heater to heat up the room. I didn't believe her at first but it turned out to be true. She lived there for a year so I observed it several times. Never did find out what the problem was. I don't know if thermal contraction could affect solder joints between ~10C and 22C.


Yes, and not at all unusual. The gap between the ideal world of textbook circuits and the messy and random world of real circuits is a source of constant distraction and surprise.

Thermal stresses, RF interference, issues caused by lighting, rain, and condensation, bad solder joints, other kinds of temperature sensitivity, designs with marginal component tolerances, component drift over time, vibration sensitivity, strange failure modes in damaged ICs, timing issues, power supply noise, coupling between circuit tracks and/or adjacent wiring, logic errors due to radiation (sometimes from surrounding metal), dust, insects, and animal damage - and on and on.


Yes. Not uncommon with bad soldering jobs. Problems like that can be diagnosed with freeze spray if necessary.[1]

[1] https://www.techspray.com/using-freeze-spray-to-diagnose-fau...


Some electronic parts are not rated to work below 0° C - for ‘commercial’ equipment the range is usually 0° to 50°C, and you often have to pay extra to get ‘industrial’ or extended range versions of parts that have ranges like -45° to 85°C (some fairly common parts you can even get to 105°C).

If it’s failing at 10°, I’d suspect there is a crack in a PCB somewhere or a badly soldered joint that isn’t actually making proper contact until there’s some thermal expansion. We don’t do anything special with the PCBs or soldering specifically, and all the stuff we make is tested down to -35° and up to 55°C with no issues…


We had it on a copper cable line in university. The line was hung overly-tight without slack (old job).

When it got just cold enough in the winter: no connection. Warmed up: connection.

Finally realized it was the line, got Comcast to redrop from pole, and situation was resolved.


one of the reasons why free space optics gigabit and 10GbE ethernet laser based links are a tiny niche product in the ISP space, millimeter wave (71-86 GHz) fills the purpose much better.

also they don't do as well in rain as something that can adapt modulation.


Yeah, that is likely why terrestrial FSO seems to be mostly dead these days as far as I can tell. Even back then there used to be a backup RF link in case of fog - why not have a proper RF link instead, especially if it's going to be much better due to all the money dumped into RF technology in comparison to free space optics.


In college we had a transmitter that linked several campuses.

If internet went out on any campus we would go and replace the lightbulb left on the transmitter. A couple minutes later all would be good.


For those who are interested, this is why you usually have two dishes/aerials, vertically displaced, so that when one has destructive interference between the direct and reflected signals, the other has constructive interference. I learned something about this when writing data compression and encryption software for radar surveillance systems, where there were multiple radars over a moderate coverage area, all sending data via microwave links over water back to the Command and Control Centre.


in modern point-to-point microwave systems, unless the budget is really high, or the path is very long, it's rather unusual to have a vertical spatial diversity setup. much of the problem of losing link due to fade is accommodated by modern radios that have advanced variable modulation and FEC, which can operate anywhere between 4096QAM 5/6 and QPSK 1/2.


Interesting, thanks. My information is dated, but also it was a project with some specific requirements. It's possible that our client opted for the vertical spatial diversity setup because of that.

I'd be interested to know what you think is a very long link ... 10km? 100km? I can't say too much more about where we were.


Beyond 35-40 km is where you would start to see atmospheric ducting, temperature inversions and such really affect a link. At those distances it pretty much has to be 6 GHz, because anything at 11 GHz or above (in the FCC licensed bands) would suffer extreme rain fade, depending on the rain rate and the amount of rain along the total length of the path.

Generally things where you'd be looking at a pair of 6 ft high performance dual polarity dishes at minimum, and ideally a pair of 8 ft. The budget for just a couple of those and getting them properly mounted and aligned is a lot already before adding a spatial diversity second dish at each end.


We were mostly dealing with 5km to 10km links, and it was some time ago. There were other considerations as well, so I didn't get into it too much ... not my field, the peeps doing the work seemed to know what they were doing (both customer and supplier), and I had other things to worry about. Extreme real-time data compression is interesting when you're on a lossy link.

I don't think budget was a problem for this customer, getting it right was important, but useful to know about the options and constraints ... thank you.


> When they graphed loss against time, the pattern was really periodic over many days. The periodicity turned out to be 12 hours 25 minutes, which they eventually realized is exactly the time between low tides.

I've set up a couple of monitoring systems at a couple of different companies and one thing I've heard some people saying is that they don't care about "fancy graphs", they just want a dashboard of what is red and what is green.

This might be a manager vs engineer perspective, because for me the graphs are the main point: it allows me to spot

- patterns (each night, each weekend, some weekends, more-or-less-randomly-except)

- and also trends: at this speed we are going to reach 80% utilization before November.
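
The "80% before November" kind of statement is just a straight-line extrapolation. Here is a minimal sketch using an ordinary least-squares fit. The sample numbers are made up, and real utilization is rarely this linear, so treat the result as a rough estimate at best.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, utilization, threshold):
    """Days after the last sample until the fitted line reaches `threshold`.

    Returns None if the trend is flat or decreasing.
    """
    slope, intercept = linear_fit(days, utilization)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Made-up example data: one utilization reading (%) per day for a week.
sample_days = [0, 1, 2, 3, 4, 5, 6]
sample_util = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0]

remaining = days_until_threshold(sample_days, sample_util, 80.0)
print(f"about {remaining:.0f} days until 80% at the current rate")
```

A red/green dashboard would show green for this whole period; only the slope says when it will stop being green.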


The key here is to describe and explain what/why to look for and how to do it. Don't leave it to the Client/Customer to struggle and figure it out on his own.

I have often found it surprisingly difficult (in spite of being an Engineer myself) to read and interpret the various graphs in monitoring dashboards when I don't know what I am looking for. This is ten times harder for most "Manager" types.


I remember reading that paper when I was trying to figure out why we were having issues with a wireless link down in the Patagonia fjords.

Unfortunately we didn't have the hardware or enough control over the link (it took negotiating access with armed forces to work on either end) to try to implement any of their ideas.


Cool result. That figure 2 is begging for a scatter plot of SNR and tide level to see how well correlated they are.


The moon is a nighttime light source, and a pretty good one at that, every 30 days or so. Even after the invention of the light bulb it continues to light up the night. Thus it's not astrology to suggest that the phase of the Moon could affect things on Earth seeing as how it's what causes tides. (It is astrology to suggest the Moon is causing an effect based on magic though).


point to point microwave link sounds like the bottom part of the fresnel zone was scraping the water - bad engineering design from the outset, tides or no tides. Not a good idea to do unless you have absolutely no economical way of getting one or both ends of the link higher.


I heard a great story a while back for a digitization project where historic content was being provided by many libraries around the world, including one in Russia.

The quality of the scanned books was excellent, except for a weird distortion every so often where part of the page would be shifted partway through as if someone had shifted half the page in Photoshop. This was only noticed in books over a certain size so people were checking to see if there was some kind of mechanical problem with the scanner (these were robots with automatic page turners so it was plausible that there could be something which was only an issue past a certain position), trying to figure out if there was some way that the software had some kind of memory leak or other issue which would explain the long and inconsistent intervals.

Eventually they were on a long-distance phone call to Moscow and not turning up anything when there was a loud rumble in the background. “What was that?” led to the realization that the library's scan center was close to a subway tunnel. The vibration of a passing train was enough to cause a glitch but only if you happened to be scanning at the exact time it went by: the reason longer books were noticed was simply because having more pages meant that at any point in time a long book was more likely to be sitting in the scanner and the technicians running the scanner were apparently tuning out the trains as background noise. This was reportedly the first project they'd done with one of the scan robots which can process an entire book unattended so it was plausible that smaller past projects simply hadn't been scanning frequently enough to hit this problem or that some previous technician had noticed and immediately redone the page.


This is why you see sensitive imaging equipment in labs on air tables. Sometimes the entire room is an air table.


Not a native speaker so I didn't know what an air table was. But since I've worked in labs, I recognized it from the picture accompanying the Wikipedia article:

https://en.wikipedia.org/wiki/Optical_table


Indeed — they're not cheap but if you need that, it's worth the cost. In this case you could probably avoid it simply by having the robot rescan the page if it detected a vibration but the research labs I used to support had experiments where that would have jeopardized months worth of work.


I deal with this all the time at work. People are capable of tuning out frighteningly obvious things, if they happen with enough regularity for long enough.


I thought that this was going to be different story.

There was a program I heard about back in the 90s which would literally crash depending on the phase of the moon!

The story is that it wanted to print a date. The programmer happened to have an astronomy library available that gave a string containing the date. So the programmer called that, and then parsed out the date.

Unfortunately the astronomy library wrote its result as a string to a pointer. The result included the phase of the Moon. The buffer behind the pointer was not declared to be long enough. And therefore, it would crash if the name of the phase of the moon was too long!



This reminds me of something I lived through as a nerdy teenager working a summer job as first line IT support at the headquarters of a multinational, in the mid-90s...

One day, I started receiving calls (through my pager!) from rather many people about intermittent networking problems. The state of the art 10 Mbit wired UTP network would have frequent bursts of 90% packet loss.

What was weird: only people on the fifth floor would have this issue..!? Our first thought was that they were on a single hub/switch that might have broken. But no, they were connected to the same uplinks as the computers on the problem-free surrounding floors. Furthermore, laptop users (who were of course also wired at the time) were reporting no problems whatsoever.

We were pretty much out of ideas by that point, but did an experiment just to test our assumptions: we took a PC and hooked it up with a long network cable and a power extension cable on the fourth floor and started pinging it. Flawless. Then we started walking up the stairs, and, yes indeed, somewhere around halfway up the stairs packets started to drop. (But not at all times, sometimes it would be fine, like all PCs on the fifth.)

If you want to guess at the cause, this is your chance. :-)

We brought in a company specialized in EM interference. It turns out that a GSM antenna placed on the roof of the four story building opposite to ours about half a year ago, had just been turned on. Its height aligned to our fifth floor. Whenever someone was using this mast to make a call (which certainly wasn't all of the time back then), it would cause interference on a specific model of network card that we were using in all of our PCs. It had a relatively large metal component that was apparently a pretty good 900 MHz antenna.

When confronted, the mobile operator quickly adjusted the antenna to not be directed at us. I believe all network cards were replaced soon after. Fun times!


> Not strictly the cycle of the moon but close.

Meh. Just the old 49.7 days cycle that it takes to overflow 32 bits when measuring milliseconds.
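
For anyone wondering where 49.7 comes from: it is 2^32 milliseconds expressed in days. The short sketch below does that arithmetic and shows one common way to measure elapsed time so that it survives a single wraparound of an unsigned 32-bit tick counter. The helper name `elapsed_ms` is made up for illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
UINT32_RANGE = 2**32  # number of distinct values in an unsigned 32-bit counter

days_to_wrap = UINT32_RANGE / MS_PER_DAY
print(f"a 32-bit millisecond counter wraps after {days_to_wrap:.2f} days")


def elapsed_ms(start_tick: int, now_tick: int) -> int:
    """Elapsed milliseconds between two 32-bit tick values.

    Modular subtraction gives the right answer across one wraparound,
    as long as the true interval is shorter than 2**32 ms.
    """
    return (now_tick - start_tick) % UINT32_RANGE


# A timer started 100 ms before the wrap and read 50 ms after it.
print(elapsed_ms(UINT32_RANGE - 100, 50))
```

Code that compares raw tick values with `<` or `>` instead of taking the modular difference is typically what breaks on day 49.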

I was hoping for an "it works when I buy vanilla ice cream and doesn't when I buy another flavour".


I was also hoping the title was more accurate. In the lines of the famous story of not being able to send an email 500 miles.

https://web.mit.edu/jemorris/humor/500-miles


The end of that article is a good reminder of the units command, I've been using Google for units conversion for so long that I forgot about the standalone units. It even comes standard with OSX.


I love the GNU units program so much. I think I use it at least 4 times/week. It's useful for kitchen conversions and also quick nuclear fuel burnup calcs. For example I used it on this blog post covering the long-term sustainability of nuclear fuel resources on earth.

https://whatisnuclear.com/blog/2020-10-28-nuclear-energy-is-...


> It's useful for kitchen conversions and also quick nuclear fuel burnup calcs.

Also electrical engineering and estimates of the feasibility of brute-forcing cryptographic primitives using the mass of the observable universe as fuel.


Interesting kitchen you have. Calculating nuclear fuel burnup.

I wonder what's cooking? Does it glow in the dark? ;>


> I wonder what's cooking?

Yellow cake.


Is there an equivalent command for doing things like 147 days from today or days since 21 jun 2021 or 88 days after 15 Aug 2021? This is the one thing I really wish was in Spotlight (I use spotlight for unit conversion and calculations which is really handy).


WolframAlpha works well for those types of calculations.


With GNU date (but not the BSD-based macOS date, I believe), and not all that convenient for "days since":

  $ date -d 'today + 147 days' +%F
  2022-02-23
  
  $ echo $(( ($(date +%s) - $(date -d '2021-06-21' +%s)) / 86400 ))
  100
  
  $ date -d '2021-08-15 + 88 days' +%F
  2021-11-11



To clarify, I was thinking a quick command line thing. Right now, I do it in a browser (both DuckDuckGo and Google produce the answer easily enough).


The unix date command does this.


Well, GNU date does, BSD date may or may not.

Linux box:

  $ date -d 'now + 4711 days ago'
  Thu Nov  6 06:57:20 UTC 2008
  $ date --version
  date (GNU coreutils) 8.30
  Copyright (C) 2018 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.

  Written by David MacKenzie.

Mac:

  % date -d 'now + 4711 days ago'
  usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ... 
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]

I also seem to be unable to pull a version out of the Mac's date command.


I use the Python datetime module for stuff like that. Excel can be useful too, since it treats dates as a number of days.
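
A small sketch of the same three calculations asked about above, using only the standard `datetime` module, so it should behave the same on Linux and macOS. The helper names are made up for illustration.

```python
from datetime import date, timedelta


def days_from(start: date, n_days: int) -> date:
    """Date that falls `n_days` after `start`."""
    return start + timedelta(days=n_days)


def days_since(earlier: date, later: date) -> int:
    """Whole days elapsed from `earlier` to `later`."""
    return (later - earlier).days


today = date.today()
print(days_from(today, 147))                  # 147 days from today
print(days_since(date(2021, 6, 21), today))   # days since 21 Jun 2021
print(days_from(date(2021, 8, 15), 88))       # 88 days after 15 Aug 2021
```

The last line prints 2021-11-11, matching the GNU date output earlier in the thread.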


I always thought the problem was obvious by the title. And felt good about myself for a long time after I read it. Now that I am well into adulthood knowing does not seem that amazing.


Thanks for sharing, this is a great read!


Yes and:

> Just the old 49.7 days cycle...

I've encountered datetime bugs and learned to take preventative measures.

I generally add a virtual clock shim to my projects, eg wrapping System.currentTimeMillis() or equiv.

Then I write unit tests for anticipated edge cases. Like midnight, end/start of year, etc. To ensure reporting, rollups, logging, grooming, etc. are working correctly.

Also allows me to simulate elapsed time, so I can verify out-of-order event processing and so forth.
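
A rough sketch of the idea, in Python rather than Java. The names SystemClock, FakeClock and report_date are hypothetical and only illustrate the pattern: code asks an injected clock for the time instead of calling the system clock directly, so a test can move time forward on demand.

```python
import time
from datetime import datetime, timedelta, timezone


class SystemClock:
    """Production clock: delegates to the real system time."""

    def now(self) -> datetime:
        return datetime.fromtimestamp(time.time(), tz=timezone.utc)


class FakeClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def now(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


def report_date(clock) -> str:
    """Hypothetical code under test: which day does a report belong to?"""
    return clock.now().date().isoformat()


# Cross a year boundary without waiting for New Year's Eve.
clock = FakeClock(datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc))
before = report_date(clock)
clock.advance(timedelta(seconds=1))
after = report_date(clock)
print(before, after)  # 2021-12-31 2022-01-01
```

Production code would receive a SystemClock, and tests a FakeClock, through the same parameter.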


I used to work in aerospace. One of my projects involved running avionics bench tests at a customer facility, basically the avionic subsystem of the aircraft in a big room on shelves. We were using a laptop for data logging and started getting dropouts in the data every 5 minutes. This was worrying because a) this hadn't happened at our site on similar equipment and b) this was a final customer-facing check before doing a real test flight.

We spent about a week trying to debug the system and the software and at a certain point while I was just sitting and thinking about what to do next, Flying Toasters popped up in the data logging PC (the lid was normally closed because of the space on the bench).

The Windows screensaver was hogging so much CPU that the datalogger couldn't keep up.


Tangentially related, here's another fun bug that inexplicably cares what time it is: Open Office cannot print on Tuesdays (https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161...)


A bit unrelated, but another fun one, "We can't send mail more than 500 miles": https://web.mit.edu/jemorris/humor/500-miles



That story disappoints me a bit in that they never found the cause.


I love the 500 mile email bug! I read another one years ago about a server that would inexplicably crash, and it turned out to be due to stepping on a particular floor tile in the server room. I wish I could find that one.


Reminds me of a story where a company's internet would regularly drop at the same time every day -- let's say 3pm.

Nobody could figure it out so they called in an expert.

After lots of attempts and figuring, one day the person in question happens to look out the window at the time in question ... and sees a service truck park exactly in line-of-sight between the business and their internet-signal pickup broadcast point.

Ah ha!


i once lived in an apartment in an old building where if you turned on the light in the bathroom the dsl would lose sync.

why this would occur i'll never know. (probably old telephone wiring wrapped around old 110v wiring? maybe? or who knows what kind of weird leakage/ground loops may have existed)


Quite a few years ago, I spent many hours making my way up the support tiers at Time Warner in Brooklyn to resolve some connection issues (cable modem). I patiently waded through each tier as they repeatedly asked me to reset my modem and my router and restart my computer and check every cable, etc. The same things I'd already done before (as an on-site tech myself, at the time) and as the previous support person asked me to repeat.

Finally I made it to tier three, with someone who seemed obviously competent. Within about a minute, he checked the power usage on my modem and then historically, and knew immediately that if I moved my modem to another outlet, it would work.

It did. Never had that type of connection issue again.


This happened to me as well, only with DSL. The strip lights on the bathroom mirror (newly installed) would disconnect the internet when turned on.

Such a weird thing to troubleshoot when you have a few people living in the same house.


cheap AC to DC power supplies are very 'noisy' in the RF spectrum.


At a very former workplace, we suddenly lost the ability to ARP between two buildings. Once the ARP entry was in the machines, it worked fine (FSVO "fine"), but getting to the point where ARP worked simply was not reliable.

Much troubleshooting later, it turns out that when they'd been doing some maintenance in the lift shaft (which was also used to drop the inter-building 10Base-5 yellow snake), they'd managed to shoot a nail through the Ethernet cable and we now had a nice 50Hz hum on the cable.

Retries in TCP made that work, but ARP doesn't have retries, so if that managed to get faded out, you'd hope to get lucky next time...


> Bugs based on a time calculations can often show themselves later when view through a longer lens and scope of time...sometimes WAY longer than you'd expect.

When I worked for BBN in '97-'98, someone from outside the company as I recall came to talk to a room of engineers about the wide variety of calendar-related behaviors in various UNIX systems that were expected to cause problems for Y2K.

It was a very, very long list, often subtle issues, and I recall the concern in the room about the number of old systems in use by the DoD and others.

Anyway, no real point to this other than date handling is one of the hardest things to get right in computing, ranking right behind testing for the correct behavior.


Dates are a pain.

The date bug I committed with the longest tail was daylight time. It was all good until we got to a day with 25 hours when we "fell back."


You are most definitely not alone with that exact bug. Welcome to the club! :)



Be happy you didn't have to deal with week numbers. IIRC, a Norwegian bank ended up having an outage because part of the code could not cope with the 53rd week in a year.
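
To see why week 53 bites, here is a small sketch using Python's ISO calendar support. It is not the bank's code, just an illustration that some ISO years have 53 weeks and that early January can still belong to the previous ISO year. The helper weeks_in_iso_year is a made-up name.

```python
from datetime import date

# 31 Dec 2020 is a Thursday in ISO week 53 of ISO year 2020.
iso_year, iso_week, iso_weekday = date(2020, 12, 31).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2020 53 4

# 1 Jan 2021 still belongs to ISO year 2020, week 53.
new_year = tuple(date(2021, 1, 1).isocalendar())
print(new_year)  # (2020, 53, 5)


def weeks_in_iso_year(year: int) -> int:
    """28 Dec always falls in the last ISO week of its year."""
    return date(year, 12, 28).isocalendar()[1]


print(weeks_in_iso_year(2020))  # 53
print(weeks_in_iso_year(2021))  # 52
```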


> What an interesting and insidious bug! Bugs based on a time calculations can often show themselves later when view through a longer lens and scope of time...sometimes WAY longer than you'd expect.

My personal anecdote. I like playing online games, and as you know latency is the killer. I enjoyed playing in the evenings after work, and inexplicably I started noticing my latency spike from around 50ms to > 1s. Extremely frustrating.

I had no idea what caused this so I set up a simple ping command and had it save it to a graph.

Well, the next day I noticed the pings were steady throughout the whole day, then in the evenings I'd get these chunks of bad time. It turns out when my wife would watch Netflix in the other room (and it was only Netflix), it'd cause something to go awry with the router and latency would spike for me. (The really weird thing was that it was a combination of a Roku, Netflix, and a wired switch - change any of those and the problem went away).

Later during the pandemic, I also diagnosed drop-outs on my network due to kids in my neighborhood being online during school hours. Like clockwork I'd get a bad network from around 10am, and it'd be fine again around 3 or 4. On school holidays and weekends my network was fine.


In the analogue days, before pixels existed, a customer had trouble with their phone line not working when the moon was full.

The problem was that they lived on the coast, and a subsurface junction box would get wet during king tides, causing the telephone line to fail.


F'n A. Reading the comments on this thread makes me love humanity. So much ingenuity, raw engineering horsepower, creativity. Goddamn, you are great people and you should be proud of yourselves. Reading this makes me believe we will survive as a species.


you'll appreciate this:

I worked with a factory that spent several years tracking down a quality problem. The eventual cause was wind direction: it showed up whenever they were downwind of the local cattle stockyard on a hot day.


“I don’t know, sounds like bull shit to me.”


> Did you know (I know because I'm old) that Windows 95, for a time, was unable to run longer than 49.7 days of runtime?

Yep! Because foone did a whole month-long Twitter thread on it, even had a livestream showing the crash.

https://twitter.com/Foone/status/1413694652822163459

https://www.youtube.com/watch?v=Hb46tX7-d_o
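
The 49.7 figure is what you get if uptime is counted in milliseconds in a 32-bit unsigned integer. The sketch below assumes exactly that and nothing more. It does not reproduce any actual Windows code, and tick is a made-up helper.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day

# Assumption for illustration: uptime lives in a 32-bit unsigned ms counter.
COUNTER_MODULUS = 2 ** 32

days_until_wrap = COUNTER_MODULUS / MS_PER_DAY
print(round(days_until_wrap, 1))  # 49.7


def tick(counter: int, elapsed_ms: int) -> int:
    """Advance the counter, wrapping the way a 32-bit unsigned integer would."""
    return (counter + elapsed_ms) % COUNTER_MODULUS


start = COUNTER_MODULUS - 5   # 5 ms before the wrap
end = tick(start, 10)         # 10 ms later the counter reads 5

naive_elapsed = end - start                        # hugely negative
wrapped_elapsed = (end - start) % COUNTER_MODULUS  # modular subtraction recovers 10
print(end, naive_elapsed, wrapped_elapsed)  # 5 -4294967286 10
```

Code that compares raw counter values breaks at the wrap, while code that subtracts modulo 2**32 keeps working as long as the interval being measured is shorter than one full wrap.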


At a previous job, we used a bug tracking tool called "Remedy". On September 8, 2001, it started reporting dates incorrectly; its idea of the current date jumped back to 1973 and started advancing at 1/10 the normal rate.

It used Unix timestamps (seconds since 1970) and assumed they could only be 9 decimal digits. When the time reached 10 digits, the last digit was quietly dropped.

(It was fixed within a few days.)
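
A small sketch of that failure mode. truncate_to_nine_digits is a hypothetical model of the bug as described, not Remedy's actual code. Note that the rollover to 10 digits happened at 01:46:40 UTC on September 9, which was still September 8 in US time zones.

```python
from datetime import datetime, timezone


def truncate_to_nine_digits(ts: int) -> int:
    """Hypothetical model of the bug: keep only the first 9 decimal digits."""
    return int(str(ts)[:9])


def as_utc(ts: int) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


last_ok = 999_999_999       # last 9-digit timestamp
first_bad = 1_000_000_000   # first 10-digit timestamp

print(as_utc(last_ok))                             # 2001-09-09 01:46:39
print(as_utc(truncate_to_nine_digits(first_bad)))  # 1973-03-03 09:46:40

# Ten real seconds later the truncated clock has advanced by only one second.
drift = truncate_to_nine_digits(first_bad + 10) - truncate_to_nine_digits(first_bad)
print(drift)  # 1
```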


I'm immediately reminded of NetHack [1] (which you can play online here [2])! Real-life phase of the moon has a small but important effect on gameplay in NetHack. The public game server on alt.org even tracks the phase of the moon to provide a handy reference.

A little bit disappointing to discover that the code from the article does not actually depend on the phase of the moon. I'm really interested to see the other stories here where it actually is the case that the phase of the moon is affecting people's code.

[1] https://www.nethack.org

[2] https://alt.org/nethack/
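
For anyone who wants code that really does depend on the moon, here is a rough approximation. This is not NetHack's algorithm. It assumes a mean synodic month counted from a reference new moon on 6 Jan 2000, so it can be off by the better part of a day, and both function names are made up.

```python
from datetime import datetime, timezone

# Reference new moon (6 Jan 2000, 18:14 UTC) and the mean synodic month length.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)
SYNODIC_MONTH_DAYS = 29.530588853


def moon_phase_fraction(when: datetime) -> float:
    """0.0 at new moon, roughly 0.5 at full moon, approaching 1.0 before the next new moon."""
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400
    return (days % SYNODIC_MONTH_DAYS) / SYNODIC_MONTH_DAYS


def is_roughly_full(when: datetime, tolerance: float = 0.05) -> bool:
    """True when the approximate phase is within `tolerance` of full."""
    return abs(moon_phase_fraction(when) - 0.5) <= tolerance


# The full moon of 20 Sep 2021 (about 23:55 UTC) should register as roughly full.
print(is_roughly_full(datetime(2021, 9, 20, 23, 55, tzinfo=timezone.utc)))
```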


Slightly related, I once wrote code for the old Kill Screen site that subtly altered the graphics of a review of a game themed around lunar stuff based on the current phase of the moon.

God bless the gists of random math people leave about.


Now, this is a true phase-of-the-moon bug: http://ftp.informatik.rwth-aachen.de/jargon300/phaseofthemoo...


Funnily enough, NetHack [1] had this implemented too (warning: spoilers). See also wmmoonclock [2] (wmaker) for a nice "moon-clock".

[1] https://nethackwiki.com/wiki/Time

[2] https://www.dockapps.net/wmmoonclock


I've resolved several "celestial body" problems with routers and modems in East/West Africa over the years by pointing USB fans at them. Between the sun and the workday load generating heat in lower-end routers or insufficiently air-conditioned units, a fan can work surprisingly well to improve the network at almost no cost.


Neat Trick!


I have seen interference from one part of a set-top box cause noise on the flash input lines and sometimes issue a command to clear the flash on the device.

Months of debugging, dozen people involved, tens of thousands of devices bricked, tens of millions lost.

All due to a single line of code that configured the flash to not require a special magic sequence before each command. That feature, meant to improve resistance to interference, also hindered performance, and somebody thought it a good idea to disable it to get some points for improved performance.


Once in a while my PC registers several consecutive repeats of a key pressed on my Bluetooth keyboard. At other times, it misses some key presses. Upon investigation, I've discovered this usually happens when the ceiling fan is running :).


As long as you don't traumatize the hard drives by yelling at them, feat. brendangregg & bcantrill:

https://youtu.be/tDacjrSCeq4


Tangentially, my last company made sophisticated routers, dissipating ~3kW of heat. We had hard drives in them for persistent logs.

There was a big problem where we needed to upgrade the fans to deal with the heat dissipation, but it was destroying the performance of the spinning disk HDDs due to the vibration of the fans.

(these were 2U devices with 5 boards: 2 control-plane boards (1 active, 1 on stand-by for redundancy) & 3 data-plane boards (2 active, 1 stand-by))


So I'd say the obvious solution would be suspending the drive on relatively big pieces of rubber, like https://m.media-amazon.com/images/S/aplus-media/sc/a32b42cb-...

If it was a big problem, that must not have been viable? Too cramped?


Exactly, there was no room for the rubber dampeners that were the obvious solution, and much frustration from the engineers that this wasn't planned ahead.


Sometimes the day star lines up directly behind your satellite or microwave dish and you have very poor SNR for a few minutes.


Is there a database or "online myth/story" archive of wacky bugs like this and in the comments? These would make for great "cocoa at night" reading!



I'm bookmarking this thread on pinboard because of all the good pointers here.


Add this to your list: https://www.reddit.com/r/sysadmin/comments/9mk2o7/mri_disabl...

spoiler: Helium messes with MEMS oscillator, causing iPhones to stop working (the clock signal is basically flatlined)


Done, thanks. Will have to read it after work.


I worked for satellite television. One of our servers would freak out once a year. It was found that the actual satellite was in line with the sun at that time, causing a large amount of power to be sent over a cable close to the server due to the need to use the backup antenna.


Neat Thread!

What I love about these problem-solving anecdotes is how a seemingly totally different domain is the key to the solution. It always makes me marvel at how interconnected everything in our World is. Strengthens my belief that "Cross-Disciplinary" knowledge is where "Wisdom" lies and is the key to our Future.

“From a drop of water, a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other. So all life is a great chain, the nature of which is known whenever we are shown a single link to it.” --- Sherlock Holmes in A Study in Scarlet


I was expecting to see at least one joke about werewolves... (soft ware wolves... they turn bad on a full moon)


Reminds me of the story where emails couldn't be sent more than 500 miles away: https://web.mit.edu/jemorris/humor/500-miles


I love/hate such bugs. Often hard to debug, but totally great to finally find the root cause and a solution. I once got a ticket stating "this script always crashes before 10 a.m." [1]

[1] https://darekkay.com/blog/script-crashes-before-10/


See also a case where the hardware worked differently due to the moon https://www.nytimes.com/1992/11/27/us/moon-is-blamed-for-bli...


At my work they had a server outage every so often, where it seemed to just go down (back in the 90s). Turned out the cleaning lady came in the evenings when there was no one there and unplugged the server to plug in her vacuum cleaner, then plugged the server back in afterwards.


My office internet drops out on my computer every afternoon at about 3pm... doesn't affect any other computer, just mine. I am getting Father Merrin in next week to give the place a proper going over.


I hoped this would be an article about nethack.


I was expecting that it really had something to do with the moon, though.


> The code worked differently when the moon was full

Sounds like a case of werecode


Great service, poor business model.



Signed integers should be used as sparingly as floating-point. They should not be used in ordinary code because ordinary code has no use for them until they break something.

The most notable exception would be languages which allow negative indexing, but IMHO if that were syntactic instead of relying on actual signed integers, it would be safer (I.e., [- $int] would be a different syntax from [(-$int)] and the latter would not be correctly typed.)



