Furnace controller product: optimized energy consumption in a steel recycling furnace. Customer reported the furnace got 'stuck' every couple of hours. The electrodes quit moving, so it continued to burn electricity (megawatts!) but wasn't melting anything.
Turned out it was the 'power saving' setting on the PC. To save a couple of watts it slowed down the CPU, and screwed up a furnace that burned megawatts. Probably wasted more electricity in a day than the 'power save' feature saved in the whole US for a year.
One of my first professional projects out of school was a client/server system using WinForms. I wrote the server, another guy wrote the client.
We worked hard and fleshed out a good design quickly. We iterated a few times, then released. We soon got a major bug report from the field: the server would lock itself up after some time! I was baffled; I didn't write much in the way of threading code. I pored over the entire codebase several times looking for any possibility of deadlock. Nothing.
Finally, I went out to the customer site when it happened again. Someone had selected text in the console window the server was running on, which effectively halted the process. I wrote a small wrapper around the server that launched it as a service, and that fixed it.
(I probably should have written it as a service originally, but I was fresh out of school, so I instead took it as a hard knock.)
Yeah, I guess this experience made me question the whole idea. If you add feature complexity to a machine for some small savings (desktop with powersave? really?) it might cause more trouble than it's worth. In this case, it certainly did. People make mistakes.
Consumer technology is designed for consumers. It might also be the case that in the long run, only consumer tech is economical, but designers and implementers must take care when they attempt to build robust systems from compromised components.
That is, it is perfectly sensible that an OS for a desktop PC would incorporate powersave. If this scenario had caused spacecraft to go off course, we wouldn't be blaming Microsoft.
Blame doesn't matter. The fact is, the PC powersave feature used more power than it saved, that year. Arguably the complexity of the OS was a critical factor in that.
...the PC powersave feature used more power than it saved, that year.
I think you're underestimating the number of PCs in use "that year", but even so, accounting doesn't work like that. Microsoft don't implement features for the sake of the total world power supply, but rather for the benefit of their customers. For the vast majority of PC owners, powersave is a (slightly) valuable feature. For at least one owner at least one time, it was a (very) undesirable feature. Which set of customers made Bill G a billionaire, and which is a rounding error?
Newer Intel chips have this feature as well, where the CPU itself will underclock if not under load. Which is fine, if it's documented and the OS is properly handling it, and turning it off...
It's a problem when this doesn't happen, especially if it impacts your timers.
Personal craziest was an Oracle performance problem on Windows NT in the 90s. Slow as hell. Going to the server, logging in, checking everything: Blazing fast. Back at the desk, slow as hell. Problem was the GL tube screen blanker with software rendering :-)
My craziest one turned out to be caused by World War 2.
Bug report from a bank said that a customer's birth date was not accepted when trying to open an account. They'd tried and found that any date within a range of about a month in the summer of 1945 was not accepted. This was a German bank, and the application was written in Java.
I could reproduce the bug and found that the date was rejected at a very low technical level in the Calendar class (long before any domain validation happened), just as if you'd entered the 30th of February. Some debugging sessions later I found that the Calendar class calculates a lot of internal date and time fields, and the daylight savings time field contained a value of 2 hours, which was rejected by internal sanity checks.
The name of that field led me to a Java API bug report which explained everything: The Locale for Germany is "centered" on Berlin, and in the summer of 1945 Berlin and the Soviet-occupied zone of Germany actually did have a 2 hour daylight savings time (which happened to be identical to Moscow time). Some smartass in the Java development team had "corrected" the sanity check in Java 1.4 because he believed 1 hour DST to be the maximum, but Berlin is in fact not the only timezone which had a 2 hour DST at one time or another. The bug was fixed in Java 1.5.
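For anyone curious, the 2-hour offset is still visible in the IANA tz data that ships with the JDK. A minimal sketch using the modern java.time API (not the Java 1.4 Calendar code path where the bug lived) asks Europe/Berlin how much daylight saving was in effect on a mid-summer 1945 date and, for comparison, on the same date in 1946. The class name, method name, and the two sample dates are illustrative choices, and the result depends on the tz data bundled with your JDK.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class BerlinDst1945 {
    // Illustrative helper: DST amount (actual offset minus standard offset)
    // for Europe/Berlin at the given instant, per the JDK's bundled tz data.
    static Duration berlinDaylightSavings(Instant instant) {
        ZoneRules rules = ZoneId.of("Europe/Berlin").getRules();
        return rules.getDaylightSavings(instant);
    }

    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("Europe/Berlin").getRules();
        // Sample dates chosen for illustration: summer 1945 (Soviet-zone rules)
        // versus summer 1946 (ordinary one-hour German summer time).
        Instant summer1945 = Instant.parse("1945-07-01T12:00:00Z");
        Instant summer1946 = Instant.parse("1946-07-01T12:00:00Z");

        for (Instant t : new Instant[] {summer1945, summer1946}) {
            System.out.println(t
                    + " DST=" + berlinDaylightSavings(t)
                    + " offset=" + rules.getOffset(t));
        }
    }
}
```

With current tz data this should report two hours of DST (UTC+3) for the 1945 date and one hour (UTC+2) for 1946; a sanity check capping DST at one hour would reject the first.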
Times and dates are such a minefield for developers. I really came to appreciate this about 10 years ago when I dove into the Python datetime module documentation to fix a 'simple' problem and saw how much work went into getting things right. Combine that with how much of it comes up on RISKS (comp.risks) and I try not to ever take dates and times for granted.
We had a crazy Windows NT 4.0 + Lotus Notes bug where something in the window manager leaked memory (only when Notes was running). Maximizing and minimizing the Notes window would recover the memory. If you didn't do that, the OS would eventually hang. This was a public-facing server -- why we were running Notes for that is a whole other story of sadness and despair.
Anyway, eventually my colleague wrote a small VB program that maximized and minimized the window every few minutes. Cured the problem.
My craziest bug was a Java clock issue. If you had some specific model of motherboard, calling System.currentTimeMillis() repeatedly could actually make your system clock run faster. I mean, actually CHANGE the system clock. For real. Like 10% faster. That led to veeeeery interesting issues related to timing on the game I was working on, and of course it took me days before I would even think that my problems could be caused by the time actually running faster on different machines.
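One way to notice a wall clock that runs fast is to compare it against the JVM's monotonic timer over the same interval. The sketch below assumes that System.currentTimeMillis() and System.nanoTime() are backed by different time sources on the machine in question, which is common but not guaranteed; if both read the same misbehaving hardware timer, the ratio stays near 1.0 and the check tells you nothing. The class name, method name, and 5-second interval are hypothetical.

```java
public class ClockDriftCheck {
    // Hypothetical helper: ratio of elapsed wall-clock time to elapsed
    // monotonic time. 1.0 means the clocks agree; 1.1 means the wall
    // clock advanced about 10% faster than the monotonic timer.
    static double driftRatio(long wallElapsedMillis, long monoElapsedNanos) {
        if (monoElapsedNanos <= 0) {
            throw new IllegalArgumentException("monotonic interval must be positive");
        }
        return (wallElapsedMillis * 1_000_000.0) / monoElapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis(); // wall clock: can be adjusted or drift
        long monoStart = System.nanoTime();          // monotonic: only meaningful as a difference

        Thread.sleep(5_000); // illustrative measurement window

        long wallElapsed = System.currentTimeMillis() - wallStart;
        long monoElapsed = System.nanoTime() - monoStart;
        System.out.printf("wall/monotonic ratio: %.4f%n", driftRatio(wallElapsed, monoElapsed));
    }
}
```

Running it on a suspect machine and on a known-good one would show whether the "10% faster" symptom is coming from the wall clock alone; for game timing, measuring frame intervals with System.nanoTime() sidesteps wall-clock adjustments in general.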
Once the site comes back up, the author has a phenomenal analysis of the perf events local root exploit from over the summer. If you're interested in exploit development or just security bugs in the kernel, his analysis is great. He takes the time to explain things at a level that even the worst CS student could understand.