The cause of the Zune leap year bug has been isolated to a Freescale date routine (zuneboards.com)
32 points by divia on Jan 1, 2009 | 23 comments


The thing that kills me about this bug is that it's such an easy bug to find in a code review. Some person totally unfamiliar with the code asks "What happens when that inner if condition is false?" and MSFT is saved from yet another embarrassing gadget glitch.
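
For anyone who hasn't looked at the routine, here is a minimal Python sketch of the loop shape as it was widely reported. The original is C; the function names, the 1-based day count, the 1980 starting year, and the `max_iter` cap are assumptions for illustration. The cap only exists so the hang shows up as a `None` return instead of an actual infinite loop.

```python
def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_from_days_buggy(days, year=1980, max_iter=10_000):
    """Reported loop shape. Returns None where the original would spin forever."""
    for _ in range(max_iter):
        if days <= 365:
            return year, days
        if is_leap(year):
            if days > 366:
                days -= 366
                year += 1
            # Inner condition false: days == 366 in a leap year.
            # Nothing changes, so the loop never makes progress.
        else:
            days -= 365
            year += 1
    return None

def year_from_days_fixed(days, year=1980):
    """One possible fix: compare against the actual length of the current year."""
    while True:
        length = 366 if is_leap(year) else 365
        if days <= length:
            return year, days
        days -= length
        year += 1
```

With these definitions, day 366 of a leap year (31 December) is the only kind of input on which the buggy version fails to terminate; day 367 rolls over to 1 January of the next year in both versions.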


What kills me is the fact that they have a while loop on the main thread of the Zune, which wasn't tested for all possible values of input. I don't know about anyone else here, but as an embedded software developer, that sort of thing automatically sets off alarm bells for me. Such a situation would definitely be code-reviewed and heavily unit tested in a sane development environment.


If the stories are true, Microsoft has some of the most code-reviewed software in the world. The levels of bureaucracy they have to go through are tremendous.

First-generation Zune software was so bad that it made iTunes look good. On the other hand, there was a seismic rift between first-gen Zune software and second-gen. The new Zune client software has got one of the nicest, prettiest UIs on my system.

I think that in order to generate the second-gen Zune software, a lot of the bureaucracy must have been chucked. Apparently too much. It's really too bad. I suppose that we will have to make hard choices between stability, features, and beauty forever.


Bet there's a developer somewhere having a really shitty New Year's Eve right now...


Bugs happen. The root cause of this disaster is an incomplete test plan.


There is no such thing as a complete test plan.


But isn't this exactly the kind of thing that should be in a test plan? Run through all the possible dates?

I mean, didn't we learn our lesson from Y2K?


Sure thing. We'll run it through all possible dates.

Well, you know, some actions may only have problems on certain dates, so make that all possible actions on all possible dates.

Well, you know, maybe the meridian counts. So, make that all possible actions on all possible dates in both the morning and the afternoon.

Well, we better test that with all the pathological media files we've built up. So, make that all possible actions on all possible dates in the morning and the afternoon with every one of our three hundred test music files.

Well... what if...

In hindsight it's always easy to identify the test that would have revealed the problem. Given the ability of bugs to only manifest when five particularly tricky conditions occur (and let's not even TALK about race conditions!), it simply isn't possible to test them all exhaustively. Unit testing helps, sort of, because it can run through a matrix faster than a human can, but it's also dumber, and even unit testing can't exhaustively search a twelve-dimensional space in any reasonable period of time if it's anything more than binary values in those twelve dimensions.

That said, for date processing code, this is definitely something I would have unit tested; that style of date processing is OK if you get it perfect, but definitely inherently problematic. One case where I did write a fairly exhaustive unit test was for billing code with a variable bill-on date; getting every single case exactly correct took me a long time. I was up to about three thousand permutations when I was done, and yeah, I'd see bugs that only struck if you signed up on the 31st of a month followed by a month of 30 days, for instance. Easy to miss if you're writing on a 15th. (And that code was probably thrown away... :( )
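
To make the "signed up on the 31st" case concrete, here is a small sketch of one possible policy: clamp the bill-on day to the last day of a shorter month. This is an assumption for illustration, not the billing code described above; `bill_date` is a hypothetical helper.

```python
import calendar
from datetime import date

def bill_date(year, month, bill_on_day):
    """Bill date for a given month, clamped to that month's last day."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, min(bill_on_day, last_day))

def all_bill_dates(years, bill_on_days=range(1, 32)):
    """Enumerate every (year, month, bill-on day) combination for a test matrix."""
    return [
        bill_date(y, m, d)
        for y in years
        for m in range(1, 13)
        for d in bill_on_days
    ]
```

Enumerating the matrix this way is cheap (31 x 12 cases per year), which is what makes a near-exhaustive unit test practical for this kind of code.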


At least analyze boundary conditions and special cases like leap years.

Apparently, we don't learn the Y2K lesson. I keep hearing about date rollover bugs in the industry press. At a company I worked for, there was a 45 day internal clock rollover bug in one version of our virtual machine. AFAIK, no one ever ran into this, except for one system that controlled automated people-mover trains.


Maybe, but there are good-enough test plans. This one wasn't.


Running the single most common use case for every day of the next 10 years is a simple, doable test.
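
A rough sketch of what such a test could look like, using Python's `datetime` as the oracle. The converter under test, the 1980 origin, and the 2009-2018 range are all assumptions for illustration.

```python
from datetime import date, timedelta

ORIGIN = date(1980, 1, 1)  # assumed epoch for the day counter

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_and_day(days):
    """Converter under test: 1-based day count since ORIGIN -> (year, day of year)."""
    year = ORIGIN.year
    while True:
        length = 366 if is_leap(year) else 365
        if days <= length:
            return year, days
        days -= length
        year += 1

def check_every_day(start, end, convert):
    """Run convert() for every date in [start, end]; return the dates it gets wrong."""
    failures = []
    d = start
    while d <= end:
        n = (d - ORIGIN).days + 1
        expected = (d.year, d.timetuple().tm_yday)
        if convert(n) != expected:
            failures.append(d)
        d += timedelta(days=1)
    return failures

failures = check_every_day(date(2009, 1, 1), date(2018, 12, 31), year_and_day)
```

Ten years is only about 3,650 cases, so the whole sweep runs in well under a second. A converter that can hang would also need a timeout or iteration cap around the call.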


Gosh, can't believe this went through. You don't need a test case to catch something like "> 366"; a code review or just a conscious glance would do.

The test plan, as noted, cannot be complete, but this is a very apparent case. If the code was written to accommodate leap years, how could there be no testing for one?


I'd also say the developer who did this will be in the 17% of MS staff laid off next month.


It sounds like the code was written by Freescale, not Microsoft. You could argue that someone at MS should have tested it.


This sounds like a daily wtf posting in the making.


The correct way to do this is to define a quadyear=365*4+1, and then just use mods & divides (quadyear -> year-within-quad -> day) with no loop, right?
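
A minimal sketch of that loop-free idea, assuming a 0-based day count that starts on 1 January of a leap year (1980 here) and assuming every fourth year is a leap year, which only holds for 1901-2099. The function name is made up for illustration.

```python
DAYS_PER_QUAD = 365 * 4 + 1  # one leap year plus three ordinary years

def year_and_day_from_days(days, origin_year=1980):
    """0-based days since 1 Jan of origin_year -> (year, 1-based day of year).

    origin_year must be a leap year, so each quad is laid out as
    366, 365, 365, 365 days.
    """
    quad, rem = divmod(days, DAYS_PER_QUAD)
    if rem < 366:
        # Still inside the leap year that opens the quad.
        year_in_quad, day = 0, rem
    else:
        year_in_quad, day = divmod(rem - 366, 365)
        year_in_quad += 1
    return origin_year + 4 * quad + year_in_quad, day + 1
```

There is no loop, so there is nothing to spin on: 31 December 2008 (day 10592 counting from 0) comes out as `(2008, 366)`.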


Not exactly; it's a bit more difficult. Leap years occur once every four years, except that a year divisible by 100 is a leap year only if it is also divisible by 400.

So 1900 wasn't a leap year, 2000 was one, and 2100 won't be.

Although in most places I saw, the code just ignored this, since something that's valid between 1901 and 2099 is considered good enough.

Edit: sorry, shouldn't post this early in the morning; it's 400, not 1000.
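
As a quick sketch, the full Gregorian rule next to the "good enough" shortcut (function names are just for illustration):

```python
def is_leap_year(year):
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def is_leap_year_simple(year):
    """Shortcut that ignores the century rule."""
    return year % 4 == 0

# Years in 1901-2099 where the two rules disagree (expected to be empty,
# because the only century year in that range, 2000, is divisible by 400).
disagreements = [y for y in range(1901, 2100)
                 if is_leap_year(y) != is_leap_year_simple(y)]
```

The two functions first disagree at 1900 going backwards and at 2100 going forwards.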


No, century years are leap years if they are divisible by 400, so 2400 will be the next such, not 3000!


Sorry, you're right. Changed the original comment.


Ah, so we can maybe expect a Y2.1K problem then


Yep, but we'll have the Y2K38 bug before that, because of the original Unix time representation as a signed 32-bit int.

http://en.wikipedia.org/wiki/Year_2038_problem
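
A small sketch of where that limit falls. Python integers don't overflow, so the 32-bit wraparound is simulated with `struct`; `as_int32` is a helper made up for this example.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

def as_int32(n):
    """Reinterpret the low 32 bits of n as a signed two's-complement integer."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]

# Last second representable in a signed 32-bit time counter.
last_good = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the counter wraps to the most negative value.
wrapped = EPOCH + timedelta(seconds=as_int32(INT32_MAX + 1))

print(last_good.isoformat())  # 2038-01-19T03:14:07+00:00
print(wrapped.isoformat())    # 1901-12-13T20:45:52+00:00
```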


Why do they have access to the source code? Since when is Zune open source?


On zune.net it says this bug will work itself out at noon tomorrow... still leaves a bad taste.



