AFAIK it was a signature update, and the existing code couldn't understand the new signatures with a 2022 timestamp. IMHO, signatures still should have been deployed to an early ring before broad release.
These kinds of things are always tricky, and always easier to see in hindsight. Everything is a trade-off.
I can easily picture a scenario where there is pressure to reduce time to release signature updates, and someone looked at a process where there was early testing (that would have caught this) and said "Over the past 5 years, we've deployed hundreds of signatures with exactly zero failures; why are we still doing this process that adds a pointless delay?"
It originated in the British Royal Air Force where an imperfectly-executed loop-de-loop would go 'pear shaped' instead of round as intended. Interestingly it then played off the British tendency for understatement so that something that meant 'a bit wonky' slid into meaning 'off the rails busted'.
Summary of the bug: Exchange tried to store "2201010001" (i.e. date time '22-01-01 00:01') in a 32-bit signed int, but INT_MAX is 2147483647, so the result is a negative number.
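To make the arithmetic concrete, here is a minimal sketch. It assumes the value gets reinterpreted as a two's-complement 32-bit integer; `to_int32` is a made-up helper, not anything from Exchange, and going by the event log quoted elsewhere in the thread the real code may simply have refused the conversion instead of wrapping.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def to_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


version = int("2201010001")

print(version > INT32_MAX)    # True: does not fit in a signed 32-bit int
print(to_int32(version))      # -2093957295
print(to_int32(2112312359))   # 2112312359: the last minute of 2021 still fits
```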
Why is anyone using two-digit years anywhere at all? This approach has already caused a lot of trouble 22 years ago. Do people learn nothing from their past mistakes?
> Do people learn nothing from their past mistakes?
I kinda feel you've never worked with a large code base or database that has a legacy going back 25+ years...older than many of the commenters on this thread.
Not all of the data types used might have been obvious in their "danger", especially in ancient code. Sure you could run some analysis over the code, but can that analyser determine from a #def or variable name the true intent of the value being stored?
I wrote an application for a customer in 1986, and stone me, they were still running it in a DOSBOX emulator in 2004. Lucky for me dates weren't important.
My employer uses the year 2041 for infinity. It will be a truly massive disaster when that year arrives. There are tens of millions of dollars of advertising spend that is paused until “infinity”.
It seems likely that this is an issue with the code specifically, rather than the encoding of the files, based on the fact that they managed to roll out a fix pretty quickly.
I am thinking this could be like DNS serial numbers. Possibly the code was written with an integer serial number (OK, using signed is wrong), but then someone else thought, "to make things easier for humans, we will use this date format for the serial."
It's quite possible that was never in the spec given to the programmer.
DNS serial numbers are 32 bit unsigned and the convention is to use YYYYMMDDNN
So this _convention_ breaks in the year 4295, or in 2148 if someone forgot to make it unsigned.
It is only a convention, though; DNS only cares that the serial increases.
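Those two years can be checked with a quick sketch. The `serial` and `first_bad_year` helpers are made up for illustration; the only assumption is the YYYYMMDDNN convention described above.

```python
UINT32_MAX = 2**32 - 1  # 4294967295
INT32_MAX = 2**31 - 1   # 2147483647


def serial(year: int, month: int = 1, day: int = 1, nn: int = 0) -> int:
    """Build a YYYYMMDDNN serial number as a plain integer."""
    return ((year * 100 + month) * 100 + day) * 100 + nn


def first_bad_year(limit: int) -> int:
    """First year whose 1 January serial no longer fits under `limit`."""
    year = 1970
    while serial(year) <= limit:
        year += 1
    return year


print(first_bad_year(UINT32_MAX))  # 4295
print(first_bad_year(INT32_MAX))   # 2148
```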
Because with four-digit years this kind of date would never have fit into a signed 32-bit integer. Incidentally, this shows that the timestamp was taken into use after Y2K, since none of the pre-millennium years would have fit.
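A rough way to see which two-digit years fit, assuming the yyMMddHHmm layout discussed in this thread (the `packed` helper is hypothetical):

```python
INT32_MAX = 2**31 - 1  # 2147483647


def packed(yy: int, mm: int, dd: int, hh: int, mi: int) -> int:
    """Pack a two-digit-year timestamp into a yyMMddHHmm integer."""
    return int(f"{yy:02d}{mm:02d}{dd:02d}{hh:02d}{mi:02d}")


# Two-digit years whose very last minute still fits in a signed 32-bit int.
fitting_years = [yy for yy in range(100) if packed(yy, 12, 31, 23, 59) <= INT32_MAX]

print(fitting_years[0], fitting_years[-1])  # 0 21
print(packed(99, 1, 1, 0, 0) <= INT32_MAX)  # False: 1999 would not have fit
print(202112312359 <= INT32_MAX)            # False: a four-digit year never fits
```

So only years 00 through 21 work, which lines up with the scheme being introduced after 2000 and breaking on the first day of 2022.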
There's a difference between timestamps and dates. For instance Christmas Day is on 2022-12-25, but it doesn't have any particular time stamp as that depends on which time zone you are in.
If you are making a calendar app, you want to store dates and not timestamps as, e.g. if I have a flight to another country then have an appointment at 9am, that 9am should be in the time zone of wherever I am physically present, which the calendar doesn't necessarily know.
Storing as an integer is compact and allows for easy comparison. It's a clever hack, but they should have used 64 bit, or at least a u32!
Calendar appointments are a good example where a timestamp without a timezone (java's Instant) is appropriate. But a birthday doesn't have a time, so you would want to use LocalDate (a date without a time and timezone) for that.
In 99% of the cases you only really want to involve timezones when presenting something to the user or taking input from them.
So you would create your calendar event by entering "9am, timezone X", and the app would most probably just apply that offset and store that as a timestamp. But if it's an all-day event... I should probably take a look at android's calendar content provider.
Not storing absolute times is a disaster. If it's seemingly working now it's only because your use case apparently doesn't include anything cross-timezone.
Referring to something like "Christmas day" as timezoneless doesn't help either. Anyone who celebrates it celebrates it in a timezone. Christmas Day EST is a different event than Christmas Day CEST. Trying to represent them as the same event will only lead to heartbreak.
I think there is a simple reason for not using unsigned integers. If you use an unsigned integer and get an overflow, it's not immediately obvious, or even possible to see, from the value alone. If you use a signed integer, any negative number means there was a mistake somewhere. That is easier to check for. But that said, they should store it as something like Unix time, even for a calendar. Otherwise sending appointments between timezones needs conversions, which are notoriously hard to get right. Ideally it'll be stored as a local timestamp with the timezone offset at the end. That way all information is stored, and in parallel it can have a Unix time for reasons of efficiency.
Storing dates in YYMMDDHHMM format but then saving that as an int seems like such an ... odd choice. Curious as to how/why that decision was made, or if it was perhaps unintentional but nobody noticed before (e.g. comparison was done on those date values but nobody realized that they were for some reason coerced to ints, and nobody noticed because it still worked as expected).
It's kind of natural for what it is: It started off as a version number. Then someone realized that having semantic information of when that version was issued was useful, so they started using YYMMDDNNNN version numbers instead. This worked fine, but then you got to YY being large enough to cause the number to not fit into a signed int32. Oops.
I imagine the 2022 issue was even mentioned in the original ticket by the dev. Though the dev is probably long gone and the ticket long forgotten. I know I’ve mentioned “this approach will break if the following assumptions change” on tickets.
I can think of lots of reasons why it was made: sorting ints is fast, storing them is compact, etc. Doesn't mean it wasn't a terrible decision (with readily foreseeable consequences) but I don't find it irrational or even surprising.
> I can think of lots of reasons why it was made: sorting ints is fast, storing them is compact, etc.
Of course, but in that case you would just store it as a normal date/timestamp value, e.g. ms or sec or heck even days since the epoch, truncated to days. Formatting it as a format string and then converting to an int is the part that's so weird.
You need to store two separate values for a zoneless datetime. Days since epoch and minutes/seconds since midnight. You can't do it with one integer without mixing it somehow.
I’d call this an irrational optimization in modern times for this particular problem space. I have no idea how long this scheme has been in use, but they’re basically ids for malware data releases, so even if you accumulated all the timestamps that have ever been used as these ids, it’s probably a few thousand, max, and growing at a predictable and pretty linear rate. Using clever encodings to save space or sort time in the context of an exchange server for this amount of data probably hasn’t been an appropriate choice in at least a couple decades.
> Using clever encodings to save space or sort time in the context of an exchange server for this amount of data probably hasn’t been an appropriate choice in at least a couple decades.
What makes this weird is that, as hn_throwaway_99 notes, this isn't even a clever encoding. It's a stupid encoding that fails at the tasks of saving space or time. Both of those are accomplished better by the ordinary timestamp value that everything uses. Converting the timestamp to a string and then interpreting that string as an integer brings you:
- A less dense encoding: any value of a timestamp correctly represents a time, but not every value of an integer represents a date format string. You just introduced invalid dates into an encoding that had no reason to contain them.
- Equal sorting times: sorting integers is more or less the same no matter what the integers are.
- Vastly increased processing times: in the naive approach, you're handed a timestamp, which is an integer, and you use it directly. In this "optimized" approach, you're handed a timestamp, you convert it to a string, and then you convert the string back to another integer. This new integer is encoded so that you can't even mask out the bits you want, because the information you care about is stored in powers of 10 while the integer is represented in powers of 2. What did you gain?
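A small sketch of the first and last points, assuming the yyMMddHHmm layout (`unpack` is a made-up helper): the fields have to be peeled off with division by powers of 10, and plenty of integers decode to dates that don't exist.

```python
from datetime import datetime


def unpack(packed: int) -> datetime:
    """Decode a yyMMddHHmm integer; raises ValueError for impossible dates."""
    packed, minute = divmod(packed, 100)
    packed, hour = divmod(packed, 100)
    packed, day = divmod(packed, 100)
    yy, month = divmod(packed, 100)
    return datetime(2000 + yy, month, day, hour, minute)


print(unpack(2112312359))  # 2021-12-31 23:59:00

try:
    unpack(2112330001)     # "December 33rd"
except ValueError as exc:
    print("not a real date:", exc)
```

A plain seconds-since-epoch integer needs none of this: every value is a valid instant and can be compared or used directly.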
This is not a date string. It's a version number. The version number happens to have a date embedded inside it. That's not for the computers, but for the humans looking at it to be able to get some useful information at a glance. The app was treating it simply as a version number and caring about "which is higher?" and nothing else.
I agree. This type of bug seems like it should have been thought of when it was first written, or at least caught during a code review.
However, it's very possible that this was written a decade ago, and even if someone did the example conversion at the time to 1,201,011,234 it wouldn't trigger that "magic number spidey-sense" like a number that's closer to 2.1 billion normally would.
In fact for all we know there's a comment next to it:
This is me. This sounds like the worst designed algorithm ever. When I read it the other night I uncontrollably moved to shout “WTF?!?!” I literally can’t imagine the circumstances that would lead to that solution. When is the only option for saving a date string to encode it as a 32 bit int? To quote The Princess Bride “Inconceivable!”
The current patch actually just changed the number of days allowed in December, to give the team time to patch it properly. The date reported is December 33rd, 2021.
> The version of the updated scan engine starts with 2112330001; is this right? Should we be concerned that it seems to reference a date that does not exist?
> The newly updated scanning engine is fully supported by Microsoft. While we need to work on this sequence longer term, the scanning engine version was not rolled back, rather it was rolled forward into this new sequence. The scanning engine will continue to receive updates in this new sequence.[0]
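Why a nonexistent date works as a stopgap can be sketched in a few lines. This assumes, as suggested elsewhere in the thread, that the engine only needs the version to fit in a signed 32-bit integer and to be higher than the previous one; `fits_int32` is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def fits_int32(value: int) -> bool:
    """True if the value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


last_real_2021 = 2112312359  # 2021-12-31 23:59
workaround = 2112330001      # "December 33rd, 2021", sequence 0001
broken = 2201010001          # 2022-01-01, the version that failed to load

print(fits_int32(workaround))       # True
print(workaround > last_real_2021)  # True: still sorts as the newest version
print(fits_int32(broken))           # False
```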
This is really sloppy code. This should have been caught in a code review very easily.
Maybe this code is dozens of years old and no one thought it would be in production but there should have been a test case for this at the very least.
Meanwhile I was denied a job at Microsoft out of college almost 30 years ago because I didn’t remember how to divide coins properly to find the fake coin.
I wouldn't call one bug in one piece of software a "vengeance". Yeah, it's widely used and affected lots of mail, but it was very limited and had a trivial workaround.
But the Y2K issue, back then, was in 1000s of programs from 1000s of vendors. The only Y2K effect I remember was the first issue of 2600 in 2000.
It also wasn’t just the tick over to the year 2000. It was various dates and times around the year 2000 (leap years et al). But the Jan 01 2000 problem was the biggest and most high profile.
Just imagine if the computing revolution started closer to the beginning of the century instead of the last few decades of it. It would have been crazy trying to fix everything.
I love the solution: They changed their antimalware definition files to December 33rd, 2021, until they have a more substantial patch ready. The instructions are just to basically clear out the existing files and re-download to get rid of that pesky 2022 year.
It's pretty likely they'll change the data type somehow, but I guess the question is how fast they can test how that interacts with everywhere that might encounter that variable. Presumably the hack they implemented here buys them at least a month or two to test a real code change.
Submitters: please don't editorialize titles. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html. We've reverted the title now. (Submitted title was "Y2K problem came back with vengeance in 2022".)
I don't think it's particularly widespread, and if anyone can do it, it makes sense for mods to have that power. However, I have numerous times (in absolute terms) seen mods change titles to something more informative than the original. I wouldn't necessarily call that "editorializing," though.
"dang" is not a single person but a cybernetic entity - hundreds of human beings being worked to death in an underground facility against their will, all mediated by the most sophisticated AI known to man.
It's surprising how often this issue (ceiling for 32-bit ints) comes up.
If you're still using integers (for ids, timestamps, etc) then just go with 64-bits. It avoids any potential problem with data size in the future, and if the size stays small then it won't matter anyway. Storage is cheap and CPU cache lines are 64-bit now too.
Ah you're right, I was thinking about 64-bit alignment for atomic operations. Either way none of this matters except under the most extreme performance conditions.
Can you imagine being a dev on this bugfix? The bug is known, but every second wasted testing, regressing, and preparing to deploy, literally millions of emails aren't delivered. That's some serious pressure.
I wonder if there were detectable drops in internet traffic due to fewer emails?
The malware scanning service of MS Exchange crashes, because it treats a yyMMddHHmm timestamp as a signed integer when verifying a signature file.
Turns out that 2201010001 is negative when treated as a 32 bit integer (the greatest positive one is 2147483647, and 2021 had fewer than 47 months).
I can only assume that somebody wrote that "timestamp string as integer" code, checked that it worked correctly (at the time) and then just assumed they must be good on data type range.
I think it probably developed the other way around, same as the version information almost every DNS server uses.
The anti malware releases probably started out with definition version 1, 2, 3, and so on, until someone complained that version 37363 doesn't mean much. Some smart guy probably realised that as long as the numbers increased, they could put anything in there, so the 1201010000 version was born, incremented a day at a time.
As MXToolbox states:
> The serial number is an unsigned 32 bit value assigned to your SOA record must be between 1 and 4294967295. […] It has become common to set your serial number with a date format to make it easier to to manage. This format uses 10 digits to represent the date and then a two digit sequence number with the format of YYYYmmddss.
Do any SOA query against hosted DNS and you'll probably see exactly this pattern appear.
The fact they used a long to store the version also suggests that someone may not have realised that longs are the same size as ints, depending on your platform and compiler. The effective distinction between the two in Microsoft's compiler ended in the 32 bit era, so using a long to store anything might be seen as a red flag.
Based on the fix linked, the current antimalware release seems to be based on December 33rd, 2021. That will buy them enough time to write a patch that just uses long longs for these timestamps, or if their customers don't like installing updates they might just go back to incrementing the number by one at some point.
Really? Billions of lines of code are reviewed every year at Microsoft. Not hard for me to imagine that this was coerced to an int somewhere (perhaps even unintentionally) that was non-obvious during a code review.
We have some test suites that run with a date two weeks in the future. The last two weeks of every year always seems to be fixing these kinds of bugs! (Luckily it’s a web app and most people refresh the page after a few days)
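For what it's worth, a look-ahead check like that would have flagged this particular bug in mid-December. A minimal sketch, where `encode_version` and `survives_lookahead` are hypothetical names and the yyMMddHHmm encoding is assumed from this thread:

```python
from datetime import datetime, timedelta

INT32_MAX = 2**31 - 1  # 2147483647


def encode_version(now: datetime) -> int:
    """Encode a timestamp as a yyMMddHHmm integer (assumed scheme)."""
    return int(now.strftime("%y%m%d%H%M"))


def survives_lookahead(now: datetime, lookahead: timedelta = timedelta(weeks=2)) -> bool:
    """Check that the version generated `lookahead` from now fits in an int32."""
    return encode_version(now + lookahead) <= INT32_MAX


print(survives_lookahead(datetime(2021, 12, 10)))  # True: still inside 2021
print(survives_lookahead(datetime(2021, 12, 20)))  # False: lands in January 2022
```

Passing the clock in as a parameter, rather than calling `datetime.now()` inside, is what makes it possible to run the same code against a future date.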
the intern who wrote the code didn't think of the future farther than the weekend party and the senior who reviewed it 10 years ago didn't catch it and doesn't work there anymore.
Compare the two snippets in [1]:
> Description: The FIP-FS "Microsoft" Scan Engine failed to load. PID: 23092, Error Code: 0x80004005. Error Description: Can't convert "2201010001" to long.
> Description: The FIP-FS Scan Process failed initialization. Error: 0x80004005. Error Details: Unspecified error.
I hope it's a trend that continues, because when it's going pear shaped every little morsel of information is important to narrow down the problem.
I'm less pleased they seemingly didn't deploy this update to an internal on-prem test Exchange server before a wider release.
[1]: https://techcommunity.microsoft.com/t5/exchange-team-blog/em...