Microsoft confirms Exchange Year 2022 problem (borncity.com)
240 points by niuzeta on Jan 2, 2022 | 102 comments



I'm pleased the error message actually mentioned exactly what's wrong.

Compare the two snippets in [1]:

> Description: The FIP-FS "Microsoft" Scan Engine failed to load. PID: 23092, Error Code: 0x80004005. Error Description: Can't convert "2201010001" to long.

> Description: The FIP-FS Scan Process failed initialization. Error: 0x80004005. Error Details: Unspecified error.

I hope it's a trend that continues, because when it's going pear shaped every little morsel of information is important to narrow down the problem.

I'm less pleased they seemingly didn't deploy this update to an internal on-prem test Exchange server before a wider release.

[1]: https://techcommunity.microsoft.com/t5/exchange-team-blog/em...


> I'm less pleased they seemingly didn't deploy this update

Is there actually an update here, or just some code that was probably sitting around for years until 2022 rolled around?


AFAIK it was a signature update, and the existing code couldn't understand the new signatures with a 2022 timestamp. IMHO, signatures still should have been deployed to an early ring before broad release.

Antimalware signatures as code, anyone?


These kinds of things are always tricky, and always easier to see in hindsight. Everything is a trade-off.

I can easily picture a scenario where there is pressure to reduce time to release signature updates, and someone looked at a process where there was early testing (that would have caught this) and said "Over the past 5 years, we've deployed hundreds of signatures with exactly zero failures; why are we still doing this process that adds a pointless delay?"


> when it's going pear shaped

I think this means “when it’s going bad”, but what does “pear shaped” mean? And why a pear?


It originated in the British Royal Air Force where an imperfectly-executed loop-de-loop would go 'pear shaped' instead of round as intended. Interestingly it then played off the British tendency for understatement so that something that meant 'a bit wonky' slid into meaning 'off the rails busted'.


Summary of the bug: Exchange tried to store "2201010001" (i.e. date-time '22-01-01 00:01') in a 32-bit signed int, but INT_MAX is 2147483647 so the result is a negative number.
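
The arithmetic in that summary can be checked in a few lines. This is only a sketch: `to_int32_wrapping` is a hypothetical helper that mimics two's-complement truncation, and `last_2021` is an illustrative value, neither taken from Exchange. Whether the real code wraps or rejects the value is not something this snippet settles.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the INT_MAX quoted above


def to_int32_wrapping(value: int) -> int:
    """Hypothetical helper: reinterpret the low 32 bits as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


last_2021 = 2112312359   # illustrative late-2021 value in the same ten-digit layout
first_2022 = 2201010001  # the value from the error message

print(last_2021 <= INT32_MAX)         # True: still fits
print(first_2022 <= INT32_MAX)        # False: out of range
print(to_int32_wrapping(first_2022))  # -2093957295 if the value were truncated
```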


Damn. This reminds me of the alleged 'Nuclear Gandhi' [1].

Or the Swiss railway axle counter [2].

[1]: https://en.wikipedia.org/wiki/Nuclear_Gandhi

[2]: https://www.reddit.com/r/programming/comments/4sco75/the_axl...


That reddit post is a link to a Twitter post containing a link to an imgur image containing the relevant text


The future is weird.


Why is anyone using two-digit years anywhere at all? This approach already caused a lot of trouble 22 years ago. Do people learn nothing from their past mistakes?


> Do people learn nothing from their past mistakes?

I kinda feel you've never worked with a large code base or database that has a legacy going back 25+ years...older than many of the commentators on this thread.

Not all of the data types used might have been obvious in their "danger", especially in ancient code. Sure you could run some analysis over the code, but can that analyser determine from a #def or variable name the true intent of the value being stored?

I wrote an application for a customer in 1986, and stone me, they were still running it in a DOSBOX emulator in 2004. Lucky for me dates weren't important.


Because these files aren't going backward in time, and it's quite unlikely this code will be in use in 78 years.


Well clearly they didn’t even seem to expect this code to be in use in 2022!


> it's quite unlikely this code will be in use in 78 years

Famous last words.


I know of a system that uses "1/1/2099" everywhere an "infinity" date is needed.

The problem with fixing it is that nobody thinks it will be their problem.


My employer uses the year 2041 for infinity. It will be a truly massive disaster when that year arrives. There are tens of millions of dollars of advertising spend paused until “infinity”.


Whenever advertising stops working, the world becomes a better place.


As far as I understood, according to soared's comment, a lot of advertising campaigns will _start again_ in 2041.


Realistically these particular people are right though.


Unless you're importing historical records...


It seems likely that this is an issue with the code specifically, rather than the encoding of the files, based on the fact that they managed to roll out a fix pretty quickly.


I agree, but parent was responding to a more general comment of "Why is anyone using two-digit years anywhere at all?".


I am thinking this could be like DNS serial numbers. Possibly the code was written with an integer serial number (OK, using signed is wrong), but then someone else decided that, to make things easier for humans, the serial would use this date format. It's quite possible that was never in the spec given to the programmer.

DNS serial numbers are 32-bit unsigned, and the convention is to use YYYYMMDDNN.

So this _convention_ breaks in the year 4295, or in 2148 if someone forgot to make it unsigned. It is only a convention, though; DNS only cares that the serial increases.
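
Those two years can be derived rather than taken on trust. A rough sketch, assuming the YYYYMMDDNN layout described above; `first_broken_year` is a made-up helper name for illustration:

```python
UINT32_MAX = 2**32 - 1  # 4294967295
INT32_MAX = 2**31 - 1   # 2147483647


def first_broken_year(limit: int) -> int:
    """Return the first year whose earliest serial (YYYY010100) exceeds limit."""
    year = 1970
    while year * 1_000_000 + 10_100 <= limit:
        year += 1
    return year


print(first_broken_year(UINT32_MAX))  # 4295
print(first_broken_year(INT32_MAX))   # 2148
```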


> Why is anyone using two-digit years

Because with four-digit years this kind of date would never have fit into a signed 32-bit integer. Incidentally, this shows that the timestamp was taken into use after Y2K, since none of the pre-millennium years would have fit.


Double edged sword.. if they were to use 4 digit numbers, they couldn't use int32 to store YYYYMMDDHHMM.

But.. they wouldn't be able to use int32 and therefore wouldn't have had this bug. :)


Why not use a Unix timestamp, though? Is it because they're MS?


There's a difference between timestamps and dates. For instance Christmas Day is on 2022-12-25, but it doesn't have any particular time stamp as that depends on which time zone you are in.

If you are making a calendar app, you want to store dates and not timestamps as, e.g. if I have a flight to another country then have an appointment at 9am, that 9am should be in the time zone of wherever I am physically present, which the calendar doesn't necessarily know.

Storing as an integer is compact and allows for easy comparison. It's a clever hack, but they should have used 64 bit, or at least a u32!


Calendar appointments are a good example where a timestamp without a timezone (java's Instant) is appropriate. But a birthday doesn't have a time, so you would want to use LocalDate (a date without a time and timezone) for that.

In 99% of the cases you only really want to involve timezones when presenting something to the user or taking input from them.

So you would create your calendar event by entering "9am, timezone X", and the app would most probably just apply that offset and store that as a timestamp. But if it's an all-day event... I should probably take a look at android's calendar content provider.


Not storing absolute times is a disaster. If it's seemingly working now it's only because your use case apparently doesn't include anything cross-timezone.

Referring to something like "Christmas day" as timezoneless doesn't help either. Anyone who celebrates it celebrates it in a timezone. Christmas Day EST is a different event than Christmas Day CEST. Trying to represent them as the same event will only lead to heartbreak.


Birthdays are an exception to this rule.

Generally I think Dates (without time) are usually considered time-less and therefore timezone-less.

In fact the examples you mention are only a concern when considering the start or end of Christmas Day, and not the actual date of Christmas Day.


The problem I see is that at some point you will try to decide whether it's currently Christmas or not. That's when things start falling apart.
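
A small sketch of that failure mode: the same absolute instant is Christmas in one zone and not in another. The fixed offsets below are simplifications chosen for illustration (no daylight-saving rules), and `is_christmas` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets used purely for illustration.
EST = timezone(timedelta(hours=-5))
CET = timezone(timedelta(hours=1))


def is_christmas(instant: datetime, tz: timezone) -> bool:
    """True if the given instant falls on 25 December in the given zone."""
    local = instant.astimezone(tz)
    return (local.month, local.day) == (12, 25)


instant = datetime(2022, 12, 25, 2, 0, tzinfo=timezone.utc)  # one absolute moment
print(is_christmas(instant, CET))  # True: 03:00 on 25 December
print(is_christmas(instant, EST))  # False: 21:00 on 24 December
```

So "is it Christmas?" only has an answer once a zone is supplied, even though the date 2022-12-25 itself carries none.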


I think there is a simple reason for not using unsigned integers. If you use an unsigned integer and get an overflow, it's not immediately obvious, or even possible to see, from the value alone. If you use a signed integer, any negative number means there was a mistake somewhere. That is easier to check for.

That said, they should store it as something like Unix time, even for a calendar. Otherwise sending appointments between timezones needs conversions, which are notoriously hard to get right. Ideally it would be stored as a local timestamp with the timezone offset at the end. That way all the information is kept, and a Unix time can be stored in parallel for efficiency.


or put the month first (but of course that ruins easy sortability)


Unix timestamp is hardly better, due to the Year 2038 problem. I think the only truly robust solution is to use a 64-bit integer instead.


time_t solved the 2038 problem a long time ago



if you recompile the world, it did...
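
For reference, the 2038 limit in this sub-thread is just where a signed 32-bit count of seconds since 1970 runs out. A quick sketch of the arithmetic (not of any particular `time_t` implementation):

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last second representable by a signed 32-bit counter.
last_32bit_second = EPOCH + timedelta(seconds=INT32_MAX)
print(last_32bit_second.isoformat())  # 2038-01-19T03:14:07+00:00

# A signed 64-bit counter lasts for hundreds of billions of years,
# far beyond what datetime itself can represent.
years_64bit = INT64_MAX // (365 * 24 * 3600)
print(years_64bit > 10**11)  # True
```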


Storing dates in YYMMDDHHMM format but then saving that as an int seems like such an ... odd choice. Curious to how/why that decision was made, or if it was perhaps unintentional but nobody noticed before (e.g. comparison was done on those date values but nobody realized that they were for some reason coerced to ints, and nobody noticed because it still worked as expected).


It's kind of natural for what it is: It started off as a version number. Then someone realized that having semantic information of when that version was issued was useful, so they started using YYMMDDNNN version numbers instead. This worked fine, but then you got to YY being large enough to cause the number to not fit into a signed int32. Oops.


I imagine the 2022 issue was even mentioned in the original ticket by the dev. Though the dev is probably long gone and the ticket long forgotten. I know I’ve mentioned “this approach will break if the following assumptions change” on tickets.


I can think of lots of reasons why it was made: sorting ints is fast, storing them is compact, etc. Doesn't mean it wasn't a terrible decision (with readily foreseeable consequences) but I don't find it irrational or even surprising.


> I can think of lots of reasons why it was made: sorting ints is fast, storing them is compact, etc.

Of course, but in that case you would just store it as a normal date/timestamp value, e.g. ms or sec or heck even days since the epoch, truncated to days. Formatting it as a format string and then converting to an int is the part that's so weird.


You need to store two separate values for a zoneless datetime. Days since epoch and minutes/seconds since midnight. You can't do it with one integer without mixing it somehow.


I’d call this an irrational optimization in modern times for this particular problem space. I have no idea how long this scheme has been in use, but they’re basically ids for malware data releases, so even if you accumulated all the timestamps that have ever been used as these ids, it’s probably a few thousand, max, and growing at a predictable and pretty linear rate. Using clever encodings to save space or sort time in the context of an exchange server for this amount of data probably hasn’t been an appropriate choice in at least a couple decades.


> Using clever encodings to save space or sort time in the context of an exchange server for this amount of data probably hasn’t been an appropriate choice in at least a couple decades.

What makes this weird is that, as hn_throwaway_99 notes, this isn't even a clever encoding. It's a stupid encoding that fails at the tasks of saving space or time. Both of those are accomplished better by the ordinary timestamp value that everything uses. Converting the timestamp to a string and then interpreting that string as an integer brings you:

- A less dense encoding: any value of a timestamp correctly represents a time, but not every value of an integer represents a date format string. You just introduced invalid dates into an encoding that had no reason to contain them.

- Equal sorting times: sorting integers is more or less the same no matter what the integers are.

- Vastly increased processing times: in the naive approach, you're handed a timestamp, which is an integer, and you use it directly. In this "optimized" approach, you're handed a timestamp, you convert it to a string, and then you convert the string back to another integer. This new integer is encoded so that you can't even mask out the bits you want, because the information you care about is stored in powers of 10 while the integer is represented in powers of 2. What did you gain?
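
To make the first and last points concrete, here is a sketch of what pulling fields back out of such an integer looks like, assuming a YYMMDDHHMM layout. Both function names are invented for illustration.

```python
def decode_yymmddhhmm(value: int) -> tuple:
    """Split a YYMMDDHHMM-style integer into its decimal fields."""
    value, minute = divmod(value, 100)
    value, hour = divmod(value, 100)
    value, day = divmod(value, 100)
    year, month = divmod(value, 100)
    return year, month, day, hour, minute


def is_valid(value: int) -> bool:
    """Check the decoded fields against basic calendar ranges."""
    _, month, day, hour, minute = decode_yymmddhhmm(value)
    return 1 <= month <= 12 and 1 <= day <= 31 and hour <= 23 and minute <= 59


print(decode_yymmddhhmm(2201010001))  # (22, 1, 1, 0, 1)
print(is_valid(2201010001))           # True
print(is_valid(2147483647))           # False: INT_MAX decodes to "month 47"
```

Every field costs a division by a power of ten, and most integers in the range decode to nothing meaningful.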


This is not a date string. It's a version number. The version number happens to have a date embedded inside it. That's not for the computers, but for the humans looking at it to be able to get some useful information at a glance. The app was treating it simply as a version number and caring about "which is higher?" and nothing else.


Totally agreed. I should have put “clever” in irony quotes.


I agree. This type of bug seems like it should have been thought of when it was first written, or at least caught during a code review.

However, it's very possible that this was written a decade ago, and even if someone did the example conversion at the time to 1,201,011,234 it wouldn't trigger that "magic number spidey-sense" like a number that's closer to 2.1 billion normally would.

In fact for all we know there's a comment next to it:

    // TODO: convert to Int64 sometime before 2022


This is me. This sounds like the worst designed algorithm ever. When I read it the other night I uncontrollably moved to shout “WTF?!?!” I literally can’t imagine the circumstances that would lead to that solution. When is the only option for saving a date string to encode it as a 32 bit int? To quote The Princess Bride “Inconceivable!”


Oh they should have used an UNSIGNED number. /sarcasm


I'm sure that was the patch.


The current patch actually just changed the number of days allowed in December, to give the team time to patch it properly. The date reported is December 33rd, 2021.


Source?

Another tale to tell your grandkids when they ask about those weird dates...?


> The version of the updated scan engine starts with 2112330001; is this right? Should we be concerned that it seems to reference a date that does not exist?

> The newly updated scanning engine is fully supported by Microsoft. While we need to work on this sequence longer term, the scanning engine version was not rolled back, rather it was rolled forward into this new sequence. The scanning engine will continue to receive updates in this new sequence.[0]

[0] https://techcommunity.microsoft.com/t5/exchange-team-blog/em...


> […] so the result is a negative number.

No, the result is a parser failure. That’s what the error message[1] says, at least.

[1] https://news.ycombinator.com/item?id=29775547
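
The distinction matters: a strict conversion refuses the value outright instead of silently wrapping it. A minimal sketch of that behaviour, where `parse_int32` is a hypothetical stand-in and not the actual Exchange routine:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_int32(text: str) -> int:
    """Hypothetical strict parser: fail instead of wrapping out-of-range input."""
    value = int(text)
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"Can't convert {text!r} to a 32-bit integer")
    return value


print(parse_int32("2112312359"))  # 2112312359

try:
    parse_int32("2201010001")
except OverflowError as exc:
    print(exc)  # the conversion is rejected, no negative number is produced
```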


Or std::numeric_limits<int>::max()


This is really sloppy code. This should have been caught in a code review very easily.

Maybe this code is dozens of years old and no one thought it would be in production but there should have been a test case for this at the very least.

Meanwhile I was denied a job at Microsoft out of college almost 30 years ago because I didn’t remember how to divide coins properly to find the fake coin.


I wouldn't call one bug in one piece of software a "vengeance". Yeah, it's widely used and affected lots of mail, but it was very limited and had a trivial workaround.

But the Y2K issue, back then, was in 1000s of software products from 1000s of vendors. The only Y2K effect I remember was the first issue of 2600 in 2000.


It also wasn’t just the tick over to the year 2000. It was various dates and times around the year 2000 (leap years et al). But the Jan 01 2000 problem was the biggest and most high profile.


I remember the laser-tag place near Google (now long gone) would print out your scorecard with a date like 11/24/102


Yeah I saw that sort of thing frequently on smaller websites back then as well.


IIRC even phpBB would start reporting years as 19100


Just imagine if the computing revolution started closer to the beginning of the century instead of the last few decades of it. It would have been cray trying to fix everything.


I love the solution: They changed their antimalware definition files to December 33rd, 2021, until they have a more substantial patch ready. The instructions are just to basically clear out the existing files and re-download to get rid of that pesky 2022 year.
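
Why "December 33rd" works as a stopgap can be seen from the numbers alone. In this sketch `rolled_forward` is the version quoted from Microsoft's FAQ elsewhere in the thread, `last_real_2021` is an illustrative value, and `fits_int32` is a made-up helper.

```python
INT32_MAX = 2**31 - 1

last_real_2021 = 2112312359  # illustrative: latest plausible real December 2021 value
rolled_forward = 2112330001  # version quoted from Microsoft's FAQ
rejected_2022 = 2201010001   # value from the error message


def fits_int32(value: int) -> bool:
    """True if value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


print(fits_int32(rolled_forward))       # True: the parser accepts it
print(last_real_2021 < rolled_forward)  # True: it still compares as newer
print(fits_int32(rejected_2022))        # False: the real 2022 date does not fit
```

Because only "which is higher?" matters, an impossible date is harmless as long as the sequence keeps increasing.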


Perhaps switch to unsigned int until a sane solution is implemented. Should give them a couple thousand years to work with.


No, it just kicks the can down the road another 22 years.


At first glance thought they were using yyyy format, it's actually only yy. You are correct.
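
The "roughly two more decades" figure follows directly from the two-digit year. A sketch, assuming the YYMMDDHHMM layout; `last_safe_two_digit_year` is a hypothetical helper:

```python
UINT32_MAX = 2**32 - 1  # 4294967295
INT32_MAX = 2**31 - 1   # 2147483647


def last_safe_two_digit_year(limit: int) -> int:
    """Largest YY for which the latest value of that year, YY12312359, fits."""
    year = 0
    while year + 1 <= 99 and (year + 1) * 10**8 + 12312359 <= limit:
        year += 1
    return year


print(last_safe_two_digit_year(INT32_MAX))   # 21
print(last_safe_two_digit_year(UINT32_MAX))  # 42
```

So switching to unsigned would move the cliff from the end of 2021 to the end of 2042.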


It's pretty likely they'll change the data type somehow, but I guess the question is how fast they can test how that interacts with everywhere that might encounter that variable. Presumably the hack they implemented here buys them at least a month or two to test a real code change.


This bug was extensively discussed yesterday:

Microsoft Exchange stops passing mail due to bug on 1/1/22 (677 points / 355 comments)

https://news.ycombinator.com/item?id=29756714


Submitters: please don't editorialize titles. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html. We've reverted the title now. (Submitted title was "Y2K problem came back with vengeance in 2022".)


My apologies. Will heed this next time. Thanks for reverting the title.


Appreciated!


[flagged]


Do the mods do that?


I don't think it's particularly widespread, and if anyone can do it, it makes sense for mods to have that power. However, I have numerous times (in absolute terms) seen mods change titles to something more informative than the original. I wouldn't necessarily call that "editorializing," though.


There are mods, plural? I.e. someone else besides dang?


"dang" is not a single person but a cybernetic entity: hundreds of human beings being worked to death in an underground facility against their will, all mediated by the most sophisticated AI known to man.


According to this there are 2: dang and sctb.

https://news.ycombinator.com/item?id=14283202


Didn’t sctb step down?


sctb hasn't commented since 2019, so it looks like it


It's surprising how often this issue (the ceiling for 32-bit ints) comes up.

If you're still using integers (for ids, timestamps, etc) then just go with 64-bits. It avoids any potential problem with data size in the future, and if the size stays small then it won't matter anyway. Storage is cheap and CPU cache lines are 64-bit now too.


Cache lines are 64 bytes not bits. That is: a cache line can store 8 64-bit numbers.


Ah you're right, I was thinking about 64-bit alignment for atomic operations. Either way none of this matters except under the most extreme performance conditions.


This will always be a challenge for applications that start their lives as 32-bit (Exchange wasn't 64-bit until 2007)


Thought this had to do with memory addressing/pointer size, not data types.


Can you imagine being a dev on this bugfix? The bug is known, but every second wasted testing, regressing, and preparing to deploy, literally millions of emails aren't delivered. That's some serious pressure.

I wonder if there were detectable drops in internet traffic due to fewer emails?


Here we have a person not running Exchange, seeing the impact on their own outbound delayed queues:

https://twitter.com/taupehat/status/1477367798326120458


Wow, what a bug.

The malware scanning service of MS Exchange crashes, because it treats a yyMMddHHmm timestamp as a signed integer when verifying a signature file.

Turns out that 2201010001 is negative when treated as a 32 bit integer (the greatest positive one is 2147483647, and 2021 had fewer than 47 months).

I can only assume that somebody wrote that "timestamp string as integer" code, checked that it worked correctly (at the time) and then just assumed they must be good on data type range.


I think it probably developed the other way around, same as the version information almost every DNS server uses.

The anti malware releases probably started out with definition version 1, 2, 3, and so on, until someone complained that version 37363 doesn't mean much. Some smart guy probably realised that as long as the numbers increased, they could put anything in there, so the 1201010000 version was born, incremented a day at a time.

As MXToolbox states:

> The serial number is an unsigned 32 bit value assigned to your SOA record must be between 1 and 4294967295. […] It has become common to set your serial number with a date format to make it easier to manage. This format uses 10 digits to represent the date and then a two digit sequence number with the format of YYYYmmddss.

Do any SOA query against hosted DNS and you'll probably see exactly this pattern appear.

The fact they used a long to store the version also suggests that someone may not have realised that longs are the same size as ints, depending on your platform and compiler. The effective distinction between the two in Microsoft's compiler ended in the 32 bit era, so using a long to store anything might be seen as a red flag.

Based on the fix linked, the current antimalware release seems to be based on December 33rd, 2021. That will buy them enough time to write a patch that just uses long longs for these timestamps, or if their customers don't like installing updates they might just go back to incrementing the number by one at some point.
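
The point about `long` and `int` sizes is easy to check on whatever machine is at hand. A small sketch using Python's `ctypes`, which reports the widths of the platform's C types; `c_type_bits` is a made-up helper, and the output is deliberately not shown because it differs between platforms (64-bit Windows keeps `long` at 32 bits, while 64-bit Linux and macOS make it 64).

```python
import ctypes


def c_type_bits() -> dict:
    """Report the width in bits of a few C integer types on this platform."""
    types = {
        "int": ctypes.c_int,
        "long": ctypes.c_long,
        "long long": ctypes.c_longlong,
    }
    return {name: ctypes.sizeof(ctype) * 8 for name, ctype in types.items()}


for name, bits in c_type_bits().items():
    print(f"{name}: {bits} bits")
```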


Surprised this sort of hack would even pass code review.


Really? Billions of lines of code are reviewed every year at Microsoft. Not hard for me to imagine that this was coerced to an int somewhere (perhaps even unintentionally) in a way that was non-obvious during a code review.


I'm rather surprised the test suite for a product like Exchange wouldn't include setting date types to ($TODAY..$FAR_FUTURE).


We have some test suites that run with a date two weeks in the future. The last two weeks of every year always seem to be spent fixing these kinds of bugs! (Luckily it’s a web app and most people refresh the page after a few days)
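
A future-date sweep of the kind described here could look something like the sketch below. Everything in it is hypothetical: `encode_version` assumes a YYMMDD prefix plus a four-digit sequence number, and the start date is fixed so the result is reproducible.

```python
from datetime import date, timedelta
from typing import Optional

INT32_MAX = 2**31 - 1


def encode_version(day: date, sequence: int = 1) -> int:
    """Hypothetical encoder: YYMMDD followed by a four-digit sequence number."""
    return int(f"{day:%y%m%d}{sequence:04d}")


def first_overflow(start: date, days: int) -> Optional[date]:
    """Return the first date in the window whose encoding exceeds INT32_MAX."""
    for offset in range(days):
        day = start + timedelta(days=offset)
        if encode_version(day) > INT32_MAX:
            return day
    return None


print(first_overflow(date(2021, 12, 1), 365))  # 2022-01-01
print(first_overflow(date(2021, 1, 1), 300))   # None: nothing breaks within 2021
```

Run with a window of even a few weeks ahead, a check like this would have flagged the problem in December.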


Site has this confusing update:

> Addendum: A fix ist available.

Are we speaking German now? Or is that a typo for isn’t? Or is, maybe?


Long ago I learned a lesson: Don't write your own date time handling code.

(because you will always miss something)



Any reason they didn't make the int unsigned from the get go? Or would that also cause issues?


the intern who wrote the code didn't think of the future farther than the weekend party and the senior who reviewed it 10 years ago didn't catch it and doesn't work there anymore.


Please, D. Knuth just solve this dates for computers problem once and for all!


So many spelling mistakes!


Originally in German, this is a best-effort translation by the author.


Maybe this will kill the last remnants of people thinking they are using SPF when they are actually using Sender ID



