Hacker News
Compiler Bugs Found When Porting Chromium to VC++ 2015 (randomascii.wordpress.com)
140 points by cpeterso on March 25, 2016 | 64 comments



And I have to say that the Microsoft team was amazing. They were very supportive, and helpful, and it was clear that they really wanted VC++ 2015 to be as good as possible

Rather anecdotal of course, but I had the same impression. File a proper bug report and you get a reply in the next couple of days, typically starting with "thanks for the excellent bug report ... will try to get the fix in the next update". Found a couple of bugs in VS2012 and VS2013 and they were resolved in either the next update or in the next version. Practically speaking it usually took somewhere between 1 and 3 months, which isn't too bad all things considered. Like, I've also had much worse experiences with some competitors where I'd file a report including a possible fix (something like 'change <= to <' in source x line y), and 6 months later there hasn't even been a reply, let alone a fix. Anyway, with VS2015 it even seems like they cranked it up a notch, because the bugs I encountered already had a fix in an update I hadn't managed to install yet :]


> Like, I've also had much worse experiences with some competitors where I'd file a report including a possible fix (something like 'change <= to <' in source x line y), and 6 months later there hasn't even been a reply, let alone a fix.

cough Apple cough ...

A few years ago I reported an issue for iOS that _never_ got a response. When I made it to WWDC 6 months after reporting it, I went to the support session and ended up running into the guy that owned the bug. He was aware of the issue, it had been reported a few times, and he gave me a workaround until they released the fix. It was nice to finally get a resolution, but the whole process was seriously fucked up. After the fix was released in the next version of iOS they closed my bug report without a single comment.


Back in the day, you could escalate a bug to the next patch by emailing steve@mac.com claiming you lost work, or can't get something done. I did this twice, and got a patch both times. Once, they literally patched iMovie over a weekend after I emailed Steve.

I'd bet the teams were quite unhappy...


well, if you order a fishmac and you don't get it, you go to the manager...


Sounds like they're using an internal tracker that isn't publicly accessible... making the public one more of a dog and pony show than anything.

Sadly, this is common for a lot of places (they're afraid internal discussions would get negative attention if made public in some tracker).


> (they're afraid internal discussions would get negative attention if made public in some tracker)

Mostly, they're afraid internal discussion would leak super-seekrit info about their product pipeline. You know how anal they can be about their product announcements.

I mean, they get worked up about leaks like this: http://arstechnica.com/apple/2016/03/unreleased-12-inch-macb...


You may be underestimating the overwhelming power of spam. That is, you're attributing a fear of negative attention. That might be true, but I suspect the concern has more to do with abuse. Don't underestimate the power of abusive users who have disproportionate power.


> they're afraid internal discussions would get negative attention if made public in some tracker

That's understandable for a company like Apple. It just pissed me off that they had a workaround, possibly for months, and never bothered to respond to my report.


Apple, by comparison, is a black hole. I filed a bug in July 2014 [0] against xcodebuild that we still get bit by every once in a while. Never got a response. The repro is trivial.

[0] http://blog.ryjones.org/2015/01/06/xcode-is-killing-me/


I think whoever holds the platform with the biggest earning potential has the worst bug situation. I remember MS in the last decade, with their "reinstall and don't call us back"-support attitude. Now that iOS is the key to unicorn valuations, MS has had to fix their developer support.


At the bottom of the article Bruce mentions the slashdot comments went downhill. This is the first time I've visited slashdot in years and I'm sad to see an old haunt turned so bad. The equivalent reddit thread is significantly more interesting, positive, and hacker-ish: https://tech.slashdot.org/story/16/03/25/141256/chromium-bei... https://www.reddit.com/r/programming/comments/4bvswj/compile...


I overstayed at Slashdot. I kept visiting it probably due to a long habit of it being the first news site I'd check while code was compiling. Just ingrained in muscle memory.

Then, one day, I realized most discussions had turned vile. Cynical rants that would find fault with absolutely anything, and I would feel dirtier reading it.

It's really a shame, a bit like seeing an old friend that ended up with wrong choices in life.


> Cynical rants that would find fault with absolutely anything.

I think that's just comments on the internet in general; everywhere has them. I had to work hard to stop it bothering me. You can't bathe in negativity and not get wet.


I was really into Slashdot for like a year, and then quickly saw the tide turn in favor of trolls for some reason and left. I mean, it went from being a place I'd check multiple times per day to a place I never visit unless someone explicitly gives me a link. I think they lost their community somewhere in the multiple buyouts. Their bias supporting SourceForge during the adware bundling debacle, I think, was when the mass exodus happened, and now it's pretty much just trolls left.


There have been many mass exoduses from there.

I was into Slashdot back in the 1990s. I lost interest about 15 years ago. When the Sourceforge thing happened I was mildly surprised to see that Slashdot still existed.


Back in the mid-2000s, when Slashdot was "working", it went like this: a bunch of yahoos would post enough inane comments that an expert who actually worked there would get annoyed enough to write a long, insightful post full of actual facts.


In the late 90s, everyone who was technical and cared about Linux/open source/etc monitored slashdot. EVERYONE.

A lot of the experts who got annoyed, probably had been following it from then. But as time went by, being said expert got tiring. Eventually the trolls drove them away.


Yep. "News for nerds. Stuff that matters." It used to have some useful software news. Then it turned Randian and political, and it was just too tiring to wade through the BS. It apparently still exists, but I see no reason to look at it.


FYI, an email I got the other day

notices@slashdotmedia.com 18 Mar (11 days ago)

to me 2016-03-17

Dear Site User,

Fair processing notice - Data Protection Act 1998

We are writing to let you know that with effect from 27 January 2016, the Slashdot Media business, which provides online services through various web sites including Slashdot.org and SourceForge.net (the "Slashdot Media Services") has been purchased by SourceForge Media LLC of 1660 Logan Avenue, San Diego, California, 92113, USA ("we" or "us").

As a result your personal data have been transferred to us and will be used in connection with the continued provision of the Slashdot Media Services to you. Your personal data will continue to be processed fairly and lawfully in accordance with the Data Protection Act 1998 for the same purposes as those it was originally collected by Dice Career Solutions Inc and/or eFinancialCareers Limited including to:

* continue to provide you with information (by electronic means or otherwise) about other services we offer that are similar to those that you have already received or enquired about;
* carry out our obligations arising from any contracts entered into between you and us;
* provide you with the information and services you request from us;
* tell you about changes to the Slashdot Media Services; and
* ensure that the content made available through the Slashdot Media Services is presented in the most effective manner for you and your device.

Further information on how your personal data may be processed, who it may be disclosed to and how it will be stored can be found in the Slashdot Media Services privacy policy available at: http://www.slashdotmedia.com/privacy-statement/

You can ask us to remove all your account data, stop processing your personal data and to stop contacting you for marketing purposes at any time.

* For SourceForge.net, please contact us at sfnet_ops@slashdotmedia.com
* For Slashdot, please contact us at privacy@slashdot.org
* For FreeCode, please contact us at freecode-privacy@slashdotmedia.com
* For SlashdotMedia.com, please contact us at sfnet_ops@slashdotmedia.com

Please let us know if you have any queries.

Yours sincerely,
Logan Abbott
The team at SourceForge Media LLC


Yeah, I gave up on Slashdot comments when the comments on the "Microsoft is joining the Eclipse Foundation" news item turned into a "Microsoft is killing Eclipse" conspiracy.


Completely different from HN, where every "Google is improving security/adding a new API/product" post turns into a "Google is an NSA front/will kill any product/API that you use" flamedown.


"Google is an NSA front" and "Google may kill any product/API that you use" are in completely different orbits in terms of rationality.


One thing I like about Slashdot is that the comments there often represent a much wider set of opinions and a greater diversity of thought than are found here or at reddit.

That does often mean that more of the comments might be seen as negative, some might be offensive to some people, and some might contain misinformation. But it works the other way, too, where some of the best commentary I've seen anywhere has been there, often from people posting as Anonymous Coward.


"I generally don't consider my own software adequately enough tested until its tests have turned up a bug in a compiler/toolchain."

https://www.reddit.com/r/Bitcoin/comments/2rrxq7/on_why_010s...


I was really proud to find my JavaScript benchmark website listed as causing a crash in Opera back when they were maintaining their own engines.

No details were given as to what I was doing that crashed them though, and the bug was eventually fixed.


You're not going to discover compiler bugs if you stick to language features and compilers that have been thoroughly trodden, and don't push the limits in any special way (massive functions etc..).


For an example of what Greg Maxwell might consider "adequate testing", see https://www.ietf.org/proceedings/82/slides/codec-4.pdf (he is the one responsible for all of the testing described in that presentation).

Slide 20 lists no less than 8 toolchain bugs found with a codebase of less than 50k lines of C code (no C++), using only thoroughly trodden language features, without "massive functions" or anything else. Admittedly 4 of those bugs were in pre-release versions of gcc, but the rest weren't.


I've heard ICC is pretty buggy. Tinycc seems a bit obscure. pre-release versions of GCC as you mentioned are not well-trodden.


> I've heard ICC is pretty buggy.

Perhaps historically ICC was buggy, but I don't think it's true anymore. I often test highly optimized C code with GCC, Clang, and ICC, and anecdotally I'd say that the likelihood of hitting a compiler bug when compiling for a current Intel processor is about the same for each.

For me, crashing bugs and true miscompilation are rare with all three, but come up occasionally. Performance differences are usually within +/- 20% on microbenchmarks, with each having about equal chances of being the fastest or slowest.


My favorite compiler bug was clang occasionally failing to set the al register to a sensible value before making a vararg call. It was especially fun because clang's prologue for vararg functions was robust against bad values, so nothing happened for clang->clang calls. But call into a vararg function compiled by gcc and all hell broke loose when gcc used the value of al as part of a computed jump that assumed it was in the range of 0-8.

The circumstances needed to trigger this bug weren't special, I never nailed it down entirely but it was just something like performing an integer division at just the wrong spot. Vararg function calls are extremely well trodden, and clang at the time was pretty mature. Yet there was a bug, all the same.
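For context, the %al detail comes from the x86-64 System V calling convention: the caller of a variadic function sets %al to the number of vector registers used for the arguments, and gcc's variadic prologue uses that value in a computed jump (range 0-8) to decide how many XMM registers to spill. Nothing exotic is needed at the source level to exercise it; a plain varargs function like the sketch below goes through that machinery on every call:

```cpp
#include <cstdarg>

// Ordinary C-style variadic function. Under the x86-64 SysV ABI the
// *caller* must set %al to the number of vector registers used for the
// arguments (0 here, since everything is an int). gcc's prologue for
// this function uses %al in a computed jump to decide how many XMM
// registers to spill -- a garbage value in %al, as in the clang bug
// described above, sends that jump somewhere bogus.
int sum(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

The source-level code is completely unremarkable, which is exactly why the miscompilation was so surprising: the bug lived entirely in the generated call site, not in anything the programmer wrote.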


Compiler changes in code generation and optimization can affect even well-trodden code. The article mentions that the following line of code would, in some cases, only zero the first four bytes of the array when compiled with VS2015:

  char key[5] = { 0 };
and the "fix":

  // Work around VC++ 2015 Update 1 code-gen bug:
  // https://connect.microsoft.com/VisualStudio/feedback/details/2291638
  key[4] = 0;
https://chromium.googlesource.com/chromium/third_party/ffmpe...
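For reference, the language does guarantee the behavior the original line relied on: in an aggregate initializer, elements without an explicit initializer are value-initialized, so all five bytes must come out zero. A minimal check of that guarantee (on the stack, matching the bug scenario):

```cpp
// Aggregate initialization: the explicit 0 initializes key[0], and the
// remaining four elements are value-initialized (i.e. zeroed) per the
// standard. The VS2015 Update 1 bug left key[4] untouched.
bool all_zero() {
    char key[5] = { 0 };
    for (int i = 0; i < 5; ++i)
        if (key[i] != 0) return false;
    return true;
}
```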


It was a Chromium bug, and it still isn't fixed because Microsoft wrongly accepted it as a compiler bug. The Chromium code contained this gem involving a union type called av_alias32 containing an int:

(((av_alias32*)(key))->u32 = (argc));

The first problem is that an int may have an alignment constraint. Since writing an int to an allocation that is made based on char could violate this constraint, the code is invalid. It doesn't matter that x86 won't actually enforce this. The code is still invalid. Since that line of code is invalid, all code paths that will reach it are also invalid, as are all code paths from it. This makes the entire test program invalid; it is legit for the compiler to emit a program that immediately crashes or insults your mom.

The second problem is when the char array is checked to see if it contains 42. An access as char is always permitted, so the code narrowly misses having an aliasing violation, but remember that the type AS ACTUALLY WRITTEN is not char. The only thing that can be legitimately done with the value is to write it elsewhere, for example when implementing a function like memcpy. An int need not be stored in little-endian form, big-endian form, two's complement, or anything else sane... and the language standard allows for this by prohibiting what this code is doing. As with the other bug: all code paths which involve the standards violation are invalid, making the whole test program invalid. The compiler is permitted to emit code that crashes or insults your mom.
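For what it's worth, the well-defined way to write that kind of code is memcpy, which has no alignment requirement on the destination and sidesteps the aliasing rules entirely. A sketch of the alternative (byte order within the buffer is still implementation-defined, so it should be read back the same way):

```cpp
#include <cstring>

// Defined-behavior alternative to (((av_alias32*)(key))->u32 = argc):
// memcpy may store to any byte buffer regardless of alignment, and
// does not reinterpret the bytes through an incompatible lvalue type.
void store_u32(char *buf, unsigned v) {
    std::memcpy(buf, &v, sizeof v);
}

// Round-trip the value the same way; interpreting the individual
// bytes directly would depend on the implementation's byte order.
unsigned load_u32(const char *buf) {
    unsigned v;
    std::memcpy(&v, buf, sizeof v);
    return v;
}
```

Modern compilers recognize this pattern and compile it down to a plain load/store, so there's no performance penalty for doing it the legal way.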


I'm sure you are correct about the aliasing being undefined behavior. However, Microsoft's attitude towards certain types of undefined behavior is quite different from the gcc/clang attitude, and VC++ (more or less) promises not to break your program if you break the aliasing rules.

gcc/clang are within their rights to break programs that break the aliasing rules, but VC++ is not required to do so and they choose not to.

There have been some excellent articles talking about how the current state of undefined behavior as understood by gcc/clang is crazy. One that I like is this one: http://blog.regehr.org/archives/761

Obviously some types of undefined behavior cannot be 'normalized', such as use-after-free, but treating the aliasing in that bug as undefined behavior is a committee decision that many developers think is ill advised.

Another recent article that I can't find pokes some holes in the idea that the latitude given to compilers by undefined behavior improves performance.

For your second example of undefined behavior I'm afraid I don't understand. Are you saying that even if the line of code that aliases the array is removed that "if (key[0] != 42)" is still undefined behavior? If so please explain. You say "the type AS ACTUALLY WRITTEN is not char" but I'm not sure what type you are referring to, and if you mean '42' that is true but not relevant because in a comparison between key[0] and '42' "key[0]" is converted to int by the standard integral conversions and then a particularly boring comparison is done.

If that code is undefined then the standards committee is definitely crazy and no code is conforming.


I should have written "AS ACTUALLY STORED" to be clear. You store an int into memory. That memory happens to be described as a char array, but you stored an int as an int. It's an int. It will always be an int, and you can only legitimately interpret it as an int. (Had you stored to key[0] in the normal way then the value would have been stored as a char instead.)

Reading it out as a long or float would be prohibited by aliasing rules, even if those types happen to be the same size. When you do "if(key[0] != 42)" you are reading out the value in key[0] as char, which is OK, but the content is undefined. You can't legitimately compare an undefined value with 42.

There is a special exception for char. You don't violate the aliasing rules when you load part of the int as a char. This special exception is very limited. It's for implementing functions like memcpy. (but not of that name, since "memcpy" is reserved)

The special exception doesn't make the data meaningful when loaded as char. The bits could be in any order. When you load part of an int into a char, the only legitimate thing you can do with it is to store it as if doing a memcpy.

Depending on how you read the C standard, you might even need an unsigned char to avoid trap representations. People argue this both ways.

That all said, not even gcc is this cruel.


Generally, if you run into a bug while programming, it is probably a mistake you, yourself, made. This is not always true, though: compiler bugs do exist, and can be found: http://blog.regehr.org/archives/category/compilers

Consider that entire industries (aerospace, etc) routinely ship code with compiler optimization disabled, to avoid getting bit by compilers trying to do clever things.


I would generally agree, unless you are coding in OpenCL :)


You're less likely to discover compiler bugs, but you can still "beat the odds" if your codebase is big enough. Many of the bugs mentioned in the post were triggered by code not doing anything especially unusual.


That's simply not true. There are some bugs in GCC with flow-sensitive constant evaluation that cause statically-impossible "undefined behavior" to be seen and aggressively optimized against, causing a miscompilation. I've encountered clang and gcc internal errors on rather kosher modestly sized C++ projects as well.


The GCC devs do seem to be pushing hard on undefined behaviour optimisation, sometimes a little too hard, resulting in the introduction of bugs.

I have come across a lot of transient miscompilation in Visual Studio, usually when doing incremental rebuilds. I was thinking more of persistent miscompilations as the things you are unlikely to find on the 'well trodden path'.


I found Visual Studio a lot more reliable once I took to disabling Minimal Rebuild. (Don't ask me why I thought to try this... it's possible I read about it somewhere.) People always seemed to leave this switched on, because it sounds like you want it - but you don't. It supplants the standard file time-based dependency checking with some additional content-based stuff (see https://msdn.microsoft.com/en-us/library/kfz8ad09.aspx) that goes wrong every now and again.


I always have /Gm off because it interferes with multi-threaded building. However I still get miscompilations / bad builds occasionally, which are fixed with a full rebuild.


Compilers are a moving target - and you'd be surprised what you can find. Here's one I recently ran into (with a different repro):

https://connect.microsoft.com/VisualStudio/feedback/details/...

I didn't file a bug as it'd already been reported - although in my case, the trigger was using a logging macro which used __LINE__ from a lambda IIRC. Nothing terribly spectacular.

The last bug I remember reporting was default constructor parameters not instantiating templates correctly. Something like void f(int i = some_template<int>::func()); would fail to generate, and thus link against, some_template<int>::func. f might've needed to be a constructor. Sadly the connect bug is 404 and web.archive.org didn't capture the page. They said they would fix it in VS2008 SP2, but I ended up checking and finding they'd fixed it in SP1.
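A hypothetical reconstruction of that pattern, just to make it concrete (the names some_template, func, and f come from the comment above; the member body returning 7 is invented for illustration):

```cpp
// Using the default argument requires the compiler to instantiate
// (and emit) some_template<int>::func at the call site. The bug
// described above was that the instantiation was skipped, so the
// program compiled but failed to link against the missing symbol.
template <typename T>
struct some_template {
    static T func() { return T(7); }  // body invented for illustration
};

int f(int i = some_template<int>::func()) { return i; }
```

Calling `f()` with no arguments is what forces the instantiation; passing an explicit argument never touches the template at all, which is presumably why the bug could hide for so long.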

Even ignoring ICEs, I'd say I probably hit at least one confirmed compiler bug a year minimum when doing regular C++ development.


My team has found several bugs in gcc-go (we maintain Docker for architectures that don't have a port of the gc compiler). Now, you might argue that Docker is full of weird stuff (you'd be correct), but the bugs we found were not related to any of the really weird stuff in Docker (it was mainly with cgo IIRC).


The other side of that coin is that one of my high-assurance system recommendations was specifically to use the most common constructs and libraries in most common configurations. Albeit with safety or interface checks added where necessary. Even without provable correctness, it dramatically reduces occurrence of all kinds of issues.


I had a massive memory leak in normal usage of std::stringstream in Visual C++ 2008, which eventually got fixed in an update. Wasted a lot of my time tracking down that leak, though.


stringstream is maybe one of those terrible APIs where bugs remain because barely anyone uses it :)


I don't think I'd consider std::stringstream to be particularly obscure.


Particularly since it's the easiest way to implement a generic to_string function (prior to C++11).
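The classic pre-C++11 idiom being referred to looks something like this: any type with an operator<< overload gets a string conversion for free via std::ostringstream.

```cpp
#include <sstream>
#include <string>

// Generic to_string, the usual pre-C++11 workaround before
// std::to_string existed: stream the value into an ostringstream
// and hand back the accumulated string.
template <typename T>
std::string to_string(const T& value) {
    std::ostringstream oss;
    oss << value;
    return oss.str();
}
```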


I use stringstream all the time to prepare complicated output for a logger.


I independently found an internal compiler error in GCC 4.5.2 while using inline assembly. By the time I managed to create a small test case, it was fixed in a newer upstream GCC.


> "I worked on one project where in addition to finding several compiler bugs I also found a flaw in the design of the processor, but that’s an NDA tale for another day…"

Yeah, I really want to hear this story; when does the NDA expire, or does it never? :)


I wonder if he worked with early UltraSPARC chips. There were tales of an interesting bug report from the NSA. It's been 20 years, so I probably have every detail wrong, but it was something like a missed implementation of a version of ROTL that the CPU was basically emulating at several orders of magnitude performance penalty. No other customer found it.


It would be nice if MSVC offered something like clang-cl [1] to handle --options the way other compilers do, for easier compiling of pure-Makefile open source projects.

[1] http://clang.llvm.org/docs/MSVCCompatibility.html


Why use MSVC at all when Clang has a perfectly good gcc-style driver? There's also this http://git.savannah.gnu.org/cgit/automake.git/tree/lib/compi... wrapper script which is useful once in a while for invoking MSVC from inside Cygwin or MSYS (though MSYS2 broke this, last I checked).


We (the radare2 project) are already built by gcc, clang and tcc. Every time support for a new compiler is added, it reveals some new errors. This is why we're interested in trying MSVC too. Thank you for the script!



For over a decade I've felt that Visual Studio is consistently the best product that Microsoft has released. Which makes sense, since many if not most of their developers rely on it to make their other products.


Yup, MSVC is the bar that I hold every other IDE to. IntelliJ comes close but all others (Eclipse I'm looking at you) fall far short.


You haven't programmed in C if you haven't found a compiler bug.


I'm currently working on a project demonstrating a Visual Studio 2015 C++ compiler crash that manifests only when the Inline Function Expansion optimization is enabled. The Visual Studio team has proven very responsive.


    char key[5] = { 0 };

    Simple enough – this is supposed to zero the entire array, but instead it only zeroed the first four bytes.
In C++ and on the stack "{0}" is not required as memory is zeroized. On heap (in a new) "{0,0,0,0,0}" should be used (or memset()). Not sure about that in C.

My understanding is that it's not a bug.


  #include <iostream>
  
  int main() {
      int nope_not_initialized;
      std::cout << nope_not_initialized << "\n";
  }

  cc1plus: warnings being treated as errors
  In function 'int main()':
  Line 5: warning: 'nope_not_initialized' is used uninitialized in this function
( http://codepad.org/6xrFQZuL )

Edit: And a C version. Amusingly, it will print "0" if I only use one printf statement:

  #include <stdio.h>
  
  int main() {
      int nope_not_initialized;
      printf("%d", nope_not_initialized);
      printf("%d", nope_not_initialized);
  }

  -142791388-142791388
  Exited: ExitFailure 10
( http://codepad.org/TgGfL7QW )

Edit x2: Apparently "implicit return 0 from main" is only a thing in C++? And here I'd assumed it was some silly carryover from C...

( http://codepad.org/JIH9ZteL )


> In C++ and on the stack "{0}" is not required as memory is zeroized.

No, it isn't. Globals are.

And since the explicit initializer is provided it definitely is a bug to not zero all bytes.


Do you only ever run debug builds? Memory not being initialised is one of the first gotchas you learn about in C/C++.



