This is -- to say the least -- frustrating. First, the busted microcode is still available on the Intel Download Center[1], without any warning that they recommend that you not, in fact, install it. Second, the press release is still being evasive: they have not merely "received reports"; they in fact know that it's causing issues, and the press release is avoiding the much stronger language that Intel is giving privately (namely, don't install this).
The broken microcode is (at some level, anyway) forgivable; Intel's ongoing inability to communicate transparently and honestly with its customers during this crisis of its creation is much less so.
I'm thinking of building an AMD dev box. For enterprise consumers, if they're using 1U or blade servers, they could make the choice to switch to AMD for future nodes.
I strongly recommend that you go AMD. I went all-in on AMD - I agonized over the choice between 8-core Ryzen and 8-core ThreadRipper: ended up with a 12-core TR thanks to steep holiday-season discounts that lowered prices one rung down. TR4-socket motherboards are way more expensive compared to Ryzen ones (same-old AM4 socket).
I know my box is overkill for my needs now, but upgradeability is a big plus for me; I'm only using 16GB of RAM, but could up that to 128GB, and maybe I might swap out the CPU for a 64-core Zen4+ in 2022. For reference, my last dev box is from 2010[1](!) which I upgraded over time and this strategy has served me well. YMMV.
Linux or Windows? I've been doing dev on a large React app recently and the thought of running npm install on Windows makes me anxious about the performance vs my Mac - wondering if Windows has gotten better of late with lots of tiny file I/O.
I've been an avid fan of AMD going back to the late 80s; they have always been a cheaper and better alternative. I am still bitter about Intel over RDRAM.
They really haven't. AMD was so far behind Intel they were in danger of going extinct in data centers. Only very recently have they caught up again to be a credible competitor.
This bug and Intel's response is very good timing for AMD though.
It was Intel’s anti-competitive and illegal actions that prevented AMD from owning the market during the several year period when Opteron was not only the best CPU but the only 64-bit x86 CPU.
Unfortunately the legal process was far too slow and the penalties were a pittance compared to the profits.
It benefits all of us to have a competitive market for x86 CPUs.
Hm? The screenshot was merely showing the good parts (8 cores, 16 virtual).
It's not getting particularly hot in general; it depends entirely on the use case. When I max out the cores or run a game? Quite hot. Otherwise: mostly fine.
It's just unwieldy, big and heavy, hence not really useful on a lap.
I mean, this one, yea. Speculative execution should not have side effects when wrong because it is Intel silently, sneakily breaking the model of how the CPU works (at least, if you only include the cache in how the PC works and not branch prediction).
I would have expected, if I thought to ask, that items were not added to the cache or were removed from the cache if the branch was not retired.
Removing items afterwards probably wouldn't work, as you might be able to stuff (instead of flush) the cache and figure out which line was emptied.
Intel isn't being sneaky; speculative reading has been a standard and accepted feature of out-of-order processors for over 20 years (remember it affects ARM, AMD, Apple, IBM etc. as well). Speculatively reading privileged memory while unprivileged was a big mistake though.
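To make the point about leftover cache state concrete, here is a toy model, not real exploit code. It assumes a hypothetical `ToyCPU` whose "cache" is just a set of line indices and whose "timing probe" is a set lookup; the class, its methods, and `recover_secret` are all made up for illustration. It only shows the shape of the problem: the squashed branch returns nothing architecturally, yet the line it touched stays observable.

```python
from typing import Optional


class ToyCPU:
    """Toy model: the cache is just the set of line indices currently held."""

    def __init__(self, secret: int) -> None:
        self._secret = secret   # value the attacker may not read directly
        self.cache = set()      # microarchitectural state, not rolled back

    def flush(self) -> None:
        """Attacker empties the cache before the experiment."""
        self.cache.clear()

    def victim(self, in_bounds: bool) -> Optional[int]:
        """Bounds-checked read that the predictor speculates past."""
        # Speculative load of a probe line selected by the secret value.
        self.cache.add(self._secret)
        if not in_bounds:
            return None          # squashed: no architectural result
        return self._secret

    def is_cached(self, line: int) -> bool:
        """Stands in for timing a load: a fast load means the line is cached."""
        return line in self.cache


def recover_secret(cpu: ToyCPU, lines: int = 256) -> Optional[int]:
    """Flush, trigger the mispredicted branch, then probe every line."""
    cpu.flush()
    assert cpu.victim(in_bounds=False) is None   # nothing leaks architecturally
    hits = [line for line in range(lines) if cpu.is_cached(line)]
    return hits[0] if len(hits) == 1 else None


print(recover_secret(ToyCPU(secret=42)))
```

In this model, "not adding the line until the branch retires" would correspond to moving the `cache.add` after the bounds check, which is the behaviour the comment above says it would have expected.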
Intel's greatest PR success in this mess has been to conflate Meltdown with Spectre. Only Intel is affected by Meltdown because of their design, and it is a more easily exploited bug.
I think that's mainly out of luck. If the exploit had been discovered two years later, the story would likely be different. Apple has been much more ambitious with their ARM processor designs and has shipping iOS and AppleTV products affected by Meltdown.
Shipping or not, it illustrates that Intel was not unique.
I'm not sure what kind of answer you are expecting. All I am saying is that Intel is not uniquely in the wrong here. There is a whole industry of bad decisions. Whether the decisions were conscious, or only obvious in hindsight I can't say.
"Apple has already released mitigations in iOS 11.2, macOS 10.13.2, and tvOS 11.2 to help defend against Meltdown. To help defend against Spectre, Apple has released mitigations in iOS 11.2.2, the macOS High Sierra 10.13.2 Supplemental Update, and Safari 11.0.2 for macOS Sierra and OS X El Capitan. Apple Watch is not affected by either Meltdown or Spectre." https://support.apple.com/en-us/HT208394
Meltdown is a variant of Spectre. This isn't Intel's own classification; it is how Google Project Zero, and heck, even Intel's competitor AMD, classify it.
It's also not the scariest variant, it's easily fixed (performance degradation aside), doesn't require a microcode update to be fixed hence is 100% software mitigated, doesn't allow you to cross between guest and host memory address spaces and isn't remotely exploitable.
On the other hand variant 1 and 2 are much scarier because they are the complete opposite of Meltdown.
Potentially minimal is probably more accurate. It's workload dependent. In some cases, such as frequent interrupts or system calls on older CPUs without the PCID and INVPCID features to mitigate the cost, it can be very expensive.
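If you want to know whether a given Linux box has those features, a minimal sketch follows. It assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line listing `pcid` and `invpcid`; the helper names `cpu_flags` and `pcid_support` and the sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pcid_support(cpuinfo_text: str) -> dict:
    """Report whether the pcid and invpcid flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}


# Hypothetical excerpt; on a real machine pass open("/proc/cpuinfo").read().
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme pcid sse4_2\n"
print(pcid_support(SAMPLE))
```

A result with `pcid` present but `invpcid` missing is typical of the older parts mentioned above; whether the kernel actually uses PCID also depends on the kernel version.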
I don't mean they're literally being sneaky. The point was, from an OS or userland perspective, it should be invisible. Besides performance, it should have no effect because it is literally breaking the CPU model by executing code it shouldn't. It fixes it by not retiring the results, but the bug is in leaving an effect that can be found.
If you had said CPU designers were being sneaky it would be more obvious that you weren't being literal. By saying "Intel silently, sneakily...", it's more personal and seems as if you are being literal. It wasn't really silent either, it was well enough documented that they did speculative execution. Many many very technical and educated people from across the industry knew about this and didn't think it was an issue. They were wrong.
Let's not throw the baby out with the bathwater here. I don't think the problem is that speculative execution is not as invisible as it was once believed. The problem is more of awareness and documentation. If there was an option to disable speculative execution and awareness of the associated security issues from the beginning, I don't think anyone would have a problem with using it for a performance boost where it was safe to do so. The problem is there was an industry wide assumption that it wasn't a problem that turned out to be wrong.
They promise modern process isolation and fail to deliver it. Their fixes reduce performance significantly. IANAL, but that sounds like a defective product.
> They promise modern process isolation and fail to deliver it.
Before one makes such a statement, one has to define "modern process isolation" in a very formal way, so that nobody (neither Intel nor the customer) can redefine the meaning as they desire. I am not aware that Intel gave such a formal definition that they claim to obey (but perhaps fail to). So any operating system can only rely on very weak guarantees for the processor to provide "isolation" (using quotes since I have not defined the term "isolation" formally). Thus the OS has to implement by itself whatever stronger isolation primitives it desires (using the weak primitives that the processor provides).
I've only owned Intel processors my entire life but this crosses a line. I plan to buy my first AMD motherboard/cpu and not look back. I really hope that one day Intel realizes that it's not enough to distract us with new shiny toys. Nearly all of us want solid trustworthy hardware first and foremost.
if you're going to make the accusation that they're lying, at least provide a source - near as I can tell they are being as transparent as is reasonable.
If he is saying that Intel is giving other advice privately then you are welcome not to believe him (and note that it is you who is using the much stronger term "lying" here).
Personally I think un-sourced statements from him are worth listening to.
My general attitude is to presume good faith - both on Intel's part, and on the part of commenters online. I had no idea who he was till you raised the point with me - had he identified some source for his assertions (even first-hand observation on a large number of systems), I probably wouldn't have said anything.
I think it's a bit ironic that the text, in a sense, blames Google for these problems by calling them the "Google Project Zero Exploits" as if Google was some sort of cyber crime syndicate using their evil powers to exploit Intel.
Yeah I thought that was interesting too. I think they are most likely name dropping Google just because they want to reinforce whatever type of association with Google that they can get. Lots of people won't register this as a bad thing per se, and will just think "Wow it's super cool that Intel is working with The Googles on something".
As developers, we should know this phenomenon well by now, as it's dictated an ever-increasing portion of our toolchain. "Oh, you say Google uses this thing?! I use it too then! Google and me are best buds!". (This applies equally to Facebook, and to a lesser extent, Amazon. Compare one of my son's favorite YouTube videos at [0]).
Alternatively, they may want customers to think "Oh boy you have to be a super genius guy like the Googles to beat up Intel so this isn't a big deal", or "How could Google do this to a nice company like Intel".
So many possibilities, but really all of them turn out well for Intel.
Intel's spin throughout this has been so scummy that it's hard if not impossible for me to not go with Ryzen from now on. Especially as my Intel CPU keeps rebooting with no fix in sight.
Uhm, what a mess. This, just when Linux vendors began pushing updated intel-microcode packages (Ubuntu just released intel-microcode 3.20180108.0). Should we put the update on hold until this issue is hopefully resolved, or should we still update as suggested in the last paragraph of this Intel press release, somehow believing that the random reboots don't apply to "end users"?
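Whichever way you decide, it helps to know which microcode revision is actually loaded before and after the package update. A minimal sketch, assuming an x86 Linux system whose `/proc/cpuinfo` reports a hexadecimal `microcode` field per core; the function name and sample text are made up for illustration, and nothing here says which revisions are good or bad.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct microcode revisions reported across all cores."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))   # field looks like "0x22"
    return revisions


# Hypothetical two-core excerpt; on a real machine pass
# open("/proc/cpuinfo").read() instead.
SAMPLE = (
    "processor\t: 0\nmicrocode\t: 0x22\n\n"
    "processor\t: 1\nmicrocode\t: 0x22\n"
)
print([hex(rev) for rev in sorted(microcode_revisions(SAMPLE))])
```

Comparing the set before and after a reboot shows whether the update took effect; more than one value in the set would mean the cores disagree.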
Withdrawn CPU Microcode Updates: Intel provides to Lenovo the CPU microcode updates required to address Variant 2, which Lenovo then incorporates into BIOS/UEFI firmware. Intel recently notified Lenovo of quality issues in two of these microcode updates, and concerns about one more. These are marked in the product tables with “Earlier update X withdrawn by Intel” and a footnote reference to one of the following:
1 – (Kaby Lake U/Y, U23e, H/S/X) Symptom: Intermittent system hang during system sleep (S3) cycling. If you have already applied the firmware update and experience hangs during sleep/wake, please flash back to the previous BIOS/UEFI level, or disable sleep (S3) mode on your system; and then apply the improved update when it becomes available. If you have not already applied the update, please wait until the improved firmware level is available.
2 – (Broadwell E) Symptom: Intermittent blue screen during system restart. If you have already applied the update, Intel suggests continuing to use the firmware level until an improved one is available. If you have not applied the update, please wait until the improved firmware level is available.
3 – (Broadwell E, H, U/Y; Haswell standard, Core Extreme, ULT) Symptom: Intel has received reports of unexpected page faults, which they are currently investigating. Out of an abundance of caution, Intel requested Lenovo to stop distributing this firmware.
It is a mess. I suggest that you and anyone else asking these questions pay attention to Microsoft a little bit, to receive all of the information that there is to be had on this.
Microsoft has been telling people about problems with the mitigations up-front. There are, for starters, Microsoft KnowledgeBase articles detailing problems with older AMD CPUs and with anti-virus software that behaves like a rootkit, resulting in systems that will not boot, and web log articles discussing the performance considerations for server systems.
I couldn't accept this as an excuse when you have possibly the worst CPU bug in x86 history, or perhaps all CPU history, with ample manpower and resources, along with a 6-month time frame.
I highly doubt this, given how much hardware they appear to have lying around to throw at the Linux 0day Test Bot (which does full kernel compiles, boots, and integration tests for dozens of hardware configurations for every patch sent to most LKML lists).
I think what's all the more jarring is the smiling face of the spokesperson next to this announcement. At least, couldn't the announcement run without a photo, or with a serious-looking photo of the spokesperson?
The original statement, as phrased by their engineers, probably was something like “Our latest firmware regularly crashes your system, triggering reboots” (plus a few paragraphs with a highly detailed description of why that happened that only the engineers who wrote the firmware would understand)
This is what they ended up with after a few reviews with legal (“we can’t say ‘our’; they’ll eat us in court”) and marketing (“We need a less emotionally loaded way to say ‘crash’”)
Legal aimed to maintain just enough meaning in the statement to be able to say “we warned customers as soon as we could”; marketing aimed to make it a positive message. I guess that’s why ‘higher’ won over ‘more’.
Now, in retrospect, the Samsung battery issue affected only a small portion of users, whereas this will affect every single user in the form of decreased performance.
It raises awareness about issues. Change takes time. Look at the fight of blacks and women for equality in society. We must not stop pointing out the issues, and always demand change.
probably related news from Dell: "NOTE 1: 13G, select 12G, and select DSS server BIOS files have been pulled from http://dell.com/support. This note and article will be updated as soon as more information is available" [1]
The pulled BIOS update files for 13th gen were released on the 5th of Jan.
To be fair, it may well be that some sloppy OEM drivers make too many assumptions about reserved bits in registers (which the new microcode may be legitimately changing) or about undocumented timing side-effects related to some instructions (which the new microcode may affect, that being the root problem in the first place!).
These symptoms are also the classic ones you get when you install an OS on a new-generation, well-functioning CPU.
My comment is no more speculative than those blaming the new microcode for causing reboots. The real bad quality, bad update process and unjustified binary nature of OEM firmware is too often overlooked.
My broadwell (i5-2500k) Windows desktop has started blue screening like crazy if I do large continuous network transfers (i.e. saturating gigabit ethernet). I didn't have this problem before I rebooted for my most recent update.
I thought it might be memory, but an 8+ hour memory scan (the Windows internal one, not the normal Linux one) didn't tickle any bad bits, and it's not erroring in any one component; each time, it seems to be a different one. (The first time I caught the blue screen, it was the network driver, which made sense, so I upgraded it, just in case; but then it started being NTFS and other things.) Wondering if it's just limited to those arches, or others.
you are correct, I actually looked it up, and meant sandy bridge (hence why I wrote "limited to those arches, or others"), but brain farted while writing.
Well this promises to be fun, especially for cloud providers (and those running instances on the cloud, who now potentially get to suffer through host instability)
Intel has made such a complicated product line it's really difficult to figure out what is and isn't affected. As per the Lenovo update mentioned elsewhere in this thread:
"*3 – (Broadwell E, H, U/Y; Haswell standard, Core Extreme, ULT) Symptom: Intel has received reports of unexpected page faults, which they are currently investigating. Out of an abundance of caution, Intel requested Lenovo to stop distributing this firmware."
So far as I can figure, Xeons are covered by "Haswell standard". Core Extreme was those ridiculously overpriced i7s. ULT is the "Ultra Low TDP" chips.
It looks like from the desktop and mobile processor fields, if there is anything special about the core they put a suffix on denoting it, so Xeon may well classify as "Haswell standard"?
“We rushed a patch out and it’s causing problems we don’t understand yet. We’ll rush out an updated patch as soon as we can, so please don’t hesitate to install that.”
"It could be true hardware and software orchestration requires an understanding of how components will work in concert, we apologize we did not rehearse in advance. "
> To avoid the potential for confusion between ironic quotes and direct quotations, some style guides specify single quotation marks for [irony], and double quotation marks for verbatim speech.
I think it was pretty clear from context that this was lighthearted and not a real quotation. Hell, I didn’t even include a source for where the quote allegedly came from.
I just had a hard crash/reboot on a Dell running Windows 10 on an AMD processor, followed by a couple of auto updates. There was no indication of updates being available before the crash.
As I have just written elsewhere on this very page, you need to pay attention to the several KnowledgeBase and web log articles that Microsoft has been publishing on this subject as things develop.
[1] https://downloadcenter.intel.com/download/27431/Linux-Proces...