How Screwed is Intel without Hyper-Threading? (techspot.com)
232 points by rbanffy on May 25, 2019 | 218 comments



The very first graph (Cinebench R15) is wrong. i7-8700K does not score 3314 or even 2515 on this benchmark [1]. i7-7700K numbers are also way off. Not sure what happened here, but I can't take any other measurement they published seriously. This is more useful: https://www.phoronix.com/scan.php?page=article&item=sandy-fx...

If you're using an overclockable workstation CPU, the good news is that turning SMT off gives you additional ~200Hz of overclocking headroom. Modern Intel chips are thermally limited when overclocking and turning SMT off makes them run a bit cooler. My i9-7900X overheats over 4.4Ghz, but with SMT off it's comfortable at 4.7Ghz on air (Noctua D15).

[1] https://www.guru3d.com/news-story/intel-core-i7-8700k-benchm...


> I can't take any other measurement they published seriously.

Here's another problem with the article:

> Microsoft is pushing out OS-level updates [...] However, this doesn’t mitigate the problem entirely, for that we need motherboard BIOS updates and reportedly Intel has released the new microcode to motherboard partners. However as of writing no new BIOS revisions have been released to the public.

That's wrong - microcode updates can be successfully applied by either the OS or by the early-boot firmware; you do not need a BIOS update (on Debian-based systems, just apt install intel-microcode).
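
For example, on a Debian or Ubuntu box (assuming the non-free/restricted component is enabled) something like this is all it takes, no BIOS flash involved:

  # install the packaged microcode; it gets loaded early from the initramfs on every boot
  sudo apt update && sudo apt install intel-microcode
  # after a reboot, check which revision is actually running
  grep -m1 microcode /proc/cpuinfo
  dmesg | grep -i microcode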


They probably ran the benchmarks on Windows. Is it possible to customize the microcode version on Windows without a patch from Microsoft?


No. I'd have to look it up to be certain, but I believe that MS did not want to patch CPU microcode during boot and tried to leave that to the firmware. But - I may be wrong here - this stance changed with Spectre and Meltdown, and Windows does indeed patch microcode during boot on affected CPUs. However, I don't think there is a way to apply a user-specified patch. The update has to come from MS.


Yes, VMware has a free (as in beer) Windows tool to update the microcode (downgrading is not possible).


> downgrading is not possible

Sure it is. I've done it myself.

Microcode is stored in volatile memory on the CPU. Updates are applied on boot, every boot. "Downgrading" is as simple as not applying updates, or applying an older update.


Perhaps they meant that downgrading is not possible without a reboot.


Loading microcode on a CPU (without patching the firmware) takes effect immediately. The update is lost immediately upon reboot and must be reapplied each time.
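
On Linux the late-load path is just a sysfs poke (a sketch; early loading from the initramfs is the recommended route, and whether the CPU accepts an older revision over a newer one is up to the driver):

  # drop the desired blobs under /lib/firmware/intel-ucode/, then ask the kernel to reload
  echo 1 | sudo tee /sys/devices/system/cpu/microcode/reload
  # confirm the active revision
  grep -m1 microcode /proc/cpuinfo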


Right, but can you load an older microcode after a newer microcode has already been loaded, without rebooting?


> the good news is that turning SMT off gives you additional ~200Hz of overclocking headroom

That'd be 200 MHz, not 200 Hz?


Not on my MOS 6502!


The concept of a hyperthreaded 6502 is intriguing, and I wish to subscribe to your newsletter...


The 6502 runs at around 1MHz so even on that 200Hz would be irrelevant.


There are some 6502-compatible cores that run at around 200MHz these days.


Good catch, yes. :)


There is an R20 8700k score listed here which is close to the one quoted in the Techspot article (3372 vs 3314):

https://www.geeks3d.com/20190306/tested-cinebench-r20-cpu-be...

So the Techspot numbers are fine, just a typo in the benchmark version: the graph should say R20, not R15.


Just note that the perf loss is 20-40% in their benchmarks, so in your case, you'd need to bump that 4.4GHz to 5.5-7.3GHz, which is a tall order.

In any case, the article is worst-case FUD-like nonsense. No idea what the eventual perf loss will be, but it's a safe bet it'll be less than that.

Consider that it's likely safe enough for 99% of usages for the OS to allocate CPU cores to processes in pairs, and that would incur only a small penalty. And something smarter is almost certainly possible, too.


The text around it says Cinebench R20 too. Guessing both were tested, and the wrong data set was pasted into the article.

  "First off we have Cinebench R20 results ..."


Bingo. Hyperthreading does not pay for itself when overclocking. Period. End of discussion. I turn it off, or buy chips without it.


Overclocking is not an option on 99% of laptops.


Another micro-architecture attack. Since the advent of Spectre and Meltdown, I really wonder about the practicality of exploiting these vulnerabilities. As an end-user, if I have malware running on my machine trying to trigger these exploits, then in many ways I have already lost. The malware program has access to all my personal data anyway.

Personally I wonder whether the cost of mitigation is worth it. According to the article (and their simplified HT methodology) certain workloads experience a 25% performance hit.

The only cases I currently consider as exploitable are VMs in the cloud (perhaps a reason to own dedicated servers after all this time) and running JS in the browser (perhaps a reason to disable JS).

There will always be side-channel attacks. Our devices are physical devices and will leak side-effects, like electromagnetic radiation ( https://en.wikipedia.org/wiki/Tempest_(codename) ). This recent spate of MDS flaws doesn't necessarily fit in my threat model.
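
For what it's worth, recent Linux kernels at least tell you which of these issues they think you're exposed to and what they're doing about it, which helps when weighing the cost of mitigation:

  # one line per known issue: "Not affected", "Vulnerable", or "Mitigation: ..."
  grep . /sys/devices/system/cpu/vulnerabilities/*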


I feel like there is an impedance mismatch between what CPU designers think memory protection guarantees and what developers think memory protection offers. For production code, I never got much higher level than “C with classes” and if you asked me 15 years ago if leaking bits of information through the branch predictor or memory buffer was a failure to conform to the x86 memory protection model I would’ve said no. Memory protection is mainly there to keep one app from crashing another. If you’ve got untrusted machine code running on your system, you’ve already been compromised. I feel like CPU designers are still in that mindset. They still measure their success by how well they do on SPEC.

Maybe instead of slowing everything down for the sake of JavaScript, we need a special privilege mode for untrusted code that disables speculative execution, sharing a core with another hyper-thread, etc.


> If you’ve got untrusted machine code running on your system, you’ve already been compromised.

You are exactly correct. However, the advent of "The Cloud" changed that.

"The Cloud" by definition runs untrusted machine code right next to yours. IBM and other players have been screaming about Intel being insecure for decades--and it fell on deaf ears.

Well, suck it up buttercup, all your data are belong to us.

While I hear lots of whingeing, I don't see IBM's volumes rising as people move off of Intel chips onto something with real security. When it comes down to actually paying for security, people still blow it off.

And so it goes.


Why should people move to IBM? Remember, their POWER processors were also vulnerable to both Meltdown and L1TF - IBM systems are probably the only non-Intel servers that were. (Note that I really do mean non-Intel here, since AMD wasn't affected.) Their z/OS mainframes were probably vulnerable too, but they don't release public security information about them. The only reason no researchers had discovered this is that IBM hardware is too expensive to test.


Red Hat released info about z/OS mainframes; they're also vulnerable to Meltdown as well as Spectre. ARM also has one new design that's vulnerable to Meltdown, and everyone has Spectre-vulnerable designs.


I kind of blame Google for creating the whole trend of hosting all this mission critical stuff on x86 PCs: https://www.pcworld.com/article/112891/article.html. (Altavista ran on DEC iron. Google pioneered running these services on disposable commodity hardware.) That being said, POWER got hit with some of this stuff too.


Speaking about Google, how vulnerable are they, and how many CPUs will they need to replace?

Did someone already demonstrate that speculative-execution bugs are observable in Google's cloud stack?


[flagged]


I think you're being a little harsh.

These things end up having unintended consequences. It isn't about 'fuck Google', it's about identifying the root cause of a problem. x86 PCs come from a long line of non-multi-user, non-multi-tasking computers, whereas DEC mainframes are perhaps the more natural choice for what Google wanted to do.


So, let me get this straight.

You're saying Google should have foreseen Spectre et al 20 years ago and therefore should have used DEC mainframes as its infrastructure?

And further, it's all Google's fault that the cloud uses x86 infrastructure?

Wat?


No, I said it's not about 'fuck Google', it's about identifying the root cause of a problem.

I didn't foresee this, and I don't recall anyone else predicting it, so no, I don't think Google should have foreseen it either. But nevertheless it has happened, so we should endeavour to understand why, so it doesn't happen again. It isn't about blame, it isn't about pointing fingers.

Now that we've identified an issue, the next time an industry moves from big iron to commodity x86 PCs we can ask the question: is this going to be a problem?


I think he or she is drawing an analogy between Intel and Google both "cutting corners" to save costs, which worked well for them in the short term but had unforeseen consequences for everyone else over the longer term. This could be an instance of the famous "tragedy of the commons".


If DEC had won, would we have the same issue?

"In some cases there’s non-privileged access to suitable timers. For example, in the essentially obsolete Digital/Compaq Alpha processors the architecture includes a processor cycle counter that is readable with a non-privileged instruction. That makes attacks based on timing relatively easy to implement if you can execute appropriate code."

I still call bullshit on his entire hypothesis.

https://hackaday.com/2018/01/10/spectre-and-meltdown-attacke...


Hopefully you'll agree that there is a world of difference between an electromagnetic side-channel and something that can be achieved by simply running some JS.

In particular, disabling JS would be pretty disabling for an average modern web user, so an easy, practical attack through this vector is especially relevant.


it would rule if we moved away from javascript. it’s turned the web, the world’s best publishing platform, into the world’s crappiest app platform. good riddance.


the internet wouldn’t be as popular as it is today if it were not for it running apps. until some better language comes out supported by all browsers we won’t be moving away from JS. this “remove JS” horse is long dead, can we stop beating it now?


Yet every time I ask my web developer friends to ensure that functionality works online without JavaScript for basic things I am met with hostility.

If we were able to run without JavaScript and have basic use of websites again then these attack vectors wouldn’t be so scary.

Maybe it’s a personal failing of mine; but I don’t understand why people don’t consider JavaScript untrusted code when other vectors of running software on my machine come from a gpg-signed chain of custody.


Well, you would have to architect an app from the ground up to be functional with/without javascript and test any new functionality with it off. You’re talking 2x the work to appease maybe .0005% of the web browsing public. I wouldn’t be hostile if you asked me to do that... but I wouldn’t entertain the idea seriously either.


If you build using server side rendered HTML then progressive enhancement with JavaScript is not actually that difficult. It takes a different mindset that most webdevs don't have. Getting the UX nice for the fallback is hard.


Yes _if_, but many websites have moved on to client side rendering because if done right it delivers a better user experience for the 99% of users that have JS turned on, because there is no latency between page transitions.

Sure, passive content such as nytimes.com can work without JS (although some of their interactive articles would not), but anything more complicated will often be done with client side rendering these days.


> no latency between page transitions

Not true, latency scales with CPU capacity on the client. SPAs now exist with client side rendering to mask the bloat in the site plus its dependencies.

If you have a SPA arch but you did a page load per click, you would die. But all sites creep up to 500ms page load times regardless of their starting stack and timings.


It's still almost 2x work for every feature, because you need to implement it, test it for both modes and keep maintaining it for both modes. Usually people do that for Google Crawler. But recently it learned to understand JavaScript, so even that argument is moot nowadays. Your best hope is to wait until browsers decide that JavaScript is harmful and won't enable it by default without EV certificate (like they did with Java applets back in the days). I don't see that happening, but who knows.


It wouldn't be 2x the work if this was standard design, since common problems/hurdles and their solutions would then have already been mapped out. It pays off to use the least powerful language needed to accomplish a goal.


You're right, it's not 2x the work, it is much more.

Speaking from about a decade of experience with progressive enhancement and all the other things: it is 'much' more. There is an expectation of equivalent functionality/experience in these cases, and you just can't spread your resources that thin to get half a product out that works without Javascript. You're literally developing a completely independent application that has to sit on top of another application. Everything comes out 'worse'.

These days we invest in ARIA and proper accessibility integration, and if you run without Javascript that is going to get you next to nothing in a 'rich' web application.


A possible alternative is to instead disable JIT for JS. The only reason the speculative execution attacks so far have worked via JS is that the JIT produces tight, optimised native code with specific instructions close to each other.

Interpreting the code instead should completely kill any chance of exploits of this class. It will also completely kill the performance, though. Even Tor dropped that idea https://trac.torproject.org/projects/tor/ticket/21011


When I was running Gentoo Hardened in the old days, there used to be a "jit" USE flag for Firefox, which could be disabled before building it. I was running PaX/grsecurity and I was suspicious of JIT in general as it requires runtime code generation and breaks the excellent memory protection from PaX, so I kept it disabled. The performance was terrible, but tolerable for most websites. The worst part was in-browser JavaScript cryptography: without JIT there's zero optimization, and a web login takes minutes.

But later on, Firefox streamlined its architecture and made it more difficult to disable JIT. After one update, disabling JIT caused a broken build that crashed instantly. I've spent a lot of time reading bug trackers and looking for patches to unbreak it. The internal refactoring continued; after Firefox Quantum, JIT seems to be an integral part of Firefox, and the Gentoo maintainers eventually dropped the option to disable JIT as well.

I wonder if an ACL-based approach could be used, as an extension to NoScript. The most trusted websites get JavaScript with JIT, less trusted websites get JavaScript w/o JIT, and the least trusted websites get no JavaScript at all. Tor Browser's security slider used to disable JIT at the "Medium" position. But rethinking it, this approach has little security benefit: it's easy to launch a watering hole attack (https://en.wikipedia.org/wiki/Watering_hole_attack) and inject malicious JavaScript through 3rd-party elements.

I wonder if an approach based on extending seccomp() and prctl() could be a solution. SMT can be enabled, but no process runs on an SMT thread by default. Non-confidential applications such as scientific computing or video games can tell the kernel to put their processes on SMT threads.
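
You can crudely approximate that today with plain CPU affinity, no new syscalls needed (a sketch; the CPU numbers are specific to whatever lscpu reports on your box, and ./render_job is just a stand-in for a trusted workload):

  # logical CPUs with the same CORE value are hyper-thread siblings
  lscpu --extended=CPU,CORE,ONLINE
  # pin a trusted, non-confidential job onto the sibling threads (say logical CPUs 4-7 here)
  taskset -c 4-7 ./render_job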


> I wonder if an approach based on extending seccomp() and prctl() could be a solution. SMT can be enabled, but no process runs on an SMT thread by default. Non-confidential applications such as scientific computing or video games can tell the kernel to put their processes on SMT threads.

A valid option, though in general I'd rather allow it for everything except browsers.


Because all code is untrusted, just because you know who it probably came from doesn't mean there aren't bugs, backdoors or exploits (code review can catch only so much). That goes triple for the average user who doesn't even understand the first thing about security.

So instead of trying to achieve the impossible (perfect safe code that still has unlimited access) the direction is stricter sandboxes. Then at least you only need one near perfect piece of safe code (the sandbox) instead of tens of thousands.


What you're saying sounds nice, but it seems to come from a world before Spectre, Meltdown, and all the new discoveries since. These have basically shown that it is impossible to build this sandbox on the modern processors that everybody uses from desktops to cloud data centers.

Instead, the only way of having a performant, secure system today is to disable hardware mitigations and ensure you only run trusted software, the opposite of your proposal. The sandbox still helps for other issues (e.g. buffer overflows).


> If we were able to run without JavaScript and have basic use of websites again...

I think that ship has sailed..

We should work to build secure sandboxes instead.


Sandboxes don't stop side channel attacks


Yes, it seems we have a shared memory problem. Shared Cloud servers and client-side JavaScript seem under duress.


You are met with hostility because what you are saying is absurd. I don't think any reasonable person wants to bring the web back to what it was in 1999. Lots of things about the web have gotten worse, but same could be said for automobiles. The collective usefulness JS brings to the browser is far more valuable than the problems it creates.

Also, to keep this all in perspective, we are talking about one company that made egregious engineering decisions to maintain their market leadership which has put their customers at risk. To say we should do away with JS in the browser because Intel made some really poor decisions to undermine their competition is just crazy. As far as I'm concerned, this is typical Intel and they are getting what they deserve yet again. I just feel bad for their customers that have to deal with this.


> Also, to keep this all in perspective, we are talking about seven companies that made egregious engineering decisions to maintain their market leadership which has put their customers at risk.

Fixed it for you, the current list I have is AMD, ARM, Freescale/NXP, IBM both POWER and mainframe/Z, Intel, MIPS, Oracle SPARC, and maybe old Fujitsu/HAL SPARC designs for Spectre, with at least four of those CPU lines also subject to Meltdown.


You know you are misrepresenting the issue here. The current batch of vulnerabilities does not affect AMD except for the fact that the OS patches affect all CPUs, vulnerable or not. Needing to disable hyperthreading on Intel CPUs is the catastrophic situation I’m referring to... up to 40% loss of performance in thread-intensive tasks.


The current set of vulnerabilities targets Intel-specific microarchitectural features; the first version of Foreshadow/L1TF, for example, targeted the SGX enclave, which by definition isn't in other companies' CPUs.

Given that AMD is fully vulnerable to Spectre, there's absolutely no reason to believe it isn't similarly vulnerable to microarchitectural detail leakage if people were to look. And going back to what I was replying to:

> we are talking about one company that made egregious engineering decisions to maintain their market leadership which has put their customers at risk

We demonstrably aren't, seeing as how ARM, and IBM, both POWER and mainframe/Z CPUs are also vulnerable to Meltdown. That and the significant prevalence of Spectre says this is not "typical Intel" but "typical industry", a blind spot confirmed in 7 different companies, and 8 teams to the extent the IBM lines are done by different people.

The "Intel is uniquely evil" trope simply doesn't hold water.


Fair enough, I don't know enough about the performance hit mainframes, ARM and IBM CPUs are taking to say if it's similar to what Intel is experiencing or not.

That said, in the consumer space, being (this) vulnerable to Javascript attacks is catastrophic. My original point is that we should not be crippling something very useful (javascript in the browser) because of flawed architectures that mostly affect one company in a way that decimates performance.


> My original point is that we should not be crippling something very useful (javascript in the browser) because of flawed architectures that mostly affect one company in a way that decimates performance.

Lots of us have a different opinion on the usefulness vs. risk of running random and often deliberately hostile JavaScript in your browser; see the popularity of NoScript, and how many of us use uMatrix with JavaScript turned off by default. Most of the time, if I follow a link and don't see anything, I just delete the tab; most of those pages aren't likely worth it.

"Mostly affect one company" is something completely unproven, since AMD is not getting subjected to the same degree of scrutiny, AMD has a minuscule and for some reason declining in 19Q1 market share for servers, while desktop and laptops are modest but showing healthy market share growth: https://www.extremetech.com/computing/291032-amd-gains-marke...

Meanwhile, ARM is announcing architecture-specific flaws beyond basic Spectre: Meltdown (CVE-2017-5754) and Rogue System Register Read (CVE-2018-3640, in the Spectre-NG batch but by definition system-specific): https://developer.arm.com/support/arm-security-updates/specu...


> popularity of NoScript

This may be popular among neckbeards, but regular people couldn't care less. Regular people care about being tracked and that's about it.

> "Mostly affect one company" is something completely unproven

You speak like a person of authority on this subject, but AMD has come out and said they are specifically immune to these threats: https://www.amd.com/en/corporate/product-security

AMD has "hardware protection checks in our architecture" which disputes your assertion that AMD just isn't a targeted platform. The reality is any computing platform can be vulnerable to undiscovered vulnerabilities so making that point is kind of pointless.

Also, disabling JS on the browser pretty much completely eliminates e-commerce on the web. Again, I can't fathom the masses seeing any benefit in this.


it is considered untrusted code, and that’s why the browser’s VM is so locked down that one can’t even access the FS. visiting a website is akin to installing an application: you either trust it and visit the site, or you don’t and don’t visit.


Maybe it wouldn't be so scary for you, but if everybody else's computer is compromised, did you really win much?


Is there a PoC website where I can get hacked via JavaScript exploiting those side-channel vulnerabilities? To me it sounds too theoretical.



According to that article, browsers already implemented mitigations. It's not clear whether BIOS and OS mitigations are necessary for that.


The fact that disabling arbitrary machine code execution would be “disabling” for the modern web is proof of how totally screwed up the whole thing has become.


And the fact that this observation doesn't have tech people marching in the streets.


No kidding, JS off by default and not running code you don't trust will always be a good idea. I also agree that I'm not really concerned about exploits that require running code on my machine; if the latter happens, I have far more serious things to worry about.

The exploits that do worry me are ones that can be done remotely without any action from the user. Heartbleed is a good recent example. Fortunately those tend to be rare.

Security is always relative to a threat model, and not everyone wants perfect (if that can even be possible) security either, contrary to what a lot of the "security vultures" tend to believe.


Supply chain attacks become a lot easier to pivot on. Put some Spectre code in a npm or RPM package, then wait to see what pops up. So much stuff is sudo curl pipe bash these days that the Spectre threat is real.

All the more reasons to run your own servers and compile your own binaries. Like we used to do.


Why do you need Spectre for that? Just install a backdoor. It's not like anyone runs npm as a restricted user.


I thought it was standard practice for framework package managers to run as non privileged users and install binaries in local dirs.


Never saw that. You're typing npm install and npm runs under your current user (probably not root, but who cares about root when the valuable data belongs to you, not root) and runs any package install scripts it just downloaded from the npm website. There's no separate user to run npm, at least in default installs.


You mean “compile your own binaries” using code which you downloaded without auditing, just like NPM? That’s actually what we used to do; blaming NPM both reflects a misunderstanding of the problem and blames the update system which means you can fix a problem orders of magnitude faster — the half-life of random tarballs people used was much longer.


Who said without auditing? There are a plethora of signing and hashing mechanisms one can use to verify a package's authenticity.

Compiling once from a tarball and reusing that can definitely reduce the number of times you would need to trust something from a third party.


You are aware that NPM already does that, right? It’s even safer because the network requires immutability so there’s no way to trojan a package without shipping a new version for everyone to see.

The real problem is why I mentioned auditing: the attacks we’ve seen over the years have been updates from maintainers who followed the normal process. Auditing is the most reliable way to catch things like that because the problem isn’t the distribution mechanism but the question of being able to decide whether you can trust the author.


> Put some Spectre code in a npm or RPM package, then wait to see what pops up.

This is fucking scary. If it's a package used by Wordpress you could end up with 30% of the web open to an attack.


According to Intel's doc [1], if SMT is not disabled, not only does the OS need to perform group scheduling (only allow threads that trust each other to run on the same core at the same time), but the OS also has to interrupt the sibling thread running on the same core whenever user mode code enters kernel mode, effectively synchronizing syscalls across siblings. For compute-heavy applications this might be ok, but for servers doing a lot of syscalls this might be terrible.

1 - See "Synchronized ring 0 entry and exit using IPIs" in https://software.intel.com/security-software-guidance/insigh...


I haven’t seen a Linux implementation of this yet, but I suspect the performance is beyond terrible. Intel’s doc says:

> There is no hardware support for atomic transition of both threads between kernel and user states, so the OS should use standard software techniques to synchronize the threads.

Ha, “standard software techniques”. Let’s see: suppose one thread finishes a syscall. Now it sets some flag telling the other core to wake up and go back to user code as well. But the other core is running with interrupts on. So now we sit there spinning until the other core is done, and then both cores turn off interrupts and recheck that they both still want to go to user mode. This involves some fun lock-free programming. I suppose one could call this a “standard software technique” with a somewhat straight face.

Meanwhile, the entry side involves sending an IPI, with the associated (huge!) overhead of an x86 interrupt. x86 interrupts are incredibly slow. The IPI request itself is done via the APIC, and legacy APIC access is also incredibly slow. There’s a fancy thing called X2APIC that makes it reasonably fast, but for whatever reason, most BIOSes don’t seem to enable it.

I suspect that relatively few workloads will benefit from all of this. Just turning off SMT sounds like a better idea.


Much like many late-model Intel CPU features, x2APIC was buggy at launch, and so people disabled it by default.


The article says that disabling hyper-threading is a worst case scenario, but I don't think it is.

Intel themselves said [1] "Intel is not recommending that Intel® HT be disabled, and it’s important to understand that doing so does not alone provide protection against MDS."

[1] https://www.intel.com/content/www/us/en/architecture-and-tec...


Indeed. In order to protect against MDS on Intel chips, you need to disable hyperthreading and apply microcode and OS updates which have their own additional performance penalties on top of that.
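
For reference, on Linux the SMT half of that doesn't even need a trip into the BIOS, assuming a kernel new enough to have the sysfs knob (roughly 4.19+):

  # turn SMT off at runtime ('forceoff' additionally prevents re-enabling it until reboot)
  echo off | sudo tee /sys/devices/system/cpu/smt/control
  # or add the 'nosmt' kernel boot parameter to disable it from the start
  cat /sys/devices/system/cpu/smt/active   # reads 0 once SMT is off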


If my understanding of the announced updates [1] is correct, the need to update exists for non-HT CPUs, too (see the C3000 series and such).

[1] https://www.intel.com/content/dam/www/public/us/en/documents...


Honestly the cure is worse than the disease with these “vulnerabilities.” How do I opt out of the 10-25% performance hit for an extremely speculative vulnerability?


On Linux you can just ignore the intel-ucode package (or whatever it's called) and not update your CPU microcode. BUT, I think they also have mitigations in the kernel, so you would have to not update your kernel, or somehow remove the patches.



The CPU microcode updates are also delivered through motherboard firmware updates, so you'd have to avoid those too. But some of the microcode updates permit more efficient mitigations, so unless you can completely avoid software updates with mitigations, you might actually want some of the microcode updates.


For most of the mitigations there are kernel parameters to disable them if you wish to do so


In Linux you can disable mitigations with kernel parameters.
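
For example (the exact switches depend on the kernel version and are listed in Documentation/admin-guide/kernel-parameters.txt), added to the kernel command line:

  # newer kernels: one umbrella switch
  mitigations=off
  # older kernels: per-issue switches
  nopti nospectre_v2 spec_store_bypass_disable=off l1tf=off mds=off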


“Speculative” does not mean this isn’t a vulnerability. It is describing execution that happens when the CPU guesses which code will execute after a condition. There is a vulnerability with speculative execution, not a speculative vulnerability with execution.


Agreed. I think the nuance here is that in compsci terms "speculative" tends to mean "OK, I can prove that architecturally this is 100% possible; I simply lack either the labor hours needed to write the code or the hardware/money resources to get this in place." Within infosec, these kinds of proofs are taken very seriously, on the principle of "if I figured this out with my shitty home lab, people with serious resources can't be far behind." Additionally, responsible disclosure in this space generally means that the originators of vulns get an undisclosed first shot at fixing problems like this, and they only get published widely in this way when the fix is so serious and will require so much time that companies disclose them to avoid being held legally responsible for fraud at some later date.


To be clear, this isn’t that type of speculative but rather https://en.m.wikipedia.org/wiki/Speculative_execution


Don't apply updates. For desktop computers the risk is practically non-existent. Servers and VPSs are the ones taking the hit.


I think this is bad advice. Yes, there is a performance hit, but chances are any eventual malware (which, let's be honest, if it doesn't already exist is being written as we speak) that exploits these vulns will itself drag down your system's performance anyway.

The "don't apply the updates" argument in security is basically philosophically comparable to the antivax movement


> The "don't apply the updates" argument in security is basically philosophically comparable to the antivax movement

As long as you also apply the quarantine principle when relevant. And herd immunity, both applied backwards:

Quarantine for when you run code that's under your control, like an HPC cluster; the biggest dangers from these vulnerabilities are when you're running completely random code supplied by others, JavaScript in browsers for normal users, and the multi-tenant systems of cloud providers.

Herd size matters because the bigger the herd, the larger the number of potentially vulnerable systems. IBM mainframe chips are vulnerable to Meltdown and Spectre, but there aren't nearly as many of them as x86 systems. Although the payoff of compromising them is likely to be disproportionately high.


I still want the fixes for actual vulnerabilities though.


I can't find any recent news about class action suits against Intel. It seems everything is from early 2018.

Is this situation so unexpected that major companies don't even know how to react? If I were Amazon or Google, I'd be requesting my 30% discount, retroactively. Perhaps they are?


Probably a condition for getting that discount is to promise to not make it public.


On a sidenote, does anybody else find it interesting that all the benchmarked games running on Vulkan have only minimal performance hits?

IIRC Vulkan was designed to be more suited to modern multi-core processors, and I believe that the "HT-agnosticism" it exhibits is good proof that they managed that pretty well...


I think it's more likely that the particular Vulkan games they tested are GPU-bound, so a slower CPU won't have much of an effect on their framerate. I mean Doom even runs at 60 fps on current-gen consoles' low-powered CPUs.


Maybe they are GPU-bound precisely because the CPU-intensive parts are so well done with Vulkan?


I don't think so. DX12 has the same performance characteristics as Vulkan, yet the DX12 games tested show performance degradation when HT is disabled but Vulkan games do not. As I said, I think it's because the particular Vulkan games tested happen to be more GPU-bound than the DX12 ones.


Does this mean that the vulnerability doesn't affect CPUs that don't have Hyperthreading (e.g. the 9700K - 8 cores/8 threads)?


How realistic are these attacks for everyday desktop users?


If we want securely-isolated low-horsepower machines, why don't we just fill server-racks with large numbers of low-horsepower physical computers, rather than small numbers of big-iron computers using software isolation in the form of VMs?

Wouldn't that win half the battle, at least on server-side? It wouldn't apply to workstations, of course. Malicious/untrusted processes running there would still need to be isolated (malicious JavaScript in the browser, say).

Also, am I missing something, or are these concerns really not applicable to gaming applications (which the article emphasises)? That's a highly trusted codebase, so securely containing it isn't worth a 15% performance hit, no?


The cloud is only able to be cost effective because they are able to utilize those fleets of big-iron computers to provide nearly infinite VMs and resources to anyone who signs up for their services. Relegating each customer to their own physical computer would raise costs so high as to essentially eliminate the purpose of the cloud's existence, and most businesses would be better off going back to their own data centers at that point.


> Relegating each customer to their own physical computer would raise costs so high as to essentially eliminate the purpose of the cloud's existence

plenty of places rent bare metal machines, and the price doesn't seem quite that world-ending. see packet.net for instance.

and it only gets better if on the bare metal machine you get to turn off all of the performance-draining fixes.


It's actually much cheaper than cloud. It doesn't have the scaling flexibility, but that is partially compensated for by running faster, lower latency, etc. Lots of people use Cloudfront etc but still host on their own bare metal.


Isn't it still 'cloud'? It's still on a server somewhere that I don't think about, which is really all 'cloud' tends to mean.

I don't care whether Amazon use virtualisation to get me my 'instance'. They're always careful to stick with 'instance', and never to commit to 'VM'. Wise move, now that they're offering physical ('dedicated') instances [0]. Time will tell whether they take an interest in low-horsepower dedicated instances.

[0] https://aws.amazon.com/ec2/pricing/dedicated-instances/


There is a non-smooth transition between a server you own/lease that sits in a specific rack in a known data centre with well-understood routing, and one provisioned as a dedicated instance in AWS/Azure etc. that you manage with an API.


For certain kinds of workloads renting bare metal is way cheaper than VMs. What the small-time bare metal providers can't compete with is all the different services sold along with the VMs. If it was just price per VM, I suspect bare metal renting would be a lot more cost-efficient in a lot of scenarios.


The other area they can't compete is on variable or disaster recovery scenarios, the sort where you'll reserve CPU instances, but only spin them up for your heavy load parts of the day or when you have an unexpected migration. And when you're not using them, AWS puts them on their spot market.

They're probably a bad match for when you have a single batch of work to do, like when the New York Times converted their archives, or every time Netflix has to reencode their video library.


> Relegating each customer to their own physical computer would raise costs

I'm not convinced the cost difference would be a showstopper. More little computers rather than few big computers. Same total horsepower. More motherboards, sure.


More boards, more rack space, more hardware management overhead, more and longer wiring with associated losses from resistance, bigger buildings... it's not really simple.


> More boards, more rack space

Not if they can be made smaller and packed more densely.

> more hardware management overhead

I'm not sure this is a significant issue.

> more and longer wiring

I don't think this follows. They could be clustered. A cluster could be made to look a lot like a conventional big-iron server.

We're talking about replacing a single large machine running, say, 16 VMs (each with an IP), with 16 small machines (each with an IP). Put an Ethernet switch at the last-mile and it's virtually the same, no? Different number of MAC addresses, sure.


More machines means more power to cool them. Electricity bills are a big chunk of the cost of running a datacenter.


I was just discussing this with a friend. Obviously there's a certain degree of multi-tenancy sharing that's most cost effective. For electrical power use and removing heat there's a floor where, if not being used, the best you could do would be to turn a system off, which has its own issues. Go too far and you won't be able to also run a spot market in CPU instances like AWS does.


> For electrical power use and removing heat there's a floor where, if not being used, the best you could do would be to turn a system off, which has its own issues.

What issues? If your cloud architecture can't save power when demand is low, how can that be a positive?

> Go too far and you won't be able to also run a spot market in CPU instances like AWS does.

As I understand it, a spot market isn't ever an architectural goal, it's a way to exploit the practical realities of unused resources/capacity. If that necessity goes away, that'd be a win.

I see at least 3 possibilities:

1. The spot market is about filling temporarily-vacant VM slots on host physical machines

2. Additional physical machines must be kept online so that VMs can be quickly spun-up, and the spot market is about capitalising on this necessity (this is really a variation on Option 1)

3. The spot market is about deriving at least some revenue from the unutilised fraction of an always-on fleet, encouraging customers to run heavy jobs during times of low load on your cloud

I doubt Amazon's fleet is always-on -- I imagine they have fairly significant capacity in reserve to keep up with growth and spikes -- so I doubt it's Option 3. (This is a perfectly obvious question but I really can't find a definitive answer. Maybe my Google-fu is weak today.)


I think Amazon’s hands are mostly tied when it comes to powering off machines. Many VMs are run indefinitely, so whatever machine they’re on has to stay powered on. They may have live migrate (I think they do, but no official word), but they can’t be too aggressive about that or customers will notice.


I stumbled across a 2012 article on this [0] that says they'd 'retire' instances. Wouldn't force the VMs over to another host, they'd just terminate them. They'd generally give forewarning.

I wonder about live migration. I imagine there must be some brief interruption even there.

> Many VMs are run indefinitely, so whatever machine they’re on has to stay powered on

Amazon must be adequately motivated to find a way to power-down unused capacity. My guess is they have some kind of system to maximise this.

[0] https://www.theregister.co.uk/2012/12/20/aws_ec2_servers_ret...


Yeah, I’ve experienced several instance retirements. Most with a fair bit of warning, one a few minutes after the instance stopped responding. I haven’t gotten a retirement warning in several years, so it seems they're doing something to mitigate hardware failures, although my infrastructure is small enough it could just be luck.

Plus Xen has live migrate now, so I assume AWS’s frankenxen would have it as well.

There is indeed a brief interrupt, which is why I said they can’t be too aggressive with it.

Anyways, I’m sure Amazon does what it can to power down machines, I just think they’re stuck leaving much of the data center running because they can’t kick off running VMs.


> More machines means more power to cool them

That doesn't sound right. A large number of weak computers should be easier to cool. A cluster of smartphones can get by with passive cooling, but my laptop cannot.

Apart from cooling, I'm not an expert on this but I don't know that performance-per-watt has to improve as overall power increases. Going with my idea you'll need to power more motherboards, but they'll be less powerful motherboards. (The total number of CPU cores being powered remains the same. Same goes for RAM modules.)


> A cluster of smartphones can get by with passive cooling

Is that true? My phone gets quite hot if I run a taxing load on it for too long. My, perhaps poor, intuition tells me that phones can rely on passive cooling because of the burst nature of phone use. If you’re utilizing your smartphone cluster, I can imagine your fleet overheating.


> If you’re utilizing your smartphone cluster, I can imagine your fleet overheating.

iPhones survive long sessions of 3D gaming even when wrapped in protective plastic shells. If it was an issue, use a bigger passive heatsink than the one they cram into the smartphone form-factor. You could never do that with a server-grade CPU.

With a high-power CPU, the heat emissions are more concentrated than if we use many low-power CPUs.


My iPhone X definitely gets uncomfortably hot after less than 30 minutes of playing PUBG, and I don't even use a case. To illustrate my point, there are "gaming cases" with fans on them, so it's enough of an issue for companies to market solutions at it.


Ok, let's assume the iPhone's cooling system -- i.e. the metal back of the phone -- isn't adequate for long-running heavy loads. I don't think there's a deeper point behind this. Fit a proper passive heatsink and you're all set.

The Raspberry Pi would have been a better example.


What's expensive, and some time ago became a significant limiting factor in densely packing machine rooms, is removing heat from the room. As long as you can supply cool air to the racks they can take care of removing heat from motherboards and other components.

If cooling systems for machine rooms haven't caught up with modern systems that dissipate more heat, the direction you're thinking has merit. But I doubt there are serious limiting factors here, besides cost.


> If cooling systems for machine rooms haven't caught up with modern systems that dissipate more heat, the direction you're thinking has merit. But I doubt there are serious limiting factors here, besides cost.

I think we agree, but I wasn't really thinking of power-savings as the real motivation here, I was thinking of security.

It seems likely we're going to see a steady stream of Meltdown/Spectre-like security concerns. We can have our cake and eat it -- isolation and all the CPU cleverness there is -- if the isolation between instances is done physically rather than with VMs.


If you go with older CPUs you're gonna have higher loads to execute the same work. Which means more heat.


You'd be right, but I didn't say older CPUs. Take modern server CPUs and reduce the number of cores. May as well go all the way down to single-core.

There's clearly a market for weak, cheap, machines -- Amazon didn't originally offer a 'Nano' instance-type, but they do now.


How much do these problems actually affect normal users tho? AFAIK not a single exploit has been created using meltdown or spectre, and why would there be?


For 10-15% FPS I'd take any vulnerability any day of the week. I'd easily accept wiping my machine every night and never browsing the web on the same machine if that's what's required. I can just get another machine for whatever other task I have and it's pocket change compared to the gaming machine.


There might be a way to block the patch, but having a computer strictly for gaming makes you a bit of an outlier.


It would yes, but my argument is this. Anyone who has a modern ("AAA" gaming machine) has probably paid a fair chunk of money for it. For a high end one, maybe $2-3k. Now consider how much of that money went into the last 10-15% performance of the machine. There are significant diminishing returns. It's for example the difference between a mid-range i5/i7 and the highest end i9 on the CPU end, or the difference between a 1660ti and a 2070 on the graphics card side. It's often several hundred dollars that went into the last 10%.

So: even if I built that machine to not be exclusively used for gaming (which would be an outlier), anyone who has a machine of that kind could easily motivate making it exclusive for gaming, because there is so much money in that 10-15% that we can buy a second much cheaper machine (e.g. a low end laptop, chromebook etc) to use for other things.


an outlier, but not a unicorn. Dedicated machines running RetroPie for gaming is very much a thing. Probably most people doing this use a RPi (so not Intel), there's a lot of us buying old OptiPlexes and stuff.


People running RetroPie are unicorns of unicorns in terms of the people using computers. Hell even people gaming are unicorns in terms of overall computer usage.


I disagree. I can think of at least three use cases that frequently employ Intel CPUs, and that involve running a limited set of low-risk software:

* Gaming-only machines, like my RetroPie example.

* Media servers, like for Plex or Kodi.

* NAS servers, powered by software like UNRAID.


But those uses are still smaller than rounding errors in terms of overall usage.


It's big enough to create whole markets for commercial software, as my examples of Plex and UNRAID show.


My only Windows machine is only used for gaming. It doesn't even have a web browser. It's also on its own VLAN, so it can't talk to my other machines.


How did you manage to rip out IE/Edge from Windows ?!?



Can anyone explain attack vectors? How does this happen?


So does this mean that I will get a performance hit with the coming Windows update?

Is there any way to mitigate this performance drop, disable the fix?


Interesting that they measure FPS instead of frametime. FPS doesn't scale linearly, but frametime does.
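
A quick worked example of the difference: frame time is 1000/FPS ms, so the same FPS drop costs very different amounts of time depending on where you start.

  100 fps -> 90 fps:  10.0 ms -> 11.1 ms per frame  (+1.1 ms)
   40 fps -> 30 fps:  25.0 ms -> 33.3 ms per frame  (+8.3 ms)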


Pretty cool business model where putting dangerous vulns in your product can help you capture 90% of the market.

Then when your fuckups are discovered, other people not on your payroll expend their own resources to fix it.

And you don't even have to compensate customers who paid full price but now find 40% of their compute evaporated.

And that massive performance hit needed to fix your fuckups stimulates incremental demand for your product because customers now need more cores to finish their compute.


>Unlike previous speculative execution flaws that partially affected AMD and Arm-based processors, the MDS flaws are exclusive to Intel chips.

This isn't an excuse for Intel and how they (sometimes poorly) handle issues that come up. And it is well known that many of these vulnerabilities are currently specific to Intel and how their subsystems are implemented, or only exist on Intel processors. But side-channel attacks are a nearly universal vulnerability.

SMT, of which Hyper-Threading is an implementation, is used in many chips from many companies: AMD's Bulldozer, IBM's POWER, Sun/Oracle's SPARC, etc.

These exploits are not a monolithic process but many techniques chained together, and this common knowledge base, which has been refined by the researchers, will most certainly be recycled and reused on other processors.

If anything, the "capture 90% of the market." by Intel as you put it only ensures that these problems will usually be discovered on their processor's first.

If as much effort was put towards other CPUs as it is towards Intel, you would see significant performance hits with them as well.


Expanding on your points, only AMD, SPARC and MIPS avoided Meltdown bugs, they were also found in ARM, and IBM POWER and mainframe/Z designs. All of these vendors have out-of-order speculative execution designs that are vulnerable to Spectre.

> If anything, the "capture 90% of the market." by Intel as you put it only ensures that these problems will usually be discovered on their processor's first.

Precisely. I haven't looked at the MDS set of flaws yet, but to take the first version of Foreshadow/L1TF, it targets the Intel SGX enclave, by definition the designs of other companies that don't have SGX won't be vulnerable. But as you note this doesn't mean they don't have their own design or implementation specific bugs.


Intel played fast-and-loose with correctness to eke out performance gains. They cut corners that they shouldn't have cut and now their customers are paying the price. Other companies cut similar corners, but Intel did it on a gratuitous scale.


Hard disagree. Correctness means conformance with the spec. The spec says, for example, you get a page fault if you access privileged memory. That’s what Intel does. The spec doesn’t make any guarantees beyond that.

The software stack is at fault for building their security model on assumptions about what the hardware did that aren’t guaranteed in the spec. The software assumes the existence of this magical isolation that the CPU never promised to provide.


Companies cut corners all the time, but then pay the price later through refunds or lawsuits or reduced sales or higher costs.

But Intel cut corners and offloaded most of the downside onto their customers and partners while pocketing the upside for themselves.


You have zero proof of that, and your explanation also violates Hanlon's razor (never attribute to malice that which is adequately explained by stupidity).


I don't think it was malice, but I think it is perfectly fair to say Intel was playing fast and loose here, and they deserve all the flak they're getting as a consequence.

Vulnerabilities like Meltdown and the latest "MDS" vulnerabilities are absolutely a design decision.

Think about Meltdown, for example. The problem here was that data from the L1$ memory was forwarded to subsequent instructions during speculative execution even when privilege / permission checks failed. Now the way that an L1$ has to be designed, the TLB lookup happens before reading data from L1$ memory (because you need to know the physical address in order to determine whether there was a cache hit and if so, which line of L1$ to read), and the TLB lookup gives you all the permission bits required to do the permission check.

Now you have a choice about what to do when the permission check fails. Either you read the L1$ memory anyway, forward the fact that the permissions check failed separately and rely on later logic to trigger an exception when the instruction would retire.

This is clearly what Intel did.

Or you can treat this situation more like a cache miss, don't read from L1$ and don't execute subsequent instructions. This seems to be what AMD have done and while it may be slightly more costly (why?), it can't be that much more expensive because you have to track the exception status either way.

The point is, at some place in the design a conscious choice was made to allow instructions to continue executing even though a permissions check has failed.

The MDS flaws seem similar, though it's a bit harder to judge what exactly was going on there since it's even deeper in microarchitectural details.


> This is clearly what Intel did.

And ARM, and IBM, both the POWER and mainframe/Z teams.

Although this design decision of Intel's was made in the early 1990s, shipping in their first Pentium Pro in 1995. At the time AMD's competition was a souped-up 486 targeting Intel's superscalar Pentium, although I assume they were also working on their first out-of-order K5, which first shipped in 1996.


Nah, the K6 was not an AMD design, it was NexGen. (I had a Nx686 motherboard back in the day. I was young and wasted way too much money on exotic parts. I also had an Ark Logic video card about the same time, it might've been the Hercules Stingray Pro but my memory is a bit fuzzy after 25 years.)

https://en.wikipedia.org/wiki/AMD_K6

The AMD K6 is a superscalar P5 Pentium-class microprocessor, manufactured by AMD, which superseded the K5. The AMD K6 is based on the Nx686 microprocessor that NexGen was designing when it was acquired by AMD. Despite the name implying a design evolving from the K5, it is in fact a totally different design that was created by the NexGen team, including chief processor architect Greg Favor, and adapted after the AMD purchase.


That's interesting, and relevant in that the NexGen team was joined by DEC Alpha people to do the K7, but the K6 is not AMD's first out-of-order speculative execution x86 design, per Wikipedia the K5 was an internal project that "was based upon an internal highly parallel 29k RISC processor architecture with an x86 decoding front-end."


> And ARM, and IBM, both the POWER and mainframe/Z teams

Is the point of your argument that others did it too, so Intel shouldn't be accountable?

Or that others did it too, so it's reasonable to believe that the vulns were impossible to prevent?

This argument sounds a lot like the "deflect" part of Facebook's strategy to "delay, deny, deflect".


I'm saying Intel's design decisions are found in so many other vendors, Spectre in all of them I've looked for, that it represents an industry blind spot. Which calls for different kinds of "accountability", unless you're a rent seeker.


> Now the way that an L1$ has to be designed, the TLB lookup happens before reading data from L1$ memory (because you need to know the physical address in order to determine whether there was a cache hit and if so, which line of L1$ to read)

With VIPT caches (which AFAIK both AMD and Intel use for the L1 data cache), the TLB lookup and the L1 cache lookup happen in parallel. The cache lookup uses the bits of the memory address corresponding to the offset within the page (which are unchanged by the translation from virtual to physical addresses) to know which line of the L1 cache to read. Only later, after both the TLB and the L1 cache have returned their results, is the L1 tag compared with the corresponding bits of the physical address returned by the TLB.


This is only partially correct. A VIPT cache uses the offset into the page to know which set of the L1 cache to read. You still need the result of the TLB to know the line.

This is clear from the fact that a page is 4KB while the L1 data cache is 32KB, so the page offset cannot contain enough information to tell the processor which part of L1 to read.


Still need the tag to know which way to read and whether there is a hit, so the statement is pretty much accurate.
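
To make the bit arithmetic concrete, here's a rough sketch assuming a 32 KB, 8-way L1 with 64-byte lines and 4 KB pages (typical recent x86 parameters; the numbers are illustrative): the set index comes entirely from page-offset bits, while the tag needs the translated physical address.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed (illustrative) geometry: 32 KB, 8-way, 64-byte lines, 4 KB pages. */
    enum { LINE_BITS = 6, SETS = 64, PAGE_BITS = 12 };  /* 32768 / 64 / 8 = 64 sets */

    int main(void) {
        uint64_t vaddr = 0x00007f12345678abULL;         /* some virtual address */

        /* Set index = bits [11:6], entirely inside the 12-bit page offset,
           so the set can be selected before translation completes. */
        uint64_t set = (vaddr >> LINE_BITS) & (SETS - 1);

        /* The tag comes from the *physical* address (bits 12 and up), so the
           TLB result is still needed to pick the way and detect a hit.
           Pretend an identity mapping here just to show which bits are used. */
        uint64_t paddr = vaddr;
        uint64_t tag = paddr >> PAGE_BITS;

        printf("set index = %llu, tag = 0x%llx\n",
               (unsigned long long)set, (unsigned long long)tag);
        return 0;
    }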


Hanlon’s razor is not an argument, it’s just a saying.


Ok here's an "argument" version: I argue that it is more likely that these problems were due to taking the wrong side of trade offs based on incomplete knowledge, weak foresight, and poor judgement. I argue that there is little or no evidence that executives or engineers at Intel actively didn't care about their reputation or their customers.

(Somewhat ironically, I think "Hanlon's razor is just a saying, not an argument" is just a saying, not an argument.)


Hanlon’s razor only applies to individual persons, never to corporations or groups.

Charity (which Hanlon’s razor is a part of) only applies to individuals.


> Hanlon’s razor only applies to individual persons, never to corporations or groups.

Why?

Do you think that the only reason for applying Hanlon's razor is some sort of moral principle (that only applies to individuals)?

I think the reason to apply Hanlon's razor is that stupidity is far more widespread than malice, and so that should be my prior. This is a purely epistemic argument, with no moral component.

I also think that larger groups always contain more stupidity than smaller groups, and I think it grows super-linearly. On the other hand, the effect of group size on malice (and on benevolence) is very complicated and unpredictable. Do you disagree with one of these beliefs?

If you agree that groups are almost always stupider than people but only sometimes more malicious, I think it's clear that I should be at least as eager to apply the razor to groups as to people.

In conclusion, an apposite quotation:

> Moloch! Nightmare of Moloch! Moloch the loveless!


Agreed. A big corporation working on products for a prolonged period of time has a different level of consciousness to their work than a single individual.

Take the 737 Max as an example: It's absolutely malice, because the level of incompetency you'd have to have to let a plane into the air that can tilt all the way down by means of a single(!) faulty sensor would disqualify anyone from ever building a plane.

An aviation problem such as that is still comparatively easy for a layman to understand. When it comes to CPU microarchitectures, I'm not so sure. But I trust they have professionals designing their chips, and my default is to assume they are competent and that malice occurred, and that they'd have to prove the opposite.


> Pretty cool business model where putting dangerous vulns in your product can help you capture 90% of the market.

This is extremely disingenuous phrasing. You seem to be trying to craft a narrative that makes putting security vulnerabilities in products an intentional thing to sabotage people, time, and resources.


Of course it's not "an intentional thing to sabotage people, time, and resources".

It's an intentional business decision to maximize performance and minimize costs paid by Intel, making their products faster and generating higher margins.

They achieved this by gambling that vulns wouldn't be exploitable. But Intel externalized that risk onto their customers and partners without their knowledge or consent.

So when the gamble failed, the rest of the ecosystem bears the cost while Intel harvests the gains for themselves.

Externalizing risks and losses on others without their knowledge or consent is a form of theft.


> They achieved this by gambling that vulns wouldn't be exploitable.

That's a huge leap, and seemingly without merit. You're still just arguing that Intel knew this would create huge vulns and they went ahead regardless, which is baseless.


> You're still just arguing that Intel knew this would create huge vulns and they went ahead regardless, which is baseless.

The first scientific paper about such vulnerabilities, specifically in Intel processors, is from 1995 [1]. They went ahead regardless. Processor technology has developed a lot. Many things that were not practically feasible in 1995 are possible today. That holds for the good and the bad. There must be some people inside Intel who understood that; otherwise all that progress would not have been possible. Some experts might have blind spots, but I doubt Intel microarchitecture is developed by just a handful of people.

[1] https://en.m.wikipedia.org/wiki/Meltdown_(security_vulnerabi...


The product being designed in 1995 was very different from today's. Many of these trade-offs resulted in much lower-risk vulnerabilities in 1995, when you didn't have virtualization or hardware shared with anyone with a credit card.

I lack the technical expertise to honestly critique the architecture. My assumption is that the folks behind the design of some of the more advanced technology on the planet aren’t morons or seeking to defraud the market.

All of the noise here is about significant vulnerabilities that haven’t been exploited in public. It is a serious defect, but not the end of the world.


That paper primarily discusses a (cooperative) covert channel between processes using the FPU TS register, then mentions that caches or the TLB could similarly be used to implement a covert channel. IMO, meltdown and Spectre and MDS are a different class of vulnerability, so I don't know that I would say that paper was about "such vulnerabilities."


That linked study covers speculative execution and hyperthreading? It seems to be covering 386/486-era processors.


Right, I think it makes a lot of sense that the engineers designing the cores didn't think about the security implications. I know I wouldn't have.


At some point, I bet someone in a design meeting said "Hey, I think this might compromise security, can we study that before implementing it?"

And the decision to implement it without that study, or despite it, was made. Because implementing it increased performance which made the numbers which captured market share.

It's not like the processor designed itself that way. Humans made choices along the way, and every choice is a tradeoff. Execution-time attacks and other sidechannel leakage have been well-known for years, and I can't imagine that at a place like Intel, nobody had heard of that.


> Because implementing it increased performance which made the numbers which captured market share.

More than the following big things, in both directions?

- A consistent fabrication advantage until very recently? That's said to have wiped out a generation of clever hardware architects, when Intel beat their best efforts with their next process node.

- Falling behind AMD with the Netburst "marchitecture", the front side bus memory bottleneck, and only supporting Itanium as a 64 bit architecture?

- Reversing the above by licensing AMD64, reverting to the same Pentium Pro style design AMD was using, and copying their ccNUMA multi-chip layout (each chip is directly attached to memory, with fast connections between them).

- AMD losing the plot after the K8 microarchitecture, including putting a huge amount of capital into buying ATI instead of pushing their CPU advantage? And then having to sell off their fabs, putting them at a permanent disadvantage until Intel's "10nm" failed? (And what happened to the K9??)

- Anticompetitive marketing and sales?

> Execution-time attacks and other sidechannel leakage have been well-known for years, and I can't imagine that at a place like Intel, nobody had heard of that.

The strange thing is that security researchers assumed for years that Intel was accounting for this, when it turned out not a single one of AMD, ARM, IBM POWER and mainframe/Z, Intel, MIPS, or SPARC did?


"At some point, I bet someone in a design meeting said "Hey, I think this might compromise security, can we study that before implementing it?" And the decision to implement it without that study, or despite it, was made."

Because we should assume all your past missteps were due to willful negligence? Yeah, let's not just start making shit up to support the groupthink outrage.


agree, I really doubt people were like "let's compromise security for performance". More likely it was something like: "hey here's an idea for how we can get more performance" "convince me it's correct" "oh, that's cause X, Y, Z..."

and either no one thought about the security implications, or no one could come up with a security problem at design time - likely just no one thought about side channel problems.


Why do you think that it's likely that someone in Intel design meetings had thought of that, if literally no one outside Intel did for many years?

It obviously wasn't obvious to the very, very many people worldwide who know about execution-time attacks and other sidechannel leakage, as it took more than 10 years of almost every chip worldwide being vulnerable until it was noticed. It's not as if some 2016 design tradeoff suddenly enabled Meltdown. Engineers were carefully studying competitor's chip designs, and none of them noticed that flaw for more than a decade.


Even better: at least one person documented thinking about this issue, trying to exploit it and failing - showing that even if you could come up with this, it was really non trivial to achieve.


This is a really weak argument because plenty of hilariously stupid and easy to exploit bugs are found in extremely widely used software all the time.


And how about the generations of computer architecture professors that have promulgated the lie that is "architecturally visible state," am I right? Speculative execution is fine, cause we just throw away the results before they become architecturally visible.

Those professors aren't doing too bad now either, with plenty of papers to be published outlining the flaws and proposing solutions.

/s

Really, though, humanity is still pretty new to this whole computer architecture thing in general, and running untrusted code securely in particular. I'm skeptical that someone at Intel was aware of the possibility of these vulnerabilities for 10+ years before any of them were made public.

To be honest, I'm a little perplexed by the animosity directed towards Intel after all these vulnerabilities. Vulnerabilities in software are most frequently the result of ignorance, but everyone seems to assume the engineers and managers at Intel knew better (or should have known better). Who would have told them?


It was known since the Pentium 4 era that hyper-threading had serious timing attack concerns, concerns Intel seems to have completely ignored.

http://www.daemonology.net/hyperthreading-considered-harmful...


The key sentences from that paper: "If applications and libraries are written in such a manner that the code path and sequence of memory accesses are oblivious to the data and key being used, then all timing side channels are immediately and trivially closed. [We assume that the hardware used does not exhibit data-dependent instruction timings; without this assumption, it is impossible to offer any such guarantees.]"

That is, these earlier timing side channels can be avoided, even in the presence of SMT/HyperThreading, by not doing data-dependent branches or memory accesses, and not using instructions with data-dependent timings. For instance, instead of "x = cond ? a() : b();", do something like "mask = cond - 1; x = a() & ~mask | b() & mask;".
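
A minimal sketch of that branchless-select idea in C (assuming cond is exactly 0 or 1, and that a and b are already-computed values rather than the function calls in the ternary version):

    #include <stdint.h>

    /* Picks a when cond == 1 and b when cond == 0, with no data-dependent
       branch: cond - 1 is all-zeros when cond is 1 and all-ones when cond is 0. */
    static inline uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
        uint32_t mask = cond - 1;
        return (a & ~mask) | (b & mask);
    }

Of course, you still have to check that the compiler doesn't optimize this back into a branch.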

That is not the case with MDS: even if you carefully avoid data-dependent branches, memory accesses, and instruction timings, you are still vulnerable. It's really a new vulnerability class.


Linux ignored those concerns as well. Torvalds didn't think hyperthreading was at fault, and Kleen stated that it was a "user space problem." In other words, there wasn't exactly consensus that there was something to fix.

Anyway, that's not the same vulnerability as any of those that have become public in the last 18 months.


It's very much in the same class of vulnerabilities; arguably it's completely impossible to secure SMT, because it leaks information about what other threads are doing by the very virtue of how it works (using CPU resources the other thread isn't utilizing in a very fine grained clock-for-clock way).


> because it leaks information about what other threads are doing by the very virtue of how it works (using CPU resources the other thread isn't utilizing in a very fine grained clock-for-clock way)

The issue with MDS is not which CPU resources each thread is using. It's something much more subtle. The issue with MDS is data left over by each thread within internal temporary buffers, which shouldn't be an issue since these internal temporary buffers can't be read by the other thread before being overwritten. However, in the phantasmagoric world of Spectre, these residual values can leave an impression which can be picked up through careful manipulation.

Notably, MDS can also be exploited by the same thread. This is what the microcode updates are about: they hijack a useless legacy instruction from the 90s, adding to it the extra behavior of clearing all these internal temporary buffers. Operating systems are supposed to call this instruction whenever crossing a privilege boundary. The problem with SMT is that it's really hard to make sure both threads are on the same side of all privilege boundaries all the time.

Also note that, as far as we know, all AMD processors, even the ones with SMT, seem to be immune to MDS. This shows that this vulnerability is not a fundamental property of SMT, unlike other side channels which depend on the resource usage by each thread.


That's all in retrospect - Intel hasn't even really dropped their prices or increased their core counts in response to AMD competition or these security disasters.


Why would they? They keep selling all the chips they can make at their current prices.


“Put more bugs in the software! I’m making a fortune out here!”

https://dilbert.com/strip/2000-03-19


That would only be valid as a short-term strategy.


And for Intel these bugs go back to the first Pentium Pro in 1995, their first out-of-order speculative execution design. Very cunning of them, along with IBM two years earlier, the SPARC bunch starting that year, MIPS and AMD a year later, and recently ARM as it targeted higher performance, to follow exactly the same "strategy".

Or maybe it was an industry blind spot, now a very big deal because so many of these CPUs run other people's random code, JavaScript in browsers, and anything and everything in shared cloud hosts.



I don't know about Fujitsu but Sun's SPARC designs were free from speculative execution until Oracle bought them.


Or did no one bother to check the older designs??


Oops, didn't read your comment closely enough, strike my above comment.

Followed the Wikipedia link to the 1995 Fujitsu/HAL out-of-order design (https://en.wikipedia.org/wiki/HAL_SPARC64) and it says it's superscalar, but then says that the first version could execute as many as 4 instructions at once and out-of-order, and that it had branch prediction from the very first version, which equals speculative execution as far as I know.


Who are these customers?

In my case, we observed very marginal performance impacts from last year’s mitigations.


At least as upstanding as sending your shills to badmouth American products.


Gamers don't care about obscure vulnerabilities on their gaming rigs.

So I think this is some sort of misguided hit piece against Intel.

Everyone knows pcs are riddled with security flaws less obscure than this. People who run their business on cloud servers might care. Gamers though? No.


They should care, considering some of these vulnerabilities are exploitable just by JavaScript, meaning merely visiting a bad website could leak sensitive information.



Lol. You are out of your league here, fearmonger.

Gamers are the 5 dollar whores of the security world: they are riddled with bugs and they don't give a crap.


We've banned this account for repeatedly violating the site guidelines. If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future.

https://news.ycombinator.com/newsguidelines.html


I have no data, but it seems to me that gamers are a tiny market compared to what Intel sells in any given year.


This does, however, remove the benefits of ASLR.

The only two mitigations are disabling hyper-threading or disabling JIT.
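
For the first of those, recent Linux kernels expose a runtime SMT switch in sysfs (added around the L1TF mitigation work); a minimal sketch, assuming that interface is present and you're running as root:

    #include <stdio.h>

    /* Writes "off" to the kernel's SMT control knob, which offlines the
       sibling hyper-threads. Writing "on" re-enables them; "forceoff"
       prevents re-enabling until reboot. Requires root. */
    int main(void) {
        FILE *f = fopen("/sys/devices/system/cpu/smt/control", "w");
        if (!f) { perror("smt/control"); return 1; }
        if (fputs("off\n", f) == EOF) { perror("write"); fclose(f); return 1; }
        return fclose(f) == 0 ? 0 : 1;
    }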


I bought a non-K CPU (without hyper-threading). If I had bought the K version and Intel later announced "disable the hyperthreads to be 100% safe", I would not be happy, even if I am 99.99% safe with HT on. (I personally did not enable or disable any security settings; I run the Ubuntu defaults.)

IMO Intel customers should not be indifferent: what you thought you bought is not what you got later, you lost performance, and you have to do advanced stuff to disable the security patches and be unsafe.

Edit: My bad, I was considering buying an i7 and got an i5; I got confused (my i5 is a 6th-gen K CPU).


Which processor did you buy where the K version has hyperthreading and the non-K does not? I don't think that distinction exists in any Intel CPU generation. K just means unlocked and sometimes a higher clock; it says nothing about hyperthreading.


The K thing is about overclockability, not HT.


Yeah. As the article says, losing hyperthreading effectively downgrades your Core i7 to a Core i5, since (depending on generation) that's one of the main differences between the two processor series.


As of the 9th generation, Core i7 doesn't have hyperthreading.


Is it possible they're talking about a chip where the K version had hyperthreading but the lower end of that line didn't?

Haven't kept up on Intel well enough for the last two years or so to be sure, but I seem to remember there being a generation where that happened.


My bad, I was considering buying an i7 and got an i5; I got confused.


Some games have internal currency or items that can be sold. I'm pretty sure there's malware targeting those items.


"Gamers" don't game 100% of the time. They also do normal web browsing (possibly to buy the next gaming pc).

Your argument is invalid.


Intel doesn't make its profits with gamers. It's data centers that buy Intel chips by the boatload.


Gamers are not the only target market. Gamers are a fairly small market.


Intel are consistently the assholes of the silicon market.


Intel is not really screwed, cloud providers are. Those "vCPUs" people have been buying are actually hyperthreads. I have a hypothesis that a double-digit percentage of cloud customers don't know what vCPUs are. As a cloud provider, imagine cutting your fake "core capacity" in half _and_ having to raise prices to offset the increased capex.


I've had difficulty parsing the guidance from Google for Google Kubernetes Engine here: https://cloud.google.com/kubernetes-engine/docs/security-bul...

> Note that n1-standard-1 (the GKE default), g1-small and f1-micro VMs only expose 1 vCPU to the guest environment so there is no need to disable Hyper-Threading.

I'm wondering if they've decided to just eat the loss on the single vCPU nodes, and for vCPUs >= 2, they pass the decision on to the customer.

Or is there someone else's VM running on the other hyperthread?

Here's their guidance on machine types: https://cloud.google.com/compute/docs/machine-types

> For the n1 series of machine types, a vCPU is implemented as a single hardware hyper-thread on one of the available CPU Platforms.
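
One way to see this from inside a Linux guest is to look at the CPU topology the hypervisor exposes; a small sketch (assuming the usual sysfs topology files are present) that prints each vCPU's hyper-thread siblings:

    #include <stdio.h>

    /* Prints, for each CPU, which logical CPUs share its physical core.
       On an n1-style instance with >= 2 vCPUs you'd typically see pairs
       like "0-1", i.e. the vCPUs are hyper-thread siblings. */
    int main(void) {
        char path[128], buf[128];
        for (int cpu = 0; ; cpu++) {
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
            FILE *f = fopen(path, "r");
            if (!f) break;   /* no more CPUs (sketch assumes contiguous numbering) */
            if (fgets(buf, sizeof buf, f))
                printf("cpu%d siblings: %s", cpu, buf);
            fclose(f);
        }
        return 0;
    }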


This is why the latest generation of AWS ARM instances has vCPUs that are 30% faster than regular vCPUs (there's no HT on those ARM cores).


Wow. Even more Intel CPU vulnerabilities. I hadn’t even heard about MDS until now. Glad I based my new rig on the Ryzen stack.

I wonder how much these continuous performance degradations are costing bigger customers, like cloud operators. This shit can’t be cheap.



