[dupe] Intel deprecates SGX on Core series processors (intel.com)
130 points by thinkmassive on April 16, 2022 | 132 comments



Related 400-comment thread from 3 months ago:

https://news.ycombinator.com/item?id=29932630


Many moons ago, I wrote a prototype on Azure's 'confidential computing' VMs that demonstrated how a computer could 'prove' to another that it was running certain code using 'attestations.' I thought it was cool as hell, and honestly thought that if such a technology held up you could build many interesting things with it.

Potentially, it would have had many use-cases in the DeFi industry, privacy-preserving computations, and so on. Looking a bit deeper, though, there were signs that placing all your trust in this technology was still a gamble. For example, the 'attestations' I spoke of earlier... were reports that had been signed with Intel's private key. You lose or hack that key and suddenly you can spoof reports. Then there were the hardware exploits, which were the final nail in the coffin.
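
To make that concrete: the verification step bottoms out in an ordinary signature check against a key you have pinned. A minimal sketch in C with OpenSSL (the report layout and the function name here are hypothetical; real SGX quotes go through Intel's EPID/DCAP services, certificate chains, and revocation lists, not one raw key):

    /* Hypothetical check: is this report signed by the pinned vendor key? */
    #include <openssl/evp.h>
    #include <openssl/pem.h>
    #include <stdio.h>

    int report_is_genuine(const unsigned char *report, size_t report_len,
                          const unsigned char *sig, size_t sig_len,
                          const char *vendor_pubkey_pem)
    {
        FILE *f = fopen(vendor_pubkey_pem, "r");
        if (!f) return 0;
        EVP_PKEY *pkey = PEM_read_PUBKEY(f, NULL, NULL, NULL);
        fclose(f);
        if (!pkey) return 0;

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        int ok = ctx != NULL
              && EVP_DigestVerifyInit(ctx, NULL, EVP_sha256(), NULL, pkey) == 1
              && EVP_DigestVerify(ctx, sig, sig_len, report, report_len) == 1;

        EVP_MD_CTX_free(ctx);
        EVP_PKEY_free(pkey);
        return ok;  /* 1 only if the signature verifies against the pinned key */
    }

Which is exactly why a lost or coerced signing key collapses the whole scheme: whoever holds it can mint reports that pass this check.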

It's sad really because Intel SGX was some innovative shit. It would have enabled applications that just aren't possible with a conventional computing model. But it was too good to be true, unfortunately.


It is gone now, but IIRC Microsoft was working on a blockchain for inter-enterprise settlement that ran in the secure enclave. It had extremely high throughput and all of the parties could “trust” that the code running was the same, so they were all following the same rules. Neat concept, they killed it before ever sharing source code.


I have seen a deployed altcoin built on SGX. Can't remember the name or if it was even successful.

In my opinion, it's a huge risk. With a regular blockchain, the fact that someone might maliciously be running the wrong code is baked into the design, and the security model has to handle it.

With SGX, you are relying on Intel's security model to make sure everyone is running the same code. If SGX is ever broken (which it has been, to some extent) and a malicious node is running different code, then your whole security model falls apart and they can start attacking the blockchain.

You might argue that you can design the blockchain to be secure even when SGX is broken. But if it's secure enough without SGX, why use SGX? You risk your security model being broken due to a bug and nobody realizing due to SGX protecting it, until SGX stops protecting.

Plus, you exclude people who don't have SGX from running your blockchain.


> I have seen a deployed altcoin built on SGX.

That may be MobileCoin[0].

Before that, from the same inspiration, Signal built SVR[1], a recovery system for the address book that uses SGX to ensure the servers cannot decrypt the backup.

I wonder how they will react to this deprecation.

[0]: https://developers.mobilecoin.com/overview/security/secure-e...

[1]: https://signal.org/blog/secure-value-recovery/#deus-sgx-mach...


Yeah, MobileCoin makes heavy use of SGX. For what it's worth, Intel isn't going to deprecate this server-side for modern use cases anytime soon.

The trade-offs between ZK and hardware-accelerated encryption still lean towards hardware, though I'm not sure how much longer that will be true. It's very difficult to imagine a general-purpose ZK machine, but you don't need general-purpose compute to get most of the value out of DeFi as it exists today.

In short, we're 5-10 years away from the first general-compute ZK machines/VMs imho, but bespoke ZK circuits are starting to appear for many use cases. Check out plonky2 as a cool example of a modern fast ZK circuit; the Polygon team is doing great work here.

We’ve been working towards a fully zk winterfell implementation/proposal but it’s not there yet.

In closing, one of the unique things in MobileCoin is private information recovery at scale using MobileCoin Fog (https://mobilecoin.com/news/fog-foward-in-oblivious-computin...), which isn't possible with any ZK circuit I've ever seen (due to the need to store lots of information durably, with fast access, without leaking access patterns to the server).


That is a good strategy. I hope the ZK VMs come to a reasonable performance before SGX goes fully away. Even though Intel won’t kill it for Xeon soon, I wouldn’t be surprised if they did in 5 years.

I wonder whether we can have our cake and eat it too; are there plans for FPGA-accelerated ZK machines?


It wasn’t designed for public blockchains, it was designed for uses in mid-trust environments (“enterprise blockchain”), trading off some level of trust for performance.

Original info: https://azure.microsoft.com/en-us/blog/announcing-microsoft-...

Some of the current outcomes:

https://azure.microsoft.com/en-us/services/azure-confidentia...

https://confidentialcomputing.io/

https://github.com/veracruz-project


Are you thinking of Secret Network? If I remember correctly, they use SGX for transaction privacy (transactions are encrypted with a key that is only known to enclaves). An SGX break I think would just make it a generic proof of stake network but maybe the way they ended up doing smart contracts is different?


Yeah, I think secret network is the one I'm remembering. It's been a long time since I was paying attention to the cryptocurrency space.

My concern is that it hasn't been tested as a generic proof-of-stake network and there is a risk that it might not degrade as expected.

It also means the blockchain can't be independently verified. If someone were to exploit it in a way that gave themselves extra coins, then nobody would ever know, because all the evidence is hidden inside the secure enclave.

How big is the risk? I'm not sure. I just suspect it's bigger than the advantage you get from using SGX; there are other ways to get privacy.


It seems like this argument proves too much, because it rules out defense in depth. Why have multiple layers of defense when one perfect layer does the job?

I think the answer is that you're never really sure that your defense is perfect, and that's true of both SGX and byzantine algorithms.


Are you thinking of https://en.m.wikipedia.org/wiki/Confidential_Consortium_Fram...?

It is alive and the source code is shared.


I am! The “Coco” github from back in the day was killed so I thought it was gone.


> all of the parties could “trust” that the code running was the same

Doesn't that trust fall apart when the SGX has known hacks?

eg bad actors can do their thing


Yeah, the Foreshadow attack leaked signing keys, so you could attest to a modified program running outside an enclave. I think other attacks didn't get that far, though.


Yes, but the environment for it was already more “trusted” than public, it was meant for enterprise blockchain.


How did that solution do? How did it solve the 51% attack problem without being bundled with a cryptocurrency?

...Or are you just using the word "blockchain" to literally mean a Merkle tree, so that the "innovation" was equivalent to storing settlements in git?



We can just use phones or USB sticks as signing keys, for anything and everything — why does nobody talk about/want/demand this?!


This is used heavily by hardware tokens like Yubikeys, and especially by cryptocurrency people with so-called "hardware wallets" like Trezor and Ledger (which can generate many subkeys on the device).

Companies like Google now issue their employees Yubikeys [1].

[1] https://www.yubico.com/resources/reference-customers/google/


We do it all the time with our phones with Apple Pay/Google Pay.


They claim SGX support on Xeon is still “full steam ahead”

https://community.intel.com/t5/Blogs/Products-and-Solutions/...


Specifically the next generation of SGX is pushing ahead, since a number of flaws were identified in the original design.


I'm taking bets on how long before the next generation is completely broken as well. It is only a matter of time, and the timeline is likely short.


The onus is on Intel to provide a solution that is sound. That said, there's no reason to assume that what they're doing is impossible, any more so than assuming that cryptography in general is impossible.


Only having it on expensive chips will probably help as much as anything.


Curious how there is no CSAM scare mongering when it's companies that need private storage of information.


Societally, when some person or org uses a Xeon processor for CSAM distribution, nobody thinks "man, I wish Intel stopped this". However, when it's Apple considering end-to-end encrypted iCloud Photos or iCloud backups, people (read: federal law enforcement) will blame Apple for allowing CSAM to proliferate locked away from law enforcement.

edit: as for why this is bad, federal law enforcement have a lot more pull when seeking change in laws in congress. Maybe Apple would've had to implement a government backdoor into iPhone itself if they didn't pull the CSAM scanning stunt (and I imagine they knew the backlash was inevitable, which allowed them to use that backlash as reason to push back on either scanning or a backdoor).


>However, when it's Apple considering end-to-end encrypted iCloud Photos or iCloud backups

Apple never used the excuse that they want to do end-to-end encryption. Did this change recently? Because as far as I know, only fanboys used this excuse, pulled out of thin air. These fanboys always have the most shit excuses, like "if Apple lets you do X, what if some child does X while walking and falls in a hole, then Apple will look bad". Fanboys really get hurt every time their favorite company looks bad in the news; see how fast some bad news gets down-voted here on HN (Tesla and MS news too, from what I noticed: you start to add a comment and in 1 minute the article is gone from the first page or even flagged).


The original article is here[0], but it seems this article might not be holding its weight[1], so perhaps there never was a plan to encrypt iCloud backups.

0: https://www.reuters.com/article/us-apple-fbi-icloud-exclusiv...

1: https://blog.elcomsoft.com/2021/01/apple-fbi-and-iphone-back...


Yes, the end-to-end encryption part is just fanboy speculation, nothing official from Apple. Even after they got a lot of bad PR, they did not even slightly hint that they were doing it for E2E.

Google and FB have reported a lot more CSAM than Apple so far, so what do Apple fans think?

1. Apple users have less CSAM than Google/FB users.

2. Apple users use Google services for CSAM, maybe because those services are better quality or better suited for what they use them for.

3. Apple never cared about CSAM and did the minimum work possible, and they don't care about CSAM now either, but for some reason they needed this backdoor scan-and-report process to always run on devices.

4. Some other reason; maybe Apple users are not into this shit, keeping in mind that Apple is anti adult software on their devices, so some people would avoid their censored devices.


Is the ME next? One can dream... that Intel will stop playing with all of this user-hostile "security" nonsense and focus on performance and power efficiency while also releasing all the detailed programming/design information publicly.


They aren't going to.

The industry has been pushing for copyright enforcement built into the hardware and OS. Windows 10/11 is the first client-server OS, aka the ultimate security risk, since you all have been stealing software from yourselves. With the rise of MMOs and Steam, there's ZERO reason for any piece of software to require an internet connection.

Intel, MS, Sony and AMD are not going to give up on trusted computing. They want to remove control of our PCs to jack up software prices and force the public to pay for software and games.

https://www.cl.cam.ac.uk/~rja14/tcpa-faq.html


Looks like they deprecated the link since I first loaded the page. New link, same content: https://edc.intel.com/content/www/us/en/design/ipla/software...


Also note, at the bottom of that page, that they are doing away with AVX-512. A bit sad for those who need to squeeze maximum performance out of a CPU. (I gather it lives on in Xeon and Zen 4.)


AVX-512 is not supported on the CPUs with "Hybrid Technology", i.e. the current Alder Lake and its successor Raptor Lake, which will be launched towards the end of 2022 to replace Alder Lake.

What will happen when the Gracemont small cores are replaced at the end of 2023 is not known yet.

It is still possible that the successor of Gracemont will have AVX-512, in which case the Intel processors with "Hybrid Technology" will also have it at that time.

Taking into account that Zen 4 is expected to have AVX-512, it is unlikely that Intel has not also planned for Meteor Lake (2023) to have it. Meteor Lake is supposed to be made using a more dense CMOS process than Alder Lake and Raptor Lake, which should enable Intel to implement AVX-512 in the small cores.


It seems silly to do that just because of asymmetric cores. The small cores could simply not support the instructions, and the OS could then handle the fault by moving the task to a big core. Perhaps this could be signaled via a different CPU feature flag, so that libraries avoid using AVX-512 instructions sporadically (e.g. in memcpy) and only use them in long-running loops. Or maybe give the OS a way to determine whether the CPU flags should be shown to a specific process or not. Or applications could install a SIGILL handler and deal with it in userspace.
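
For the last option, the userspace probe is only a few lines. A minimal sketch, assuming Linux/glibc; the probed instruction is just one arbitrary AVX-512F op:

    /* Probe for AVX-512 by executing one instruction and catching SIGILL. */
    #include <setjmp.h>
    #include <signal.h>
    #include <string.h>

    static sigjmp_buf probe_env;

    static void on_sigill(int sig)
    {
        (void)sig;
        siglongjmp(probe_env, 1);        /* unwind out of the faulting instruction */
    }

    int probe_avx512(void)
    {
        struct sigaction sa, old;
        volatile int ok = 0;

        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = on_sigill;
        sigaction(SIGILL, &sa, &old);

        if (sigsetjmp(probe_env, 1) == 0) {
            /* AVX-512F zeroing idiom; raises #UD (SIGILL) where unsupported. */
            __asm__ volatile ("vpxord %%zmm0, %%zmm0, %%zmm0" ::: "xmm0");
            ok = 1;
        } else {
            ok = 0;                      /* landed here via the handler */
        }
        sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
        return ok;
    }

On a hybrid part the answer could in principle differ per core, which is exactly the scheduling headache described above.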


Even easier would be to have AVX-512 be microcoded for the small cores. Most of AVX-512 can be emulated pretty easily.
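
In software terms, that emulation would look roughly like the sketch below: split each 512-bit value and do the work in two 256-bit halves (a hedged illustration only; real microcode would also have to handle masking, the EVEX encodings, and the wider register file):

    #include <immintrin.h>

    /* Hypothetical 512-bit value represented as two 256-bit halves. */
    typedef struct { __m256d lo, hi; } v512d;

    static inline v512d add512(v512d a, v512d b)
    {
        v512d r;
        r.lo = _mm256_add_pd(a.lo, b.lo);   /* first 256-bit half */
        r.hi = _mm256_add_pd(a.hi, b.hi);   /* second 256-bit half */
        return r;
    }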


This. And performance wouldn't be worse than using AVX twice...

But the worst-case scenario of the OS moving the process to a big core on an illegal instruction, or scheduling it to the right core based on a required-capabilities system, would also be quite acceptable most of the time.

Plus, it'd be great for supporting more specialized cores designed for different purposes and running a single core ISA with extensions for their specific needs. IIRC, there are some ARM chips that have three different kinds of core.


Meh, the few workloads that benefit from AVX-512 would be better off on a GPU.

Also note that it wasn’t well received 2 years ago. Example: https://news.ycombinator.com/item?id=23809335


That's a bold claim. What are your sources? The two latest things I worked on, JPEG XL and quicksort, see 1.4x and 1.6x speedups from AVX-512. That's on SKX and includes the much-maligned throttling. On a system level, I doubt moving those to GPU is helpful.


Also, moving things to the GPU may be good for throughput but bad for latency depending on the workload, since offloading to the GPU has a cost, and so do the data exchanges.


> offloading to GPU has a cost and data exchanges too.

This is bad with dGPUs over the PCIe bus, but not so much with GPUs that share a very fast memory bus with the CPU. In this case, the layout of the data may prove challenging to keep the same for when you use a CPU and a GPU.


Agree. Real-time signal processing with feedback loops is a case in point.


If you're processing a lot of data, you're better off moving it to the GPU. If you're processing only a little data, the speed-up doesn't matter. I wonder how wide is the Goldilocks Zone where AVX512 makes a practical difference?


Golly, another sweeping statement :) It seems to me a GPU might actually sort more slowly.

For 64-bit keys, we sort about 1 GB/s per (5 year old) Skylake core, and perhaps 5-6 parallel.

This (2018) reports 3.5 GB/s: https://benkarsin.files.wordpress.com/2018/10/dissertation.p... And a 6-year old GPU radix sort reports 2.1 GB/s: https://github.com/Bulat-Ziganshin/Compression-Research/tree...

BTW I've worked on a product that used GPUs. That typically requires everything to move to the GPU, which is not always desirable or feasible.


This (2020) reports "Despite the fact that we send the entire data array to the video card and back, sorting on GPU of 800 MB of data is performed about 25-fold faster than on the processor."

https://dev.to/tishden/computing-with-gpu-why-when-how-and-s...

This shows an approximately 20x speedup (2021, graph 1 vs 3): https://www.irjet.net/archives/V8/i7/IRJET-V8I7714.pdf


Thanks for the example! Sounds like 1.6 GB/s on an entire Tesla K80 (300W TDP). This is in fact several times slower than our results on Skylake (with half the TDP), but note that K80 is from 2014.

The "25-fold speedup", as is often the case for such reports, comes from not optimizing the CPU side.


Assuming AVX-512 actually works well (haven't had the opportunity to use it myself), it could be very useful in high-end gamedev. Data-oriented programming is popular there, which makes it much easier to optimize data transformations w/ SIMD. Good compilers can even do some auto-vectorization (it's a nice boost with no additional programmer effort but you can't rely on it much). GPUs tend to be already fully loaded by the intensive rendering workload, so there is a large incentive to squeeze as much as you can out of the CPU.


GPUs generally aren't worth it for moderately cheap O(n) operations. AVX-512 is really nice because if you match the memory speed, you beat the GPU (since the GPU will also be memory-speed constrained).
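
The kind of cheap O(n) kernel meant here looks like the sketch below (assumes AVX-512F, e.g. compile with -mavx512f); once a loop like this runs at memory speed, shipping the data to a GPU and back has nothing left to win.

    #include <immintrin.h>
    #include <stddef.h>

    /* Streaming reduction: 8 doubles per vector, bound by memory bandwidth. */
    double sum_f64(const double *x, size_t n)
    {
        __m512d acc = _mm512_setzero_pd();
        size_t i = 0;
        for (; i + 8 <= n; i += 8)
            acc = _mm512_add_pd(acc, _mm512_loadu_pd(x + i));
        double total = _mm512_reduce_add_pd(acc);   /* horizontal sum */
        for (; i < n; ++i)                          /* scalar tail */
            total += x[i];
        return total;
    }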


'few workloads that benefit from AVX-512 would be better off on a GPU.'

Have you seen the state of GPU software development? GPUs are very expensive in the cloud and are poorly supported in containers and virtual machines. If you want to use GPU compute, some stuff is Nvidia-only, some stuff is glitchy and crashy, it's probably not available in your language of choice, etc.

It is literally impossible for me to add GPU compute to any of our corporate workloads, but I can tap into AVX easily in my language of choice.


Pretty much all cloud CPUs are planned to continue supporting AVX-512. It's only the client side where they are axing it.


One would think that, but the reality is that there are some workloads which currently seem to fare better on AVX-512. Usually where you can't bring the massive GPU parallelism/thread rotation into play, and you need low latency between operations for a data set that is large enough, yet still small enough to benefit from the cache on the CPU.

I'm pretty sure GPUs will close that gap over time though.


Doesn't x86-64-v4 require it?

This turns the whole x86-64 levels concept upside down. What a mess.


According to this article [0], x86-64 level 4 is essentially AVX-512 itself. So no, nothing gets borked; you just lose a level completely and end up with level-3 hardware.

[0]: https://www.phoronix.com/scan.php?page=news_item&px=GCC-11-x...
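
For what it's worth, recent GCC (12 or newer, if I remember right) lets you dispatch on those levels directly; a small sketch:

    #include <stdio.h>

    int main(void)
    {
        __builtin_cpu_init();
        if (__builtin_cpu_supports("x86-64-v4"))       /* v3 plus the AVX-512 subset */
            puts("v4 path: AVX-512 available");
        else if (__builtin_cpu_supports("x86-64-v3"))  /* AVX2/FMA/BMI2 etc. */
            puts("v3 path");
        else
            puts("baseline x86-64 path");
        return 0;
    }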


This was announced some time ago with the first 12th generation processors. Weirdly the hardware is present in those chips and it has since been enabled by some motherboard vendors. https://www.tomshardware.com/uk/news/msi-reenables-avx512-su...


That's now been "fixed" by blowing a permanent fuse.

https://www.tomshardware.com/news/intel-nukes-alder-lake-avx...


Good riddance. (For those unaware, basically the only use-case of SGX was hardware-enforced DRM.)


On consumer CPUs DRM may have been the only use case, and indeed good riddance. But on the server side it's possible for tenants to use these hardware features to run workloads that the hosts verifiably cannot modify or inspect. In other words, this lets you use AWS/GCP/Azure etc. while keeping both your data and your code completely opaque to Amazon/Google/Microsoft etc. Disclaimer: my job is to write the software that makes this possible, and all our code is open source and housed under the Linux Foundation's Confidential Computing Consortium: https://enarx.dev/ .


Of course, that requires tenants trust Intel's security.

As a security researcher and given past showings from Intel, I wouldn't put much faith in SGX, even if they try to fix past flaws. SGX as a concept for tenant-provider isolation requires strong local attacker security, which is something off the shelf x86 has never had (not up to contemporary standards, ever) and certainly not in anything Intel has put out. They've demonstrated they don't have the culture nor security chops to actually engineer a system that could be trusted, IMO. Plus then there's all the microarchitectural leak vectors with a shared-CPU approach like that, and we know Intel have utterly failed there (not just Spectre; there was absolutely no excuse for L1TF and some of the others, and those really showed us just how security-oblivious Intel's design teams are).

Right now, the x86 world would probably do well to listen to Microsoft, since their Xbox division managed to coax AMD into actually putting out secure silicon (they're one of the two big companies doing proper silicon security at the consumer level, the other being Apple, with Google trying to catch up as a distant third). But given the muted response to Pluton from the industry, and the poor way in which this is all being marketed and explained, I'm not sure I have much hope right now...


> Of course, that requires tenants trust Intel's security.

I generally agree with you. But I recently realized there might be one use case, and it's pretty much what Signal is doing. They're processing address books in SGX so that they can't see them. I don't have much faith in the system because I don't trust SGX, of course.

But there is one interesting aspect to this. If anyone comes knocking and tells them to start logging all address books and hand them over, they can say that it's not possible for them to do so.

Anyone wanting to do that covertly would at least need to bring their own SGX exploits, meaning it probably offers SOME level of protection. Certainly not if the NSA wants the data or some LEA is chasing something high-profile enough that they're willing to buy exploits and get a court order allowing them to use them. But it does allow them to respond with "we don't have this kind of data".


Secure enclave as legal defense is an interesting angle, thanks for sharing.

It's become a moral cause to make a lot of big-data computing deniable, to be data-oblivious. This is a responsible way to build an application, it's well-built security, and I like it a lot.


I think “responsible” is a bit too strong a word, when most of these computations could just run on the client.

I agree this work is important and enclaves are better than nothing though.


I want to spew curse words, because, from what I have been able to comprehend, all the web crypto systems contravene what you & I seem to agree is a moral, logical goal. All are designed to give the host site access to security keys, & to ensure the user-agent/browser has the fewest rights possible. We have secure cryptography, but only as long as it's out of the user-agent/client's control.

We've literally built a new web crypto platform where we favor 100% the vile fucking cloud fuckers for all computation rather than the client, which seems as fucked up horseshit backwards trash city dystopia as could be possible. Everything is backwards & terrible.

That said, we 100% cannot trust most user-agent sessions, which are infected with vast vast spyware systems. The web is so toxic about data sharing that we have to assume the client is the most toxic agent, & make just the host/server responsible. This is just epically fucking fucked up wrong, & pushes us completely backwards from what a respectable security paradigm should be.


Hi chiming in to double down on this, as the downvotes ongoingly slowly slowly creep downward even still.

In most places, end-to-end security is the goal. But we've literally built the web crypto model to ensure the end user reaps no end-to-end benefit from web cryptography.

The alternative would be to trust the user-agent, to allow end-to-end security. But we don't allow this. We primarily use crypto to uniquely distinctly identify users, as an alternative to passwords.

This is a busted jank ass sorry sad limited piece of shit way for the web to allow cryptography in the platform. This is rank.

The Nitrokey security key people saw this huge gap, & created a prototype/draft set of technologies to enable end-to-end web encryption & secure storage with their security keys. https://github.com/Nitrokey/nitrokey-webcrypt


This paper convinced me it will be at least a decade before SGX or similar have any semblance of security:

https://www.usenix.org/system/files/conference/usenixsecurit...

The basic idea is that you can play with the clockspeed and voltage of one ARM core using code running on the other. They used this to make an AES block glitch at the right time. The cool part is that, even though the key is baked into the processor, and there are no data lines to read the key (other than the AES logic), this lets them infer the key.

Hmm. The paper is 5 years old. I still think we are a decade away.


That's one reason why most TrustZone implementations are broken: usually the OS has control over all this clocking stuff. It's also one way the Tegra X1 (Switch SoC)'s last remaining secrets were recently extracted.

It's also how I pulled the keys out of the Wii U main CPU (reset glitch performed from the ARM core). Heh, that was almost a decade ago now.

That's why Apple uses a dedicated SEP instead of trying to play games with trust boundaries in the main CPU. That way, they can engineer it with healthy operating margins and include environmental monitors so that if you try to mess with the power rails or clock, it locks itself out. I believe Microsoft is doing similar stuff with Xbox silicon.

Of course, all that breaks down once you're trying to secure the main CPU a la SGX. At that point the best you can do is move all this power stuff into the trust domain of the CPU manufacturer. Apple have largely done this with the M1s too; I've yet to find a way to put the main cores out of their operating envelope, though I don't think it's quite up to security standards there yet (but Apple aren't really selling something like SGX either).


You trust the security of your CPU vendor in all cases. SGX doesn't change that. If Intel wanted to, they could release a microcode update that detects a particular code sequence running and then patches it on the fly to create a back door. You'd never even know.

"SGX as a concept for tenant-provider isolation requires strong local attacker security, which is something off the shelf x86 has never had"

Off the shelf CPUs have never had anything like SGX, period. All other attempts like games consoles rely heavily on establishing a single vendor ecosystem in which all code is signed and the hardware cannot be modified at all. Even then it often took many rounds of break/fix to keep it secure and the vendors often failed (e.g. PS3).

So you're incorrect that Intel is worse than other vendors here. When considering the problem SGX is designed to solve:

- AMD's equivalents have repeatedly suffered class breaks that required replacing the physical CPU almost immediately, due to simple memory management bugs in firmware. SGX has never had anything even close to this.

- ARM never even tried.

SGX was designed to be re-sealable, as all security systems must be, and that more or less has worked. It's been repeatedly patched in the field, despite coming out before micro-architectural side channel/Spectre attacks were even known about at all. That makes it the best effort yet, by far. I haven't worked with it for a year or so but by the time I stopped the state of the art attacks from the research community were filled with severe caveats (often not really admitted to in the papers, sigh), were often unreliable and were getting patched with microcode updates quite quickly. The other vendors weren't even in the race at all.

"there was absolutely no excuse for L1TF and some of the others, and those really showed us just how security-oblivious Intel's design teams are"

No excuse? And yet all CPU vendors were subject to speculation attacks of various kinds. I lost track of how many specex papers I read that said "demonstrating this attack on AMD is left for future work" i.e. they couldn't be bothered trying to attack second-tier vendors and often ARM wasn't even mentioned.

I've seen some security researchers who unfortunately seemed to believe that absence of evidence = evidence of absence and argued what you're arguing above: that Intel was uniquely bad at this stuff. When studied carefully these claims don't hold water.

Frankly I think the self-proclaimed security community is shooting us all in the foot here. What Intel is learning from this stuff is that the security world:

a. Lacks imagination. The tech is general purpose but instead of coming up with interesting use cases (of which there are many), too many people just say "but it could be used for DRM so it must die".

b. Demands perfection from day one, including against attack classes that don't exist yet. This is unreasonable and no real world security technology meets this standard, but if even trying generates a constant stream of aggressive PR hits by researchers who are often over-egging what their attacks can do, then why even bother? Especially if your competitors aren't trying, this industry attitude can create a perverse incentive to not even attempt to improve security.

"the x86 world would probably do well to listen to Microsoft, since their Xbox division managed to coax AMD into actually putting out secure silicon"

SGX is hard because it's trying to preserve the open nature of the platform. Given how badly AMD fared with SEV, it's clear that they are not actually better at this. Securing a games console i.e. a totally closed world is a different problem with different strategies.


"SGX is hard because it's trying to preserve the open nature of the platform"

Except that was an afterthought. Originally only whitelisted developers were allowed to use SGX at all, back when DRM was the only use-case they had in mind.


It clearly wasn't an afterthought, I don't think anyone familiar with the design could possibly say that. It's intended to allow any arbitrary OS to use it, and in fact support on Linux has always been better than on Windows, largely because Intel could and did implement support for themselves. It pays a heavy price for this compared with the simpler and more obvious (and older) chain-of-trust approach that games consoles and phones use.

The whitelisting was annoying but is gone now. The justification was (iirc) a mix of commercial imperatives and fear that people would use it to make un-reversible ransomware/malware. SGX was never really a great fit for copy protection because content vendors weren't willing to sell their content only to people with the latest Intel CPUs.


Indeed, it remains to be seen whether or not SGX2 will be trustworthy; the proof is in the pudding. However, other vendors have their own solutions to the same problem, and least AMD's approach is radically different, so one hopes that at least one of them will stand up to scrutiny.


I'm sure you're already familiar, but for others, there is also AMD's SEV-SNP [0] and Intel's TDX [1] that solve similar problems. Azure has SEV-SNP VMs in preview [2] - full disclosure, I work at Microsoft and was involved in this :)

[0] - https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthen...

[1] - https://www.intel.com/content/www/us/en/developer/articles/t...

[2] - https://azure.microsoft.com/en-us/blog/azure-and-amd-enable-...


In terms of security these technologies are (mostly) strictly worse than SGX, and there hasn't been nearly enough security research done on them. Also, the physical attack vectors remain (e.g. https://arxiv.org/abs/2108.04575). A small but interesting counterexample is x86 TSC: with SGX the MSR is modifiable by ring 0, with AMD SEV-SNP it is protected from the hypervisor. The real value of these newer technologies is not increased security, but rather increased usability. (Also, small sidenote: TDX relies on SGX.)

I think the biggest contribution that cloud providers can bring to the table in the mid-term is mitigation of the physical attack vector. This would involve inserting themselves as a second root of trust in attestations (SEV has explicit support for this), which would mean that a real world attack would require collusion of multiple parties (pick 2 of Hardware vendor, Cloud provider, Software vendor).


Except that the security model on the server side is broken, as there is no way for Intel to know that a key is compromised and thus revoke it; at least for the DRM use-case, sharing cracked keys on forums is common. Why would an attacker ever share keys in the tenant/host case? Moreover, is there some reason to believe that Amazon, Google, or Microsoft would struggle to extract a key if they are indeed malicious? Is there a good reason to believe that Intel would never just give keys to certain government agencies when asked? If you are worried about a malicious host, SGX/etc. are at best a partial, very limited solution even if all you care about is integrity/attestation.

SGX and TEEs generally are and always were a DRM solution, with the server use-case mostly being an afterthought that the marketing teams pushed hard. They also create a fantastic forced-obsolescence program as they require active support on the part of chip makers throughout an application's lifecycle; Intel can arbitrarily deprecate otherwise functional CPUs by just not revoking compromised keys (and perhaps releasing a few into the wild just to force people to upgrade).


Analytics and ML on confidential data are some interesting server side use cases. See the MC2 open source project, for example: https://github.com/mc2-project/mc2


But on the server side it's possible for tenants to use these hardware features to run workloads that the hosts verifiably cannot modify or inspect.

IMHO that is still a bad thing because it violates some fundamental principles around what ownership really means.


The hosts still have the authority to refuse to allow tenants to run encrypted workloads. The concept of ownership is preserved, but some owners may choose to give up their ability to snoop in order to attract tenants who otherwise would be incapable of using cloud infrastructure at all.


How do you know that when your software calls SGX instructions from inside a VM, that it's actually getting the hardware CPU's SGX implementation, rather than an arbitrary software SGX instruction-shim implementation provided by the hypervisor?


The CPU has keys that you verify by asking Intel, is the short of it.

Of course, if you're inside the hypervisor and being software-emulated, it could do anything to you. The idea is that you verify enclaves remotely, so no other system would want to talk to your software emulator and share its secrets.

You can mess with your local copy of the enclave, but then no one else will be able to verify it remotely (because the Intel CPU won't sign it like you need it to).


Exactly.

And the main limitation of this approach is that it's really hard to prevent the breaking of a single CPU from being a class break.

If you manage, with glitches or side channels (e.g. Spectre against SGX), to steal the per-CPU secrets, you can then set up an emulator that can obtain Intel attestation, and suddenly any application that depends on the impossibility of an emulator faking that attestation is broken.

It's even harder for these DRM-focused, SGX-like solutions because they really need the attestation to be anonymous, to avoid it being a massive tracking vector and privacy breach... while a more traditional HSM attestation would still identify the device and potentially allow limiting the impact of compromising a single one.


How do you find out what CPU your VM is running on in a compute cloud, in order to ask Intel about it? You can't go to the data center and look at the serial number printed on the chip. It's probably negative-ROI for an IaaS vendor to have the ops staff at the DC to go do that as a customer service. And AFAIK there's nothing like a control-plane API for querying a hypervisor's hardware serial numbers in an IaaS-maintained inventory DB. So presumably, you have to... ask the VM itself. Maybe using something like the (long removed) CPUID instruction's "CPU serial number" output?

Presuming you can only learn the CPU's ID through the VM itself, then an attacker with access to the hypervisor, plus at least one private key extracted from a sacrificial CPU of the same model, could just have the VM report the extracted-from CPU's serial number, and then use the respective extracted private key in their SGX enclave emulation. And this would check out with Intel.

Or, of course, a lot more simply, you could just make up your own keys instead of extracting any Intel keys, and then have the VM rewrite any Intel CPU root certs it finds in the VM's memory to be the attacker's certs instead (and any hashes of those certs be the hashes of the attacker's certs, etc.); such that messages signed by fake-SGX validate within the VM, and messages encrypted by fake-SGX decrypt within the VM, and messages encrypted by the VM decrypt within fake-SGX. In other words — don't keygen the user's workload; crack it. The SGX enclave is very rarely used in such a way where the component checking it is running on anything other than the same VM calling into it, so why bother worrying about what other untainted machines communicating with the enclave might see? That'd be like worrying about what more-sensible third-parties might tell your victim in a confidence scheme.


Yes. If you extract keys, you win the game.

I think their best answer is a sort of blacklist system where, if Intel becomes aware that one of their keys has leaked, their servers could stop telling people it is genuine (this gets into the details of EPID and DCAP that I really don't want to clutter my memory with, so my retelling may be less accurate there).

To prevent the idea with fake certificates, I think you "simply" pin some Intel root certs in your enclave. Do TLS with those, verify the "TCB" cert chain with them too. Then the VM either lets you run on a real CPU and cannot poke your encrypted memory, or it emulates you and this is back to the previous scenario (it owns you locally, but remote attestation fails)

The whole thing that makes SGX interesting and not a boring traditional HSM is remote attestation for secret provisioning, sealing of secrets, etc. This means you in theory run workloads on someone else's computer without having to trust it. If your secrets are already on the attacker-controlled VM and you only verify things locally, this is useless. Nothing can save you, the VM already owns your entire environment.


There is already a blacklisting/revocation system for leaked SGX keys -- which failed completely when it was put to the test, after researchers tried publishing some keys they extracted on Twitter. However, it depends on Intel becoming aware of a leaked key, which makes perfect sense for the original DRM use-case and makes no sense in the cloud/server/hosting/etc. use-case.


It's also used extensively by Signal (not just for MobileCoin, as a sibling comment suggests).


Interestingly enough, I believe there's now no official way of playing UHD-BR on modern Intel systems (ie. you'd have to use an AACS/BD+ crack), since all certified players needed SGX.

I don't know what this means for 4K Netflix, if anything.


The use case could have included all manner of useful pro-privacy and security usage, but Intel was extremely restrictive with access to the ability to use it, in part because they expected to monetize the DRM usage.


I thought you could use it for HSM-type workloads too


Like a third party’s HSM running locally on your computer, heh


No, like, you could use something like https://www.ego.dev/ and have Vault or something that worked more like an HSM


In other words, a smartcard :)


Hardware-enforced anticheat (for certain kinds of games) is a pretty cool potential usecase.


The second use-case was MobileCoin, that cryptocoin scam Signal signed up for. Reflects well on their other crypto.


More interesting to me than SGX is that all of TSX-NI is deprecated? Not just HLE, but RTM too? Meaning there'll be no more hardware transactional memory at all?! Anyone able to shed any light on why they're doing this? Is it just security, or is it just not worth it regardless of that? Is there a chance they'll reintroduce it in some form, perhaps in other processor series?


TSX has always been kind of useless. They got repeated feedback that they had to document what would cause deterministic spurious rollbacks (usually bad cache interactions within the same transaction causing lines to spill), so that applications could be written without the slow (non-hardware-assisted) fallback path.
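
For readers who never touched it, the shape of an RTM critical section is roughly the sketch below (a hedged illustration: compile with -mrtm; the retry policy and the lock are simplified). The reason the fallback path is not optional is that the hardware may abort a transaction for reasons you cannot predict or document around.

    #include <immintrin.h>    /* _xbegin / _xend / _xabort, needs -mrtm */
    #include <stdatomic.h>

    static atomic_int fallback_lock;   /* 0 = free, 1 = held */

    void increment(long *counter)
    {
        for (int attempt = 0; attempt < 3; ++attempt) {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                /* Read the lock word so a concurrent lock holder aborts us. */
                if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                    _xabort(0xff);
                (*counter)++;
                _xend();               /* commit atomically */
                return;
            }
            /* Spurious or conflict abort: maybe retry, then give up. */
        }
        /* Slow path: plain spinlock.  Taking it also aborts running transactions. */
        int expected = 0;
        while (!atomic_compare_exchange_weak_explicit(&fallback_lock, &expected, 1,
                                                      memory_order_acquire,
                                                      memory_order_relaxed))
            expected = 0;
        (*counter)++;
        atomic_store_explicit(&fallback_lock, 0, memory_order_release);
    }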

Oh well. Maybe some other company will build hardware transactions in a way that takes developer feedback into account.

Crosspoint (3D XPoint) DIMMs had an analogous problem. There was no way to pin a cache line for updates, so that it couldn't spill to persistent storage mid-page-update. Some cool workarounds came out of the research community, but, in the end, that technology is dead on arrival too.


Looks like it causes memory ordering issues [1] on the affected processors. You can turn the capability back on with a flag, but Intel warns this is "not for production use"

[1]: https://cdrdv2.intel.com/v1/dl/getContent/604224


Wow, I see, thanks. Makes you wonder why they couldn't fix it instead—sounds like it was a fundamentally difficult problem to solve?


It is a fundamentally difficult problem to solve. On top of that, the costs grow exponentially with core count and processor scaling compounds.

If you look back at so many of the intel architectural extensions it is so hard NOT to draw the conclusion that Intel has a POOR understanding of where problems should be solved. They constantly try to solve problems which should be solved in software with hardware. This is why they are now an architectural generation behind AMD and why they are most likely going to lose the CPU market to Apple imitators.

Unless AMD can maintain its success despite the talent bleed it will start to face — the days of x86 are limited.


Why is Intel wrong for trying to solve software problems in hardware, while ARM/CHERI are celebrated for being ahead in trying to solve memory safety in hardware using pointer upper-bit tagging and now CHERI's provenance metadata (performing fine-grained bounds checks on every single pointer dereference, rather than just following instructions without performing extra work)?


I think it's less a criticism of trying to solve problems in hardware vs the _kinds_ of problems they're focusing on.

While CHERI is, from a pure theory standpoint, something that is perfectly avoidable with proper programs (i.e. memory unsafety _is_ efficiently avoidable in software), we ended up needing it because we made the wrong choice in software too long ago to turn back (nobody is rewriting the Linux kernel anytime soon). In this way, CHERI is a good optimization because it does something we cannot _practically_ solve in software. ARM-PA plays a similar role in that hardware CFI can be made irrelevant by a) not having memory safety issues or b) software CFI, but neither has really worked out in practice, and it's cheap and efficient in hardware, so it's a worthwhile tradeoff.

Stuff like Intel TSX and ARM TME is sort of at the other end. Transactional memory is _super_ cool and it's been a common thread throughout architecture papers for the past twenty years. The thing is, we've never had transactional memory in commodity hardware (and nobody buried their heads in the sand about not having it like we did with memory safety), so all our software found decent workarounds eventually. TSX/TME does do what it says; the issue is just that it's not quite good enough when compared to existing software techniques, and so the actual added value (cache noise and the resulting spurious aborts included) made it a less good deal. When you add the cost to update software and the likely strongly polynomial (?) hardware cost of transaction support as core count grows (this is why ARM's Exclusive Monitor performs SO badly on systems with 64+ cores and why they added new atomics just to avoid the monitor), it just doesn't work anymore.


Not a hardware architect, but my spitballing as a compiler writer:

Transactional memory is one of those things that constantly sounds like a good idea in theory, but it doesn't live up to those ideas in practice. One of the issues with hardware transactional memory is the challenge of spurious aborts or otherwise running up against hardware limits on how big transactions can be. Another (as far as I'm aware) unsolved issue is defining a memory model that supports both transactional memory and the modern C/C++ memory model. I also don't think there are a lot of practical benefits--you can make it quite far with existing parallelism libraries that expose something vaguely task-based or fork-join, such as OpenMP or STL's parallel executors, and there's relatively little need for algorithms where you don't necessarily know whether there's going to be contention or not.


Having used GHC Haskell's software transactional memory features to build a concurrent service with caching, I do think there are significant practical benefits. To program threaded code without needing to worry about global reasoning over fine-grained locking is a godsend. Transactional memory solves the issue of writing composable code that works on shared mutable state. Task libraries do not solve this problem.

Of course, retrofitting such a feature into C/C++ in a satisfactory manner might not be possible. But the practical benefits are real.


There's more to life than C++. It was implemented in OpenJDK and seemed to get some good results there in micro-benchmarks at least. The nice thing is, it was a transparent upgrade. Synchronized blocks just became TSX transactions, unless they aborted too much, in which case they went back to being ordinary lock based critical sections.

The bigger problem was actually that in many cases where the optimization could be applied there was always a conflict, usually because of updating some sort of statistical counters. So a lot of attempts to optimize this way would de-opt. It could be fixed by changing the way stats were aggregated to be more tx-friendly but few developers ever did it.


"Anyone able to shed any light on why they're doing this?"

Same question here. As usual, no reasons are given (I must admit I'm getting sick of these corporations failing to explain their actions). I'm old enough to remember when companies used to issue detailed revision sheets/documentation wherein they described what the changes were and the reasons for them, just as a matter of course.

(Even after being an unwilling member of the users' mushroom club for some 20/30 years, I'm still having difficulty adjusting.)


TSX has never worked, and it seems a little over-ambitious in general. SGX has a few fatal flaws in its security model, and we have moved on to newer models there. The principle of tech debt applies to hardware as well as software, and I hope they come up with a new idea for transactional memory in future cores that actually works, and isn't so ambitious.


All of TSX is present on Intel server processors. Just gone from client.


I wrote my graduate school thesis on Intel SGX. I was not a good student in grad school and my thesis was not some innovative thing, but now it is completely useless.

Edit: There is another comment linking an article from Intel saying SGX is full steam ahead on Xeon. Yay, my thesis is not useless!


Yup. SGX, TSX, all the interesting and complicated stuff seems to be getting deprecated after half a decade or more of "We got it! No, wait, we didn't... uh, this time we got it! Wait, crap, no... uh... but this time! Oh carp. Yeah, you know, screw it."

After several of those "Release, revert" cycles, it ends up as a self fulfilling prophecy anyway - it's like the sentiment towards Google's new products you see often: "This, too, shall rapidly pass when they get bored." After you've seen TSX disabled on a few generation of chips, the motivation to put the work in to make something work with TSX just kind of evaporates, because you've no confidence that it'll actually work, or stay working, on hardware you want to run on. And because of the requirement to have a fallback path, TSX is a good bit more work, and, often, requires more complexity than a simple lock based approach that's good enough and simple to understand/validate.

But my deeper concern is that it seems that nobody at Intel is capable of understanding all the interaction in the chip anymore - and SGX offers very strong evidence of this inability.

SGX made the strong claim that, when deployed, a fully malicious ring 0 operating system could neither observe anything about the state of the compute happening in the enclave, nor modify the operation of that. They did various interesting things with how pages were swapped out to prevent replay attacks, and really did try to build it such that you couldn't mess with it. But they did these things at a high level, and didn't fully understand the nature of the chip.

The L1TF (L1 Terminal Fault, also known as Foreshadow) attacks took advantage of edge-case L1 cache behavior to speculate out anything that was in the L1 cache, which included SGX enclave data. If I remember properly, because you could read out the stored register state as well as memory pages you faulted in, they demonstrated you could essentially single-step a production SGX enclave with full register state and full memory state at every single instruction. Whoops.

It's not hard to mitigate once you know the problem - just flush L1 entirely on exit. But Intel didn't know it was a problem, so they didn't do that.

On the flip side, "influencing operation," there was Plundervolt. This involved the OS using an undocumented (grumble growl) MSR to reduce the voltage of the chip for improving efficiency of operation. However, the OS (that untrusted ring 0 thing...) has control over this register. And there aren't sane limits on it, such that the OS can drop the voltage enough that things like "multiply" and "AES operations" start faulting and glitching (silently), without being low enough that the chip stops functioning. Enter an enclave in this state, wait for multiply or AES to fault in the useful ways they will, and you've just influenced operation such that you can pull keys out. Whoops.

Again, it's not hard to mitigate. Refuse to enter if the voltage isn't at stock settings (you can't just reset it on entry because it takes time for the VRMs to bring the voltage back up). But Intel didn't do this. The people who added this neat little efficiency hack and then kept it secret never crossed paths with the people in charge of the new flagship security features, or with the sort of adversarial thinkers who can ask, "Now, wait a minute, what if I push this beyond sane bounds?"

You can point at the other speculative stuff and claim it's not really a problem because architectural behavior is correct (I think that sort of reasoning is rubbish, when you can speculate your way past all security boundaries on the chip), but the SGX case, specifically, demonstrates that Intel didn't know about the problems or they would have taken the very simple mitigation steps. And that tells me that they can't reason about their chips as a whole.

... and that - the companies making the most critical components of the system not having a full understanding of how those components operate - is scary. The foundation of everything is in an unknown state, and nobody knows how broken it is until some researchers go in and figure it out.

More than once, after fixing the exact thing the researchers found, Intel has also had egg on their face of the "... so we found this very, very closely related, conceptually identical bug that they didn't fix with the last patches..." variety. It seems safe to say that there are university students and faculty who understand the security implications of Intel's design decision better than the people at Intel in charge of such things.

We're running, very rapidly, out of "complexity runway." Everything, from the very chips on up, is so complex that nobody can reason about it, and the only solution to the very problems caused by complexity is, "Well, let's add more complexity to fix those problems." It's not the sort of thing that can go on forever.

Anyway. </rant about the state of Intel>


I think that sort of reasoning is rubbish, when you can speculate your way past all security boundaries on the chip

The ring boundaries of protected mode were never meant as a strong security feature against malice. The documentation of the 286, the first CPU in which they were introduced, is very clear in saying that. It's unfortunate how many assumed otherwise and built an entire industry upon that misunderstanding.


One sees the same problems with ASLR - page tables were never intended or designed to carry security sensitive information, which the various forms of ASLR are. And so, we see the prefetch oracle, and various cache based trickery to de-ASLR things, because the page tables and page walkers were never designed to consider security, only correctness.

I don't know how to fix it, though.

I've been experimenting with Qubes lately, which disables hyperthreading if you have it, and uses hardware isolated VMs to at least make things a little bit harder - the assumption is that within a running OS VM/silo, anything can access anything, so keep them separated. And they've done a lot of good paranoid work along those lines. I'm just not sure the end goal of very strong isolation is even possible on the same machine.

Of course, there are chips that are immune to speculation based vulnerabilities. They're not fast, and they're not very modern, but the Atom D525 in my little netbook has an empty "bugs" field in /proc/cpuinfo, because it's an in-order, non-speculative x86 core. It's just rather glacial.


I agree with a lot of what you write, and I also think we are way beyond that "comprehensibility boundary" when it comes to modern tech stacks. There is just no single person who understands exactly what happens on all levels of the stack when I send this reply.

But also, this process of "we got it, wait we didn't..." is just how real world security works, there is no way around it. Security is not <clever research team coming up with moon math> and problem is solved. Security is complex, and takes years of attack incentives and hardening to mature. TLS implementations can use the best crypto algorithms we know of, and we still get Heartbleed. Intel already with the very first release of SGX introduced the TCB recovery mechanism, precisely because they knew users are bound to find vulnerabilities.

There is also a strong hysteresis effect because of the long release cycle of chips. For example, SGX was released in 2015/2016 with Skylake, and then two years later we discovered Meltdown/Spectre and with them a whole new dimension of attacks on the CPU. However, Intel couldn't just release a hotfix for their hardware, it took a lot of time and work to re-design the CPU to be more side-channel resistant, and in the meantime security researchers naturally latched onto these attacks, giving the false impression that the whole idea of secure compute is flawed.

Personally I would not bet on CC tech becoming obsolete, on the contrary, a lot of Big Tech are pumping more and more resources into it, and there is increasing demand from various industries. The tech will stay around, it will mature, and perhaps vendors will even start to introduce HSM-like hardware protection mechanisms if there is enough demand.


I'm more sad about the deprecation of Hardware Lock Elision on that page...

What was wrong with it? It seemed to offer such massive performance benefits for complex multithreaded code.


It's a pity that MPX is gone as well; this makes x86/x64 lose to ARM on hardware memory tagging.

While it was buggy, Intel doesn't seem keen on bringing in a replacement.


The approach is fundamentally flawed.


MPX, sure, but that doesn't prevent bringing in a new one; otherwise, in a couple of years Intel/AMD will be the only ones left standing without hardware memory-tagging support.


I mean, having played with MPX for use in a JIT, I found it was slower than just manually bounds checking.
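
For comparison, the manual check a JIT can inline is tiny; a hedged sketch (the names here are hypothetical):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical slow path the JIT would jump to on an out-of-bounds access. */
    static void trap_out_of_bounds(void) { abort(); }

    /* One compare plus one (normally not-taken) branch per access. */
    static inline uint8_t load_checked(const uint8_t *buf, size_t len, size_t idx)
    {
        if (idx >= len)
            trap_out_of_bounds();
        return buf[idx];
    }

MPX instead had to maintain and consult bounds tables (BNDLDX/BNDSTX) alongside its four bounds registers, which is where much of the overhead came from.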


With fewer security guarantees, too.


What is the significance of this, given that SGX support has recently been merged into the mainline Linux tree?

I was waiting for the next Ubuntu LTS to have QEMU packaged with SGX support in order to reverse engineer my fingerprint sensor (which is match-on-host, and uses SGX under Windows).


Can't wait to move on to RISC-V, where vendors can do their experiments in custom extensions without disturbing anyone.

Only that which is solid and which the board can agree on becomes a ratified specification. The instruction counts are still sane[0], even though, as of the group of extensions ratified in late 2021, it is no longer missing anything of significance that ARM or x86-64 have.

[0]: https://en.wikipedia.org/wiki/RISC-V#Design


I don't see why you think RISC-V would make any difference. It's still totally possible to get official or de facto RISC-V extensions that turn out to be a bad design and get abandoned.

Just because it is too young for that to have happened yet doesn't mean it can't.


Wouldn't those who cannot afford to make their own chips still be beholden to the changes and business decisions of their suppliers, regardless of the underlying architecture?

What would be the difference between deprecating SGX on Intel and a similar technology on a RISC-V based chip?


SGX is used by Signal's server[0] to allow open-source private contact discovery where you trust Intel and their hardware, but no one else.

[0] https://signal.org/blog/private-contact-discovery/


So there is that 2017 blog post that said they were thinking of doing something with SGX... Have they mentioned anything else about it after that?


AKA: "share your contacts with the NSA only"


Even if you don't believe in SGX, the NSA would still have to be in Signal's servers somehow, in which case you are probably screwed anyway.


Edward Snowden is a public supporter and user of Signal. He's my canary in the coal mine. The moment the US three letter agencies get access to the data on Signal's servers, Snowden's dead. I feel comfortable in the resilience of their architecture as long as he's alive.


Do you think his location is some great mystery? The actual reason he is not dead is that there is no political will to conduct an operation on Russian soil to murder him.


Isn't Snowden alive because the US decided not to kill him? Doesn't sound like killing him is that hard


Nobody is coming to kill him. The scenario as a whole makes no sense when you think about it with even the most gentle level of scrutiny.

His only danger at this point is jail.


Yeah, even if you believe the USA is full-on evil, extrajudicially murdering someone who was granted asylum in Russia makes very little sense from a political perspective, given America's position in the world.


This is an entire movie script you've made up in your mind. That isn't how any of it works.


Why, anyway?

Don't they say the chat is end-to-end encrypted?


The crypto coin they include a wallet for, MobileCoin, is also based around SGX afaik.



