This is super cool. This exploit will be one of the canonical examples that just running something in a VM does not mean it's safe. We've always known about VM breakout, but this is a no-breakout massive exploit that is simple to execute and gives big payoffs.
Remember: just because this one bug gets fixed in microcode doesn't mean there's not another one of these waiting to be discovered. Many (most?) 0-days are known about by black-hats-for-hire well before they're made public.
The problem is, VMs aren't really "Virtual Machines" anymore. You're not parsing opcodes in a big switch statement, you're running instructions on the actual CPU, with a few hardware flags that the CPU says will guarantee no data or instruction overlap. It promises! But that's a hard promise to make in reality.
Looking at IBM's tech from the sixties is somehow weirdly depressing: it's unbelievable how much of this architectural stuff they had already invented by 1970.
Not depressing, but inspiring. So many great architectural ideas can be made accessible to millions of consumers, not limited to a few thousand megacorps.
In the early days of virtualization on PCs (things like OS/2's dos box) the VM was 100% a weird special case VM that wasn't even running the same mode (virtual 8086 vs 286 / 386 mode), and that second-class functionality continued through the earlier iterations of "modern" systems (vmware / kvm / xen).
"PC" virtualization's getting closer to big iron virtualization, but likely will never quite get there.
Also -- I was running virtual machines on a 5150 PC when it was a big fast machine -- the UCSD P System ran a p-code virtual machine to run p-code binaries which would run equally well on an apple 2. In theory.
IMO, it’s only a special case for commercial support reasons. Almost every engineer, QE, consultant, solution architect I know runs or has run nested virtualization for one reason or another.
Just how many times is the average operating system workload (with or without a virtual machine also running a second average operating system workload) context switching a second?
Like... unless I'm wrong... the kernel is the main process, and then it slices up processes/threads, and each time those run, they have their own EAX/EBX/ECX/ESP/EBP/EIP/etc. (I know it's RAX, etc. for 64-bit now)
How many cycles is a thread/process given before it context switches to the next one? How is it managing all of the pushfd/popfd, etc. between them? Is this not how modern operating systems work, am I misunderstanding?
> How many cycles is a thread/process given before it context switches to the next one?
Depends on a lot of things. If it's a compute heavy task, and there's no I/O interrupts, the task gets one "timeslice", timeslices vary, but typical times are somewhere in the neighborhood of 1 ms to 100 ms. If it's an I/O heavy task, chances are the task returns from a syscall with new data to read (or because a write finished), does a little bit of work, then does another syscall with I/O. Lots of context switches in network heavy code (io_uring seems promising).
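If you'd rather measure than guess, Linux exposes a system-wide context-switch counter in /proc/stat. A rough sketch that samples it twice (the "ctxt" line is a standard field; everything else here is just illustration):

```c
#include <stdio.h>
#include <unistd.h>

/* Read the total number of context switches since boot from /proc/stat. */
static unsigned long long read_ctxt(void)
{
    char line[256];
    unsigned long long v = 0;
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "ctxt %llu", &v) == 1)
            break;
    fclose(f);
    return v;
}

int main(void)
{
    unsigned long long before = read_ctxt();
    sleep(1);
    unsigned long long after = read_ctxt();
    /* System-wide rate; the number varies enormously with workload. */
    printf("context switches/sec: %llu\n", after - before);
    return 0;
}
```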
> How is it managing all of the pushfd/popfd, etc. between them?
The basic plan is: when the kernel takes an interrupt (or gets a syscall, which is an interrupt on some systems and a different mechanism on others), the kernel (or the CPU) loads the kernel stack pointer for the current thread and pushes all the (relevant) CPU registers onto that stack. Then the kernel business is taken care of, the scheduler decides which userspace thread to return to (which might or might not be the one that was interrupted), the destination thread's kernel stack is switched to, registers are popped, the thread's userspace stack is switched to, and userspace execution resumes.
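As a userspace toy illustration of that save/switch/restore dance (not what a kernel actually runs; this just uses POSIX ucontext to make the "save my registers and stack, restore yours" step visible):

```c
#include <stdio.h>
#include <ucontext.h>

/* Two cooperative contexts ping-ponging via swapcontext(): each call saves the
   caller's registers/stack pointer and restores the other side's. A kernel
   does the same dance on every interrupt, just in ring 0 with kernel stacks. */
static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];

static void task(void)
{
    for (int i = 0; i < 3; i++) {
        printf("task: timeslice %d\n", i);
        swapcontext(&task_ctx, &main_ctx);   /* "yield" back to the scheduler */
    }
}

int main(void)
{
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;    /* the task's own stack */
    task_ctx.uc_stack.ss_size = sizeof(task_stack);
    task_ctx.uc_link = &main_ctx;            /* where to go if task() returns */
    makecontext(&task_ctx, task, 0);

    for (int i = 0; i < 3; i++) {
        printf("main: switching to task\n");
        swapcontext(&main_ctx, &task_ctx);   /* save main's state, restore task's */
    }
    return 0;
}
```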
Why do comments like this just make a bold claim and then wander off as if the claim stands for itself? No explanation. No insight. I mean why should we just take your word for it?
I'd like to be educated here why a big switch statement wouldn't necessarily protect us from these CPU vulnerabilities? Anyone willing to help?
The question should rather be: why would it protect you? This switch statement also runs on a CPU, which is still vulnerable. This CPU still speculates the execution of the switch statement. No amount of software will make hardware irrelevant.
Hence my choice of phrasing: 'wouldn't necessarily protect you'.
So, yes, the switch statement might be safe, but you would need to prove that your switch statement doesn't use those instructions. You don't get to claim that for free just because you are using a switch statement.
Conversely, even if you execute bare metal instructions for the user of the VM, you could also deny those instructions to the user. Eg by not allowing self-modifying code, and statically making sure that the relevant code doesn't contain those instructions.
So the switch statement by itself does not do anything for your security.
Tangent: To deny those bare-metal instructions with static analysis, you might also have to flat out deny certain sequences of instructions that, when jumped to "unaligned" would also form the forbidden instruction. That might break innocent programs, no?
Simple: don't allow unaligned jumps. Google's NaCl already figured out how to do that ages ago. (Eg you could only allow jumps after a bit-masking operation. Details depends on architecture.)
But yes, unless you solve the halting problem, anything that bans all bad programs will also have false positives. It's the same with type systems in programming languages.
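A minimal sketch of the masking idea, assuming NaCl-style fixed-size instruction bundles (the real scheme differs per architecture and pairs the mask with a static verifier; the names here are made up):

```c
#include <stdint.h>

typedef void (*fn_t)(void);

/* Force any computed branch target onto a 32-byte bundle boundary, so control
   can never land in the middle of a bundle - and therefore never in the middle
   of an instruction sequence the verifier didn't approve. */
static inline fn_t sandbox_target(uintptr_t target)
{
    target &= ~(uintptr_t)31;   /* clear the low 5 bits */
    return (fn_t)target;
}

/* A static verifier additionally checks that this masking operation
   immediately precedes every indirect jump/call, so it can't be skipped. */
```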
Even if we pretend docker is a VM, building an image can happen on as many cores as you like in this hypothetical, it's the running of it that should be restricted.
The comparison to Meltdown/Spectre is a bit misleading though - those were a whole new form of attack based on timing, where the CPU did exactly what it should have done. This Zenbleed case is a good old fashioned bug: data in a register that shouldn't be there.
Running untrusted code whether in a sandbox, container, or VM, has not been safe since at least Rowhammer, maybe before. I believe a lot of these exploits are down to software and hardware people not talking. Software people make assumptions about the isolation guarantees, hardware people don't speak up when said assumptions are made.
Hardware people are the ones making those promises, so I don't think that's right at all. And Rowhammer is a way overstated vulnerability - there are all sorts of practical issues with it, especially if you're on modern, patched hardware.
In the end, I'm thinking most of these are related to branch prediction?
It strikes me that either branch prediction is so inherently complex that it's always going to be vulnerable to this, and/or it so defies the way most of us intuitively think about code paths / instruction execution that it's hard to conceive of the edge cases until it's too late?
At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?
More generally, most of them are related to speculative execution, where branch mis-prediction is a common gadget to induce speculative mis-execution.
Speculation is hard; it's sort of akin to introducing multithreading into a program - you are explicitly choosing to tilt at the windmill of pure technical correctness, because in a highly concurrent application every error will occur fairly routinely. Speculation is great too: in combination with out-of-order execution it's a multithreading-like boon to overall performance, because now you can resolve several chunks of code in parallel instead of one at a time. It's just also a minefield of correctness issues, but the alternative would be losing something like the equivalent of 10 years of performance gains (going back to roughly ARM A53 performance).
The recent thing is that "observably correct" needs to include timings. If you can just guess at what the data might be, and the program runs faster if you're correct, that's basically the same thing as reading the data by another means. It's a timing oracle attack.
(in this case AMD just fucked up though, there's no timing attack, this is just implemented wrong and this instruction can speculate against changes that haven't propagated to other parts of the pipeline yet)
The cache is the other problem, modern processors are built with every tenant sharing this single big L3 cache and it turns out that it also needs to be proof against timing attacks for data present in the cache too.
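For the curious, the generic cache timing oracle being described boils down to a probe like this (this is not Zenbleed, which needs no timing at all; it's the Flush+Reload building block, and the "fast vs slow" threshold has to be calibrated per machine):

```c
#include <stdint.h>
#include <x86intrin.h>

/* Evict the cache line holding p. */
static void flush_line(const void *p)
{
    _mm_clflush(p);
}

/* Time a single reload of *p. A small result means the line came back into the
   cache between the flush and now, i.e. somebody touched it. */
static uint64_t reload_cycles(const volatile char *p)
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}
```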
> At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?
Never for branch prediction. It just gets you too much performance. If it becomes too much of a problem, the solution is greater isolation of workloads.
In certain cases isolation and simplicity overlap, I suspect for example that the dangers of SMT implementation complexity are part of why Apple didn't implement it for their respective CPUs. Likely we'll see this elsewhere too, for example Amazon may not ever push to have SMT in their Graviton chips (the early generations are off the shelf cores from ARM where they didn't have a readily available choice).
I could be mistaken, but I don't think Zenbleed has anything to do with SMT, based on my reading of the document. There is a mention of hyperthreads sharing the same physical registers, but you can spy on anything happening on the same physical core, because the register file is shared across the whole core.
It even says so in the document:
Note that it is not sufficient to disable SMT.
Apple's chips don't have this vulnerability, but it's not because they don't have SMT. They just didn't write this particular defect into their CPU implementation.
Correct, I was responding to parent writing "At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?"
I think we may be seeing an industry-wide shift away from SMT because the performance penalty is small and the complexity cost is high, if so that fits parent's speculation about the trend. In a narrow sense Zenbleed isn't related to SMT but OP's question seems perfectly relevant to me. I come from a security background and on average more complicated == less secure because engineering resources are finite and it's just harder and more work to make complicated things correct.
Not really if that's an attack you're concerned about, because guests can attack the hypervisor via the same mechanisms. You would need to gang schedule to ensure all threads of a core were only either in host or guest.
>At what point does the complexity of CPU architectures become so difficult to reason about that we just accept the performance penalty of keeping it simpler?
Basically never for anything that's at all CPU-bound, that growth in complexity is really the only thing that's been powering single-threaded CPU performance improvements since Dennard scaling stopped in about 2006 (and by that time they were already plenty complex: by the late 90s and early 2000's x86 CPUs were firmly superscalar, out-of-order, branch-predicting and speculative executing devices). If your workload can be made fast without needing that stuff (i.e. no branches and easily parallelised), you're probably using a GPU instead nowadays.
You can rent one of the Atom Kimsufi boxes (N2800) to experience first hand a cpu with no speculative execution. The performance is dire, but at least it hasn’t gotten worse over the years - they are immune to just about everything
We demanded more performance and we got what we demanded. I doubt manufacturers are going to walk back on branch prediction no matter how flawed it is. They'll add some more mitigations and features which will be broken-on-arrival.
I didn't demand more performance. My 2008-era AthlonX2 would still be relevant if web browsers hadn't gotten so bloated. I still use it for real desktop applications, i.e. everything that isn't in Electron.
There's VLIW/'predication'/some other technical name I forget for architectures which instead ask you to explicitly schedule instruction/data/branch prediction. If I remember, the two biggest examples I can think of were IA64 and Alpha. I wanna think HP-PA did the same but I'm not clear on that one.
For various reasons, all these architectures eventually lost out due to market pressure (and cost/watt/IPC, I guess).
Yup! I worked at a few companies that would co-mingle Internet-facing/DMZ VMs with internal VMs. When pointing this out and recommending we isolate those VMs on their own dedicated hypervisor, it always fell on deaf ears. Joke's on them, I guess.
You can pay AWS a premium to make sure you're the only tenant on the physical machine. You can also split your own stuff into multiple tenants, and keep those separate too.
Eric Brandwine (VP/DE @ AWS) said publicly in 2019 that EC2 had never scheduled different tenants on the same physical core at the same time, even before we learned about these kinds of side-channel attacks.
Even before then, the sufficiently paranoid (but still bound to AWS for whatever reason) would track usage/steal/IO reporting along with best guesses for Amazon hardware expenditure and use that information to size instances to attempt to coincide with 1:1 node membership.
Yes (lowest vCPU seems to be 2 everywhere), and that protects against this attack. However, this thread was talking about airgapping hosts, which is needed for the general threat of VM escapes.
Yes, but the Firecracker VMs are pinned to specific cores, so no two tenants ever share a CPU core. Other than Rowhammer, has there been a hardware vulnerability of this nature that has worked x-core? I don't recall.
Still, I think that if your company is handling user data it's worth seriously considering dedicated instances for any service that encounters plaintext user information.
That sounds like it's leaking across user/process boundaries on a single EC2 instance, which presumably also requires the processes to be running on the same core.
Leaks between different EC2 instances would be far more serious, but I suppose that wouldn't happen unless two tenants / EC2 instances shared SMT cores, or the contents of the microarchitectural register file was persisted across VM context switches in an exploitable manner.
The problem is that the logical registers don't have a 1:1 relation to the physical registers.
For example, let's imagine a toy architecture with two registers: r0 and r1. We can create a little assembly snippet using them: "r0 = load(addr1); r1 = load(addr2); r0 = r0 + r1; store(addr3, r0)". Pretty simple.
Now, what happens if we want to do that twice? Well, we get something like "r0 = load(addr1); r1 = load(addr2); r0 = r0 + r1; store(addr3, r0); r0 = load(addr4); r1 = load(addr5); r0 = r0 + r1; store(addr6, r0)". Because there is no overlap between the accessed memory sections, they are completely independent. In theory they could even execute at the same time - but that is impossible because they use the same registers.
This can be solved by adding more physical registers to the CPU, let's call them R0-R6. During execution the CPU can now analyze and rewrite the original assembly into "R1 = load(addr1); R4 = load(addr4); R2 = load(addr2); R5 = load(addr5); R3 = R1 + R2; R6 = R4 + R5; store(addr3, R3); store(addr6, R6)". This means we can now start the loads for the second addition before the first addition is done, which means we have to wait less time for the data to arrive when we finally want to actually do the second addition. To the user nothing has changed and the results are identical!
The issue here is that when entering/exiting a VM you can definitely clear the logical registers r0 and r1, but there is no guarantee that you are actually clearing the physical registers. On a hardware level, "clearing a register" now means "mark logical register as empty". The CPU makes sure that any future use of that logical register results in it behaving as if it has been cleared, but there is no need to touch the content of the physical register. It just gets marked as "free for use". The only way that physical register becomes available again is after a write, after all, and that write would by definition overwrite the stale content - so clearing it would be pointless. Unless your CPU misbehaves and you run into this new bug, of course.
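A toy model of that bookkeeping, borrowing the z-bit idea from the write-up; the structure and names here are invented purely for illustration. The point is that the architectural "clear" only flips a flag or remaps an entry, it never scrubs the physical storage:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { NUM_LOGICAL = 2, NUM_PHYSICAL = 8 };

struct phys_reg { uint64_t value; bool zeroed; };   /* zeroed = "z-bit" set     */
static struct phys_reg file[NUM_PHYSICAL];          /* shared physical file     */
static int rat[NUM_LOGICAL];                        /* logical -> physical map  */

static void write_reg(int logical, uint64_t v)
{
    static int next;                 /* a real CPU tracks a free list instead */
    int p = next++ % NUM_PHYSICAL;
    file[p].value = v;
    file[p].zeroed = false;
    rat[logical] = p;
}

static void clear_reg(int logical)
{
    /* "Clearing": just set the flag, leave the stale bits sitting in the file. */
    file[rat[logical]].zeroed = true;
}

static uint64_t read_reg(int logical)
{
    const struct phys_reg *p = &file[rat[logical]];
    return p->zeroed ? 0 : p->value;  /* lose the flag and stale data reappears */
}

int main(void)
{
    write_reg(0, UINT64_C(0xdeadbeefcafebabe));      /* a "secret" */
    clear_reg(0);
    printf("architectural view:    0x%" PRIx64 "\n", read_reg(0));
    printf("physical file content: 0x%" PRIx64 "\n", file[rat[0]].value);
    return 0;
}
```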
The problem is the freed entries in the register file. A VM can, at least, use this bug to read registers from a non-VM thread running on the adjacent SMT/HT of a single physical core. I suspect a VM could also read registers from other processes scheduled on the same SMT/HT.
Not only do people do this, it's generally how VPS providers work. Most machines barely use the CPU most of the time (web servers etc.) so reserving a full CPU core for a VPS is horribly inefficient. It doesn't matter anyway, because SMT isn't relevant for this particular bug.
With SMT allowing twice the cores on a CPU for most workloads, disabling it would double the cost for most providers!
There are VPS providers that will let you rent dedicated CPU cores, but they often cost 4-5x more than a normal virtual CPU. Overprovisioning is how virtual servers are available for cheap!
SMT is relevant in the VM case of this bug because it determines whether this bug is restricted to data outside the VM or not.
Providers usually won't disable SMT completely, they'd run a scheduler which only allows 1 VM to use both SMT threads of a core. Ultra cheap VPS providers may still find that not worth the pennies though as if you sell a majority of single core VPS then the majority of your SMT threads are still unavailable even with the scheduler approach.
Fully dedicated cores aren't necessarily required because in the timesliced case the registers are unloaded and reloaded when different VMs are shuffled on and off the core. That said, they definitely prevent the cross-vm-data-leak case of this bug.
> Fully dedicated cores aren't necessarily required because in the timesliced case the registers are unloaded and reloaded when different VMs are shuffled on and off the core. That said, they definitely prevent the cross-vm-data-leak case of this bug.
Registers are unloaded and reloaded when different processes / threads are scheduled within a running VM too. That should protect the register contents, but because of this issue, it doesn't, so I don't see why it would if it's a hypervisor switching VMs instead of an OS switching processes. If you're running a vulnerable processor on a vulnerable microcode, it seems like you can potentially read things put into the vulnerable registers by anything else running on the same physical core, regardless of context.
Context switching for processes is done in software (i.e. the OS) via traps, because the TSS does not store all the registers and it doesn't offer a way to be selective about what the process actually needs to load (=slower). This limits its visibility to what's in the actively mapped registers, and it doesn't guarantee the procedure even tries to reload all the registers. In this case, even if the OS does restore certain registers, it has no way to know the processor left specific bits of one speculatively set in the register file.
On the other hand, "context switching" for VMs is done via hardware commands like VMSAVE/VMLOAD or VMWRITE/VMREAD which do save/load the entire guest register context, including the hidden context not accessible by software which this CVE is relying on. Not that it's impossible for this to be broken as well, but it's a completely different procedure and one the hardware is actually responsible for completely clearing, instead of "supposed to be reset by software".
So while the CVE still affects processes inside of VMs the loading/unloading behavior inter VM should actually behave as a working sandbox and protect against cross-VM leaks, barring the note by lieg on SMT still possibly being a problem (I don't know enough about how the hardware maintains the register table between SMT threads of different VMs to say for sure but I'm willing to guess it's still vulnerable on register remappings).
There may well be other reasons I'm completely mistaken here but they'd have to explain why the inter-VM context restore is broken not why it works for inter-process restore. The article already explains why the latter happens, but it doesn't make a claim about the former.
I can't easily find good documentation on the instructions you mentioned; but are you sure those save and load the whole register file, and not just the visible registers? There are some registers that are not typically explicitly visible, that I'd expect to also be saved or at least manipulable in a hypervisor, but just like the cache state isn't saved, I wouldn't expect the register file to be saved.
If we assume the register file isn't saved, just the visible registers, what's happening is the visible registers are restored, but the speculative dance causes one of the other values in the register file to become visible. If that's one of the restored registers, no big deal, but if it was someone else's value, there's the exploit.
If you look at the exploit example, the trick is that when the register rename happens, you are re-using a register file entry, but the upper bits aren't cleared, they're just using a flag to indicate the bits are cleared; then when rolling back the mispredicted vzeroupper unsets the flag, the upper bits of the register file entry are revealed.
Reading more, the VM* instruction sets definitely load/save more than just the normally visible registers; the descriptions in the AMD ASM manual are very explicit about that. However, it looks like (outside the encrypted guest case where everything is done in one command) the hypervisor still calls the typical XRSTOR for the float registers, which is no different than the normal OS case. If that's true then I can see how the register file is still contaminated in the non-SMT case.
Well you don't have to reserve any CPU Cores per VM. There's no law saying you can't have more VMs than logical cores. They're just processes after all and we can have thousands of them.
Of course not, but the vulnerability works by exploiting the shared register file so to mitigate this entire class of vulnerabilities, you'd need to dedicate a CPU core and as much of its associated cache as possible to a single VM.
In the context of this conversation, SMT on/off is relevant to what scope of the vulnerability has with VMs beyond the claim in the article that the issue is in some way present inside VMs.
The README in the tar file with the exploit (linked at "If you want to test the exploit, the code is available here") contains some more details, including a timeline:
- `2023-05-09` A component of our CPU validation pipeline generates an anomalous result.
- `2023-05-12` We successfully isolate and reproduce the issue. Investigation continues.
- `2023-05-14` We are now aware of the scope and severity of the issue.
- `2023-05-15` We draft a brief status report and share our findings with AMD PSIRT.
- `2023-05-17` AMD acknowledge our report and confirm they can reproduce the issue.
- `2023-05-17` We complete development of a reliable PoC and share it with AMD.
- `2023-05-19` We begin to notify major kernel and hypervisor vendors.
- `2023-05-23` We receive a beta microcode update for Rome from AMD.
- `2023-05-24` We confirm the update fixes the issue and notify AMD.
- `2023-05-30` AMD inform us they have sent a SN (security notice) to partners.
- `2023-06-12` Meeting with AMD to discuss status and details.
- `2023-07-20` AMD unexpectedly publish patches, earlier than an agreed embargo date.
- `2023-07-21` As the fix is now public, we propose privately notifying major distributions that they should begin preparing updated firmware packages.
You'd want the delay between first publication of X and the microcode update making its way into releases of OSes to be smallest, for various values of X (mention of a vulnerability, microcode patch, description of vulnerability, PoC). Making various OS releasers aware that a microcode patch that fixes a vulnerability will be published on a given date before that date decreases that for most values of X.
Won't that theoretically allow malicious actors to study the patch and exploit the now 1-day vulnerability?
Not that I think it's realistic to develop an exploit and gain real value in three days, but theoretically, if all parties had taken more than three days to distribute and apply the patches?
Which is 0bc3126c9cfa0b8c761483215c25382f831a7c6f and b250b32ab1d044953af2dc5e790819a7703b7ee6
And b250b32ab1d044953af2dc5e790819a7703b7ee6 appears to be the commit I linked earlier at git.kernel.org, so hopefully up-to-date Arch is not vulnerable to Zenbleed.
Either way, as noted elsewhere in the comments, only the Rome CPU series has received updated microcode with fixes. All other Zen 2 users need the fix that was released as part of Linux 6.4.6: https://lwn.net/Articles/939102/
This is incredibly scary. On my Zen 2 box (Ryzen 3600) logging the output of the exploit running as an unprivileged user while copying and pasting a string into a text editor in the background (I used Kate), resulted in pieces of the string being logged into the output of zenbleed. And this is after a few seconds of runtime mind you, not even a full minute.
Thankfully the exploit is highly dependent on a specific asm routine so exploiting it from JS or WASM in a browser should be extremely difficult. Otherwise a nefarious tab left open for hours in the background could exfiltrate without an issue.
I'm eagerly waiting for Fedora maintainers to push the new microcode so the kernel can update it during the boot process.
> Thankfully the exploit is highly dependent on a specific asm routine so exploiting it from JS or WASM in a browser should be extremely difficult. Otherwise a nefarious tab left open for hours in the background could exfiltrate without an issue.
What about it is very bold? The instruction sequence mentioned seems pretty reasonable and not at all out of the question for a JavaScript JIT to generate.
> Thankfully the exploit is highly dependent on a specific asm routine so exploiting it from JS or WASM in a browser should be extremely difficult.
I assume that once/if a method is found it will be applicable broadly though. At the same time, hopefully software patches in V8 and SpiderMonkey will be able to mitigate this further and sooner.
But a JS exploit would require some way to exfiltrate data and presumably doing that would be quite difficult to hide entirely.
I had to run make on the uncompressed folder. Perhaps the build-essential package doesn't come with NASM in Ubuntu? I'll need a bit more info on the error if you want me to try and help you :)
The parent commenter seems to have figured this out, but to clarify a bit for posterity: build-essential does not come with nasm on Ubuntu (or upstream Debian, AFAICT). It has to be installed separately for the Zenbleed PoC to compile (if not already installed).
After extracting the POC and installing build-essential, I still get this:
nasm -O0 -felf64 -o zenleak.o zenleak.asm
make: nasm: No such file or directory
make: *** [Makefile:11: zenleak.o] Error 127
Gentoo already has it, however the latest ebuild is still masked, so one would need to put "sys-kernel/linux-firmware ~amd64" inside a file in /etc/portage/package.accept_keywords, or better yet, always run the git version, using * instead of ~amd64.
Apart from that, it's necessary to "sudo emaint sync -A && sudo emerge -av sys-kernel/linux-firmware", while checking that the correct files are included in the savedconfig file if using it. After that, rebuild the kernel or the initramfs and reboot.
I'm not sure five year olds know what microcode is. I'm 35, been in tech nearly 20 years and don't recall having heard that specific term before today.
The whole "explain like I'm 5" thing is ridiculous. A huge percentage of topics simply cannot be broken down to an average 5 year old in a way that makes the conversation worth having at all. The 5 year old has no context about why in recent years there has been a huge push towards running your own code on other people's computers using various isolation techniques, or why people are trying to exploit that. The 5 year old has no context for what the exploits actually are, or how to mitigate them. Even if you break all of those things down into 5 year old bitesized chunks, you end up with boring word soup completely disconnected from the meaningful parts of the conversation.
Really what ELI5 is, is a technique to allow the asker to not have to look anything up. From the parent comment, you can look up "patch", "AMD", "microcode"; or you can demand "ELI5!" and have someone else type up long, careful definitions that don't reference context or words that a 5 year old doesn't know.
Regarding what microcode is, here is a good explanation of the differences between microcode and firmware:
I agree that many topics are hard to explain to a five year old, but ELI5 can be very helpful in forcing people to simplify their writing. Many people explain things in an unnecessarily complex way, and ELI5 at least makes them think about the target audience.
A modern generalist CPU is made of many smaller, simpler, specialized CPUs: there's a whole orchestra inside.
Amongst those smaller CPUs, there's a master: it sees to the decoding of instructions, sending jobs to the various CPU units, and fetching the results of said jobs. That master is running a program, executing... microcode! And of course, if there is a program, there are bugs. CPUs have had bugs since CPUs were invented.
Microcode itself was present in early CPUs (say, the Z80), but hardcoded. Nowadays, microcode can be uploaded to a CPU to fix bugs.
A Grandchild's Guide to Using Grandpa's Computer, a.k.a. "If Dr. Seuss were a Technical Writer", was written in 1994 and mentions microcode.
Microcode updates are always discussed when talking about microarchitectural security vulnerabilities (and other scary CPU errata like https://lkml.org/lkml/2023/3/8/976).
Microcode is always mentioned when discussing CPU design evolution.
It's funny that it's "always" mentioned, yet it's not familiar to me. Also curious the Wikipedia article for CPU design doesn't mention it, since it's "always" referenced.
Just because something is familiar to you, or even large swaths of a given population, doesn't mean everyone should be expected to know it.
I love learning new things. I love discovering topics I know nothing about, and I love picking the brains of those passionate about them. But the condescension from a certain type of tech nerd sucks all the fun out of learning. I've certainly been guilty of this in the past.
> It's funny that it's "always" mentioned, yet it's not familiar to me. Also curious the Wikipedia article for CPU design doesn't mention it, since it's "always" referenced.
you're not going to convince others that microcode is some kind of foreign concept to CPUs just because you yourself were unfamiliar.
Yes, it can be a downer to discover that you're more naive about a subject than you had previously thought.
>Also curious the Wikipedia article for CPU design doesn't mention it, since it's "always" referenced.
microcode is something that is implemented by CPUs that are too big and expensive to replace -- it's not something that is fundamental to processor designs. It's something we now live with to prevent things like the 'pentium bug' from costing Intel many-many dollars after a consumer-products forced recall/replacement.
At this point in history I think that if someone wants to consider themselves to be well-versed or knowledgeable about consumer CPUs then learning about microcode is a hard requirement. It's a false metaphor now to consider a CPU to be an unchanging entity, and that's important to at least be aware of -- it's literally one of the only ways the behavior of a CPU you already own can change after it ships.
When did I say it's a foreign concept? I said it's not common knowledge for five year olds, and in reply, someone stated it's "always" mentioned. I was simply demonstrating that it's not "always" mentioned.
> At this point in history I think that if someone wants to consider themselves to be well-versed or knowledgeable about consumer CPUs then learning about microcode is a hard requirement.
This statement strikes me as hyperbolic. A CPU/hardware engineer, or even security-conscious software engineer, sure. But I can't understand why there is a reason for a consumer to care.
> AMD have released a microcode update for affected processors.
I don't think that is correct. AMD has released a microcode update[0] for family 17h models 0x31 and 0xa0, which corresponds to Rome, Castle Peak and Mendocino as per WikiChip [1].
So far, there seems to be no microcode update for Renoir, Grey Hawk, Lucienne, Matisse and Van Gogh. Fortunately, the newly released kernels can and do simply set the chicken bit for those. [2]
This technique is CVE-2023-20593 and it works on all Zen 2 class processors, which includes at least the following products:
AMD Ryzen 3000 Series Processors
AMD Ryzen PRO 3000 Series Processors
AMD Ryzen Threadripper 3000 Series Processors
AMD Ryzen 4000 Series Processors with Radeon Graphics
AMD Ryzen PRO 4000 Series Processors
AMD Ryzen 5000 Series Processors with Radeon Graphics
AMD Ryzen 7020 Series Processors with Radeon Graphics
AMD EPYC “Rome” Processors
Do they mean "only confirmed on Zen2", or is the problem definitely confined to only this architecture?
Is it likely that this same technique (or similar) also works on earlier (Zen/Zen+) or later (Zen3) cores, but they just haven't been able to demonstrate it yet?
I mean, the PS5 is running a Zen 2 processor [0] so I would assume it's vulnerable. In general I would assume that AAA games are safe. Websites and smaller games made by malefactors will be the issue. (Note that AAA game makers have little interest in antagonizing the audience, OTOH they also will push limits to install anti-cheat mechanisms. On balance I'd trust them.)
Interesting, could well be a path to jailbreaking the PS5... although I'm not sure if that has or hasn't already happened. For Xbox Series, you can just use dev mode in the first place.
What valuable secrets do people have on their PS5/Xbox? You also need a way to deploy the malicious payload on those platforms which, due to their closed nature, is very difficult to do.
That's a good point, but I can't believe that every console doesn't have its own unique set of keys, so that if you compromise one before SW patches land, it won't be much use in the ecosystem.
It depends. I'm going to speak in general terms, since I obviously don't know how every single system works, but per-console keys are used for pairing system storage to the motherboard and maybe keeping save data from being copied from user to user. Most CDNs don't really provide the option for on-the-fly per user encryption, so instead you serve up games encrypted with title keys and then issue each console a title key that's encrypted with a per-console key. Disc games need to be encrypted with keys that every system already has, otherwise you can't actually use the disc to play the game.
As for the value of being able to do 'hero attacks' on game consoles, let me point out that once you have a cleartext dump of a game, you've already done most of the work. The Xbox 360 was actually very well secured, to the point where it was easier to hack a disc drive to inject fake authentication data into a normal DVD-R than to actually hack a 360's CPU to run copied games. That's why we didn't have widely-accessible homebrew on that platform for the longest time. Furthermore, you can make emulators that just don't care about authenticating media (because why would they) and run cleartext games on those.
At least with the PS3, I seem to recall that I couldn't extract any of my games' save data from the hard-drive of my PS3 unit that went dead due to RROD (or was it YLOD?) because the hard-drive was encrypted using the PS3's serial key as part of the encryption.
I don't know if that mechanism persists into the PS4/PS5.
Oh, I can imagine lots of uses for a bevy of PS5's, assuming you can gain remote control. What do you do with a botnet? What do you do with a botnet with a pretty good GPU? What do you do with an always-on microphone in people's living rooms?
You are likely frequently running untrusted workloads. As javascript in a browser. I don't know about this one, but at least meltdown was fully exploitable from js.
The idea being that the main process and content processes should never be on the same core?
I would worry about cross site leakage. From my understanding that would be unavoidable as soon as you have more tabs open than cores, which feels like an unworkable restriction.
Imagine opening a 9th tab and being told you need to upgrade your 3700X to a 3900X.
I think there's several levels. As a first step, I'd appreciate reducing the risk of javascript extracting contents from outside the browser. A second step could be to use more granular core scheduling within firefox, to prevent sharing cores that shouldn't be shared. A process/thread hierarchy can create multiple core scheduling groups.
It is a simple static HTML page; how is it possible in 2023 that a static site could be hugged to death? In most cases HN traffic barely hits 100 page views per second.
I'm not really sure, cpu and bandwidth utilization were fine. Memory usage was high, but not oom high. It continued serving over http just fine, perhaps there was some automated rate limiting by my provider.
I'll have to debug when things cool down.
I'm aware 128M is ludicrous in 2023... "a fun challenge", I thought to myself. I can be a dummy.
A single core machine already overloaded is going to get even worse introducing the cpu overhead of gzipping response bodies (assuming it’s cpu bound and not IO bound)
Cache control headers will help with return traffic
More cpu cores
If using nginx ensure sendfile is enabled and workers are set to auto or tuned for your setup
Check ulimit file handle limits
Offload static assets to cdn
Since it’s a static html site, you could even host on s3, netlify, etc
> A single core machine already overloaded is going to get even worse introducing the cpu overhead of gzipping response bodies (assuming it’s cpu bound and not IO bound)
Unless your CPU is burning due to additional system calls being made.
> Actually running zlib on every request will only make it worse.
I wouldn't be so sure, given that without zlib HTTP connections take longer, thereby increasing the size of the wait queue and the number of parallel connections.
In my personal experience, the first step in tuning Apache was "put a nginx server in front of it". Running out of workers (either processes in the prefork model, or threads otherwise) was in my experience way too easy, especially when keepalive is enabled (even a couple of seconds of keepalive can be painful). The async model used by nginx can handle a lot more connections before running out of resources.
It's a security writeup so it's probably run by a security expert who is not an expert at running high traffic websites. Most likely there is something on the page that causes a database hit. Possibly the page content itself.
According to AMD's security bulletin, firmware updates for non-EPYC CPUs won't be released until the end of the year. What should users do until then, disable the chicken bit and take the performance hit?
Presumably classified as severity 'medium' in an attempt to look marginally less negligent when announcing that they can't be bothered to issue microcode updates for most CPU models until Nov or Dec.
Under what circumstances is this not a medium? The only case this applies is if you have public runners running completely untrusted code, and if you're doing that I hope you're doing it on EPYC, which is fixed. And if you're doing that, you're probably mining crypto for randoms.
> The attack can even be carried out remotely through JavaScript on a website, meaning that the attacker need not have physical access to the computer or server.
Now it reads:
> Currently the attack can only be executed by an attacker with an ability to execute native code on the affected machine. While there might be a possibility to execute this attack via the browser on the remote machine it hasn’t been yet demonstrated.
This is both as cool as it is scary. I managed to "exfiltrate" pieces of my Bitwarden password (could easily be reconstructed), ssh login password, and bank credentials in a minute of running from a 10MB sample.
Really lovely writeup. I liked the discussion of determining how can you tell if a randomly-generated program performed correctly. The obvious approach is to just run it on an "oracle" -- another processor or simulator -- and see if it behaves the same way. But if you're checking for microarchitectural effects with tight timing windows you can also write the same program with various stalls, fences, nops and so on -- things which shouldn't affect the output (for single-threaded code) but which will result in the CPU doing significantly different things microarchitecturally. That way the CPU can be its own oracle.
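A trivialised sketch of that "CPU as its own oracle" idea; a real fuzzer generates the instruction sequences randomly, so the fixed mixer function and the fence placement here are only stand-ins:

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>

/* The same computation with and without serialising fences must agree
   architecturally, even though the pipeline behaves very differently. */
static uint64_t mix_plain(uint64_t x)
{
    x ^= x >> 33; x *= UINT64_C(0xff51afd7ed558ccd);
    x ^= x >> 29; x *= UINT64_C(0xc4ceb9fe1a85ec53);
    return x ^ (x >> 32);
}

static uint64_t mix_fenced(uint64_t x)
{
    x ^= x >> 33; _mm_lfence();
    x *= UINT64_C(0xff51afd7ed558ccd); _mm_lfence();
    x ^= x >> 29; _mm_lfence();
    x *= UINT64_C(0xc4ceb9fe1a85ec53); _mm_lfence();
    return x ^ (x >> 32);
}

int main(void)
{
    for (uint64_t i = 0; i < 100000; i++)
        assert(mix_plain(i) == mix_fenced(i));   /* mismatch => CPU (or our) bug */
    return 0;
}
```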
AMD have released a microcode update for affected processors. Your BIOS or Operating System vendor may already have an update available that includes it.
I don’t really understand how CPU microcode updates work. If I’m keeping Ubuntu up to date, will this just happen automatically?
Sort of weirds me out that my OS can just silently update my CPU - I didn’t realize I was giving it that level of control… I guess it’s good vs the alternative of no-one actually updating for exploits like this, though.
The implication was that you could boot a malicious OS, then boot into a different OS with the same processor and get pwned. As other commenters mentioned, this mechanism doesn't create that risk because the update has to be applied each boot.
They mention perf issues for the workaround but they're notably absent from the microcode commentary.
I wonder what this is going to do to the new AMD hardware AWS is trying to roll out, which is supposed to be a substantial performance bump over the previous generation.
The way Spectre and Meltdown played out, you'll have to excuse me if I stand outside the blast radius while we figure out if there's a chapter 2, 3 or 4 to this story.
They've proven Zen 2 has this problem. They haven't proven no other AMD processors have it. A bunch of people looking to make names for themselves are probably busily testing every other AMD processor for a similar exploit.
> The way Spectre and Meltdown played out, you'll have to excuse me if I stand outside the blast radius while we figure out if there's a chapter 2, 3 or 4 to this story.
I am OOTL on this one, do you have some information you could share?
There has been a long trickle of similar bugs to Spectre/Meltdown coming out long after the initial bugs and "fixes" were published. (The early fixes were all, in some sense, incomplete.)
> It was challenging to get the details right, but I used this to teach my fuzzer to find interesting instruction sequences. This allowed me to discover features like merge optimization automatically, without any input from me!
What happens when AI can start fuzzing software? Seems like a golden opportunity for opsec folks.
On my Zen 2 / Renoir based system the PoC exploit continues to work, albeit slowly, even after updating the microcode (linked from TFA) that has the fix for this issue. The wrmsr workaround stops it fully in its tracks.
Edit: just realized it must have been that the initramfs image is not updated with the manually updated firmware in /lib/firmware.
Edit2: Updated the initramfs and even if the benchmark.sh fails, ./zenbleed -v2 still picks out and prints strings which doesn't happen with the wrmsr solution.
Why does disabling SMT not fully prevent this? I don't know the details of Zen 2 architecture, but register files are usually implemented as SRAM on the CPU-die itself. So unless the core is running SMT, I don't understand how another thread could be accessing the register file to write a secret.
Because unless you pin the threads to certain CPU cores (e.g. in Linux by using the taskset command, or in Windows by using the Set Affinity command in Task Manager), they are migrated very frequently between cores.
So even with SMT disabled, each core will execute sequentially many threads, switching every few milliseconds from one thread to another, and each context switch does not modify the hidden registers, it just restores the architecturally visible registers.
Pinning doesn’t help either, since there will always be more threads than cores. Scheduling all those threads and even blocking on IO will cause context switches.
I do not know how that is done in Windows, but in Linux it is possible to reserve some cores to be used only for the threads that you assign to them and for no other threads.
This is done frequently for high-performance applications.
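On Linux, the pinning half of that looks roughly like the sketch below; the reservation half would come from something like the isolcpus= boot parameter or cpusets, and core 7 is an arbitrary choice for illustration:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(7, &set);                 /* only allow this process on core 7 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* ... latency-critical work runs here, alone on the reserved core ... */
    return 0;
}
```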
The pigeonhole principle does not stipulate which hole the extra pigeons have to appear in. Only that at least one hole must have more than one pigeon. It does not stipulate that all holes have to have pigeons; you can have 999 empty pigeon holes, and then a hole that has 1001 pigeons in it. The pigeonhole principle doesn't care.
In Linux it's possible to stipulate that, for instance, core 7 can only be used by super secret process PID 1234. If you have 400 other threads, that means the other threads will have to compete for cores 0-6. And if super secret PID 1234 is idle and there are 12 threads that are marked for scheduling, then they get to just wait for cores 0-6 to become available while core 7 stands idle.
I watched a talk several years ago about a HFT firm that ... abused? this principle. They had a big ass monster of a machine. Four sockets, four CPUs with gobs of cores and gobs of cache on each one. But the only thing they cared about was the latency on their HFT trade sniping process. If they could reduce the latency of receiving interesting information to executing a trade on that interesting information from (making up numbers) 1.1ms to 0.9ms, that was potentially thousands, millions of dollars in profit.
So if CPU socket 0 has cores 0-15, CPU socket 1 has cores 16-31, CPU socket 2 has cores 32-47, CPU socket 3 has cores 48-63, they marked cores 17-31,33-47,49-63 to be usable by nothing. Those cores are permanently and forever idle. They will never execute a single instruction. Ever. Core 16 can be used by PID 12345 and only by PID 12345, core 32 can be used by PID 7362 and only PID 7362, and core 48 can be used by PID 8765 and only PID 8765. This ensured that all data and all instructions used by their super high priority HFT process can never, ever be evicted from the cache.
Apparently it made a notable improvement in latency and therefore profit.
That does not apply when some cores are reserved for manual thread assignment, because the scheduler no longer throws pigeons in those holes, but schedules threads only on the other cores.
It depends a bit on the exact details of the implementation, but there are several possibilities imaginable.
For example, a failed speculation of vzeroupper could result in it erroneously claiming a register by clearing the zero flag on the wrong register - which would mean that the previous data of that register is now suddenly available. If that register has not been touched since a context switch, it could leak data from another process.
The linked article has an animation which suggests that it clears the zero flag on the previously-used register - which indeed requires the victim to reuse the register in the small amount of time between it being marked as zero and the zero being cleared again.
However, the linked Github repo states:
> The undefined portion of our ymm register will contain random data from the register file. [..] Note that this is not a timing attack or a side channel, the full values can simply be read as fast as you can access them.
This suggests that it does indeed do something akin to clearing the zero flag of a random register.
That's not quite right. The attacker does the vzeroupper rollback. Any registers in the physical file that haven't been overwritten can be exposed as a result, regardless of what the victim did.
I read the article and I really liked that the author tried to make it as simple as possible, so that you don't need a degree or a deep understanding of how CPUs work to understand the issue.
However, one thing that bothers me is that the author claims it's possible to retrieve private keys or root passwords by triggering a faulty revert from the instruction that resets the upper bits of a register. Where are the demo results? All I see is a small gif that looks like Matrix-style terminal text scrolling by. Is there any way (other than running the exploit program myself) to check the results and see that it actually leaked the root password and other information?
The exploit will leak more or less random data (data which was accessed recently by the CPU). You cannot target a specific part of the memory, but you can keep fetching data until you get something interesting.
so if I wanted to test if it leaks my root password, I should run the code and open a terminal and say, upgrade packages, or upgrade packages before running the exploit code?
The only way would be to let the thing run and say, pipe the output to grep looking for your password or something else you're looking for. SIMD instructions are used very often for parsing text so I wouldn't be surprised if sensitive passwords eventually get loaded into an YMM register and the exploit just so happened to dump that.
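(Concretely, something along the lines of `./zenbleed -v2 | grep -F 'yourpassword'`, reusing the -v2 invocation mentioned elsewhere in the thread; the exact flags depend on the PoC version you have.)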
Nice writeup, very easy to read. This guy is a bona fide legend. The whole time, based on the writing style, I thought it was someone new to the field explaining things in detail because it had been hard for them to get as well, but then I saw his name at the end and recalled all of his epic findings. Very humble/nice guy, even in person I hear (rare for well-accomplished people in tech).
One thing I don't get though: if AMD's implementation is affected, shouldn't the equivalent instructions on Intel's x86-64 also be affected? Or does Intel move around RAT-referenced values instead of just setting/unsetting the z-bit?
Can anyone explain the `wrmsr -a 0xc0011029 $(($(rdmsr -c 0xc0011029) | (1<<9)))`? It seems to help on my system, but I don't understand what it does, and I don't know how to unset it.
CPU designers know that some features are risky. Much like how web apps may often have "feature flags" that can be flipped on and off by operators in case a feature goes wrong, CPUs have "chicken bits" that control various performance enhancing tricks and exotic instructions. By flipping that bit you disable the optimization.
An msr is a "model specific register", a chicken bit can configure cpu features.
They don't persist across a reboot, so you can't break anything. You can undo what you just did without a reboot, just use `... & ~(1 << 9)` instead (unset the bit instead of set it).
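Spelled out with the same msr-tools invocation as above, undoing it should look something like `wrmsr -a 0xc0011029 $(($(rdmsr -c 0xc0011029) & ~(1<<9)))`.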
No, it's all Zen 2 CPUs, which include desktop CPUs (with or without integrated graphics), laptop CPUs, and server CPUs. The reason the product list is so confusing is that AMD reuses architectures across generations: you'd think that all Ryzen 5000 series CPUs have the same microarchitecture, but they don't. It's much easier to consult this list instead: https://en.wikipedia.org/wiki/Zen_2#Products
Wait... now there are also APUs under the AMD Athlon brand? I know that people are happy when AMD's product offerings are on par with or outperforming Intel's, but they didn't have to outdo Intel in the consumer-confusion arena as well.
Intel also used the Pentium branding for low-end processors (below i3 and in the Atom lineup), and followed it up with the rather perplexing move of using their company name as the sole branding for their worst products ("Intel Processor").
The only 5000-series CPUs still using the Zen 2 architecture are apparently the 5300U, 5500U and 5700U, which all use socket FP6 (mobile/embedded).
So I'm guessing it shouldn't affect any of the more recent and very popular Zen 3 CPUs like the 5600, 5700, etc. I personally own a 5600, which is great bang for the buck.
Lucienne (5700U/5500U/5300U) are the only Zen2s in the 5000 series at present (afaik), but AMD continues to re-use the Zen2 architecture in the 7000 series (7520U, etc), as well as many semicustom products like Steam Deck.
It's in rather a sweet-spot as far as performance-power-area, so this isn't entirely a bad thing. Zen3's main innovation was unifying the CCXs/caches, but if you only have a 4C, or you want to be able to power-gate a CCX (and its attendant IF links/caches) down entirely, Zen2 does that better, and it's slightly smaller. We'll be seeing Zen2 products for years to come, most likely.
VMs were and always have been a sort of faked out security blanket. The reality is there's shared hardware, shared memory, shared network interfaces, shared storage.
That, and these days you can run so much on a single machine that it's really hard to understand the thinking that colo isn't and wasn't the best option. AWS isn't some mythical devops dream that never fails. It's a highly convoluted (in price as well as function) affair.
Off-topic question, but can some experts tell me why it is safe for `strlen()` and friends to use vector instructions when they can technically read out of bounds?
Essentially because memory mappings and RAM work at page granularity, rather than bytes. If a read from in-bounds in a page isn't going to fault, a read later in the same page isn't going to fault either (even if it is past the end of the particular object).
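A sketch of how that plays out in a vectorised strlen (SSE2-flavoured, illustrative rather than any particular libc's code): by aligning the first load down to the vector width, no 16-byte load can ever straddle a page boundary, so the over-read can't fault:

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    uintptr_t p = (uintptr_t)s & ~(uintptr_t)15;   /* align down: 16 divides 4096 */
    unsigned off = (unsigned)((uintptr_t)s - p);

    /* May read up to 15 bytes *before* s, but never outside s's page. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero)) >> off;
    if (mask)
        return (size_t)__builtin_ctz(mask);        /* GCC/Clang builtin */

    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask)
            return (p - (uintptr_t)s) + (size_t)__builtin_ctz(mask);
    }
}
```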
Implementations using 32- or 64-byte (256 or 512 bit) vector extensions would run afoul of 16-byte granularity. While it is not common yet, ARM SVE allows vector sizes larger than 128 bits -- e.g., Graviton3 has 256-bit SVE and Fujitsu A64FX has 512-bit. (x86 has had 256 and 512 bit vector instructions for some time, but current CHERI development seems to be on ARM.)
I think you might be confusing the tracking of validity of capabilities themselves (which could indeed be at a 16 byte granularity for an otherwise 64-bit system) with the bounds of a capability, which can be as small as 1 byte.
> If you can’t apply the update for some reason, there is a software workaround: you can set the chicken bit DE_CFG[9].
It reminds me of the compiler switches which can alter the way code at different levels (global, procedure, routine) can access variables declared at different levels and the change in scope that ensues.
Maybe some of this HW caching should be left to the coders.
I don't understand how a microcode update could fix this. I assume microcode is used for slow operations like trigonometric functions, and doesn't affect how registers are allocated or renamed. Or does the update simply disable some optimizations using "chicken bits"? And by the way, is there a list of such bits?
Everything a modern CPU runs is microcode. There are a few x86 instructions that translate to a single microcode instruction, but most are translated to several.
The designers leave themselves an ability to override any instruction using the microcode so they can patch any instruction. They don't use the microcode only to implement complex instructions that require loops.
Every time I think it's insane to contemplate making one's own chips something like this comes up, and I realize it's insane to let so much of our lives depend on these insanely complicated devices. It feels like walking on a tightrope above an unbounded fractal chasm.
> AMD have released a microcode update for affected processors. Your BIOS or Operating System vendor may already have an update available that includes it.
Yes, I love flashing BIOS...
edit: nvm, microcode can get updated via system updates.
Microcode updates haven't been managed in-BIOS for over a decade now. If you use Linux, you'll usually see them released as some package like "intel-microcode" or "amd-microcode".
Even EFI updates rarely are very intrusive or dangerous, and can also be handled by the Operating System via an update.
They are managed both ways. I think updating in the BIOS is preferable, to ensure no CPU parameters change after (some part of) the kernel has already initialized.
But of course BIOS updates have many downsides and often stop after a few years.
To be fair, flashing the bios isn't nearly as bad on most modern systems.
Put the file on a USB drive, plug it in, restart and go into the bios, look for the flashing utility, select the file, done. As long as the machine is on a UPS in case of disaster, everything's accounted for.
From my experience: better have a >32 GB USB flash drive, nothing else works (MSI), and I don't have a UPS so it's always quite an exciting experience. Especially since motherboard manufacturers save almost a whole dollar by not having a display outputting anything. So it's blinkenlights and hope for the best.
Not just for convenience, but safety. You don't want to be caught out when something goes wrong, even without flashing the bios.
A lot of boards these days have 7-segment displays. They're not great, but they're a good step up. Don't need to spend a lot, I think they show up on $300-ish boards. Mine definitely does.
I see no need for it. Living in Germany, power outages of any kind are exceptionally rare. I remember one in the last 10 years, for a few hours, and that was very local. If I'm ever in a situation where a power outage occurs, I'll listen to my battery radio for a while and be fine. I work on nothing and rely on nothing that would actually require a UPS.
It's like not having a backup drive. Everything is fine until one day it isn't.
A good UPS does more than just protect from outages. It also protects from surges and low-voltage situations that can both damage the equipment severely.
A UPS doesn't cost much and will last many years. Buying a new motherboard and GPU because they got fried is much more expensive.
Keep in mind that most equipment of this nature has universal power supplies, and thus works even at just 50% of nominal grid voltage.
The substation is mandated to trip on that kind of anomaly, btw. Otherwise some very common types of motors (AC induction, single- and 3-phase) would burn out from the excessive current they draw to compensate for the reduced voltage.
My new computer takes a while to POST (z690 with ddr5 smh) so it’s basically been continuously either on or in sleep since I built it 18 months ago and I’ve had an unexpected shutdown due to power loss once in that time according to the Event log. I think the risk of losing power while flashing the bios is very small in real life unless you are stuck in a place with third world electricity infrastructure.
If POST takes a long time it’s often memory training, backing off on the timings just slightly might make it go a lot quicker. Bios updates also often twiddle knobs in this area.
Memory training can happen if the CPU detects that current timings don't work at boot. Since GP never shuts down, it's possible that his memory is always hot and performs better on a reboot (when setting new timings) than after a cold boot (this is basically always true since ram chips like to be hot but especially relevant here). If GP is using XMP or has custom timings, I'd suggest easing off on them especially considering the novelty of DDR5.
It isn’t this, it takes about a minute to train after a bios update or when I enable XMP but never trains after that. It just takes like 20-30 seconds to get all the way to the bios splash screen and only 5 seconds to return from sleep so I just use sleep instead of turning it off. Then the only time I need to wait through a boot is for windows updates.
It's worth noting that the FSF consider that supporting microcode updates makes software "un-free". This is a great example of how inordinately stupid and harmful that position is.
The FSF would have no issues with this if the microcode was free software. I don't necessarily agree with the FSF position on microcode, but they aren't against the idea of updating microcode.
>We now know that basic operations like strlen, memcpy and strcmp will use the vector registers - so we can effectively spy on those operations happening anywhere on the system! It doesn’t matter if they’re happening in other virtual machines, sandboxes, containers, processes, whatever!
>This works because the register file is shared by everything on the same physical core. In fact, two hyperthreads even share the same physical register file.
>It turns out that mispredicting on purpose is difficult to optimize! It took a bit of work, but I found a variant that can leak about 30 kb per core, per second.
>This is fast enough to monitor encryption keys and passwords as users login!
It allows the attacker to eavesdrop on the data going through operations like strcmp(), memcpy(), and strlen(). (These are the standard functions in C for working with strings; and many higher-level languages use them under the hood.) It works on any function that uses the XMM/YMM/ZMM registers.
It's stochastic; the attacker randomly gets data from whatever happens to be using the XMM/YMM/ZMM registers at the time. So if the attacker could eavesdrop in the background constantly, they might eventually see a password. Or they might be able to trigger some system code that processes your password, then eavesdrop for the next few milliseconds.
The attacker needs to run code on your machine. Unclear if running code in a web browser is sufficient or not. It requires an unusual sequence of machine instructions, which isn't necessarily possible in JS/WASM, but 'sounds' says they did it: https://news.ycombinator.com/item?id=36849767
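To make the "stochastic" part concrete, here's a hypothetical sketch of the sampling loop an attacker would sit in. leak_sample() is a made-up stand-in for the actual leak primitive (the published PoC implements that part in assembly); it's stubbed out here so the program compiles and does nothing harmful.

    #include <ctype.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the leak primitive: a real attack would fill
     * out[] with stale vector-register data; this stub never captures anything. */
    static int leak_sample(unsigned char out[16]) { (void)out; return 0; }

    int main(void)
    {
        unsigned char buf[16];
        /* An attacker would run this loop indefinitely; bounded here so the
         * stubbed version terminates. */
        for (long round = 0; round < 1000000; round++) {
            if (!leak_sample(buf))
                continue;                 /* nothing captured this round */
            /* Crude filter: keep samples that look like text, which is enough
             * to surface passwords, keys, and URLs being handled elsewhere. */
            int looks_like_text = (buf[0] != 0);
            for (int i = 0; i < 16 && buf[i]; i++)
                if (!isprint(buf[i])) { looks_like_text = 0; break; }
            if (looks_like_text)
                printf("%.16s\n", (const char *)buf);
        }
        return 0;
    }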
> If you remove the first word from the string "hello world", what should the result be? This is the story of how we discovered that the answer could be your root password!
I assume they meant "what does this do in normal vulnerability discussion terms"; I don't know why tavis didn't just say "arbitrary memory read across processes" or whatever.
Beyond what everyone else said, these types of exploits can leak data across VM boundaries. Unless I'm misreading it, you could log into your $5 linode/digitalocean/aws machine and start reading other people's data on the host machine.
There are tons of million-dollar-a-month businesses running on ~$20/month accounts on shared machines.
Some people think you need "the ability to execute arbitrary code in an unprivileged context" to perform this exploit, which is of course a false assumption. The bug class in this case is basically a use-after-free, on state that is kept per CPU core and that is (for almost all intents and purposes) unprivileged.
From the article:
> We now know that basic operations like strlen, memcpy and strcmp will use the vector registers - so we can effectively spy on those operations happening anywhere on the system! It doesn’t matter if they’re happening in other virtual machines, sandboxes, containers, processes, whatever!
All you need to do is write some JavaScript that will "trigger something called the XMM Register Merge Optimization, followed by a register rename and a mispredicted vzeroupper". It's up to the hacker to determine how to do this explicitly in JS, but it's theoretically possible from literally any application at any time on any operating system. Even if some language or interpreter claims to prevent it, it's possible to find an exploit in that particular language/interpreter/etc. to get it to happen.
This is how exploit development works; if you can't go straight ahead, go sideways. I guarantee you that someone will find a way, if they haven't yet.
I would like to think that the likelihood of finding a juicy target that uses one of these specific CPUs and has explicitly not updated its microcode is much, much, much higher going after end users on the web than attacking organized VPS providers.
> The attack can even be carried out remotely through JavaScript on a website, meaning that the attacker need not have physical access to the computer or server.
> Currently the attack can only be executed by an attacker with an ability to execute native code on the affected machine. While there might be a possibility to execute this attack via the browser on the remote machine it hasn’t been yet demonstrated.
We are on a tech site with highly intelligent individuals who have been programming computers since we've been in diapers.
If you don't believe the text then how would you believe the video? Anything can be done in devtools beforehand and I can think of a million different ways to fake the video.
Personally, if I didn't trust the text then an easily faked video wouldn't placate me either.
No, only the ability to execute arbitrary code in an unprivileged context. Would probably have to be arbitrary x86_64 instructions - Javascript wouldn't cut it for this one.
Summary of the blog post "Zenbleed" by Tavis Ormandy:
The blog post discusses the discovery of a vulnerability called "Zenbleed" in certain AMD Zen 2 processors. It revolves around the use of AVX2 and the vzeroupper instruction, which zeroes upper bits in vector registers (YMM) to avoid dependencies and stalls during superscalar execution.
The vulnerability arises from a misprediction involving vzeroupper, which can be exploited with precise scheduling and triggering the XMM Register Merge Optimization, leading to a use-after-free-like situation. This allows attackers to monitor operations using vector registers, potentially leaking sensitive information like encryption keys and passwords.
The author found the bug through fuzzing and developed a new approach called Oracle Serialization to detect CPU execution errors during testing. The vulnerability (CVE-2023-20593) affects various AMD Zen 2 processors, but AMD released a microcode update to address the issue. For systems unable to apply the update, a software workaround exists by setting the DE_CFG[9] "chicken bit."
The post concludes with acknowledgments to individuals who contributed to the discovery and analysis of the Zenbleed vulnerability.
CPU vulnerabilities found in the past few years: