Can someone explain how this works? What would trigger a misbranch and how can the OS tell the difference? I read the page and the link but I don't understand.
As a ROP mitigation measure some recent CPU models have a feature that can require code to branch to a special instruction signifying a valid target; jumps to an address that doesn't contain that instruction result in a CPU fault. On OpenBSD such a fault is apparently translated to SIGILL, a signal that can also be raised for other issues, like an invalidly encoded instruction. This patch differentiates the new type of fault so it's easier to identify within a process's signal handlers (sketched below), by adding a new reason code, ILL_BTCFI, to the SIGILL info (siginfo_t).
I'm not sure if this patch is noteworthy on its own, but perhaps it was posted because it implies this CPU feature is now supported or soon to be supported by OpenBSD.
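To make that concrete, a process could tell the new fault apart roughly like this (just a sketch; it assumes the ILL_BTCFI code from the patch ends up visible through <signal.h>):

    /* Sketch: telling a branch-target CFI fault apart from other SIGILLs.
       Assumes the ILL_BTCFI si_code from the patch is exposed via <signal.h>. */
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static void sigill_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig;
        (void)ctx;
    #ifdef ILL_BTCFI
        if (info->si_code == ILL_BTCFI) {
            /* An indirect branch landed somewhere without endbr64/BTI. */
            static const char msg[] = "branch target CFI violation\n";
            write(STDERR_FILENO, msg, sizeof(msg) - 1);
            _exit(1);
        }
    #endif
        /* Any other SIGILL, e.g. an invalidly encoded instruction. */
        static const char other[] = "illegal instruction\n";
        write(STDERR_FILENO, other, sizeof(other) - 1);
        _exit(1);
    }

    int main(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigill_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGILL, &sa, NULL);

        /* ... rest of the program ... */
        return 0;
    }

The handler sticks to async-signal-safe calls (write, _exit), which is about all you can portably do inside a signal handler anyway.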
> I'm not sure if this patch is noteworthy on its own, but perhaps it was posted because it implies this CPU feature is now supported or soon to be supported by OpenBSD.
OpenBSD has supported mandatory BTI/IBT on Intel CPUs (11th Gen+) and arm64 (Apple M2) since 7.4 (Oct 16, 2023).
OpenBSD recently disabled retpolines by default because they are incompatible with IBT, so there has been some related work recently that this patch could help with, e.g. adding missing indirect branch target instructions (endbr64) that were previously hidden by retpolines. OpenBSD has done a lot of work to push these changes into the entire software ecosystem.
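For anyone who wants to see what a valid target looks like (an illustrative sketch, not OpenBSD-specific): building with -fcf-protection=branch makes GCC and Clang start indirectly-reachable functions with endbr64, which you can confirm in the disassembly:

    /* cfi_demo.c -- sketch, not tied to any particular OS.
       Build:   cc -fcf-protection=branch -o cfi_demo cfi_demo.c
       Inspect: objdump -d cfi_demo   (hello begins with endbr64)
       Under enforced IBT the indirect call below is only allowed because
       the compiler placed endbr64 at hello's entry point; hand-written
       assembly that omits it becomes an invalid target. */
    #include <stdio.h>

    void hello(void)
    {
        puts("reached a valid indirect branch target");
    }

    int main(void)
    {
        /* volatile keeps the compiler from folding this into a direct call */
        void (*volatile fp)(void) = hello;

        fp();
        return 0;
    }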
For more context, CET and friends don't actually act as a ROP mitigation in and of themselves. CET is a coarse forward-edge CFI technique. For completeness, you must pair it with both some software function-type assertion (to prevent argument type confusion by swapping function pointers, etc.; sketched below) as well as a backward-edge CFI mitigation (such as a secure shadow stack for storing link addresses). Both are quite hard problems.
ARM's Pointer Authentication extension is an interesting alternative which does all of this in one feature by cryptographically signing pointers in memory.
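To make the forward-edge gap concrete (a hypothetical example, not code from the patch): IBT only checks that an indirect call lands on an endbr64-marked instruction, not which function that is or whether the signature matches, so a corrupted function pointer can still be pointed at any other legitimately marked function:

    /* Sketch: why coarse forward-edge CFI wants a type check on top.
       Both functions begin with endbr64 under IBT, so the swapped call
       still passes the hardware check even though the argument types
       differ.  The cast and call below are undefined behaviour in ISO C
       and are shown purely to illustrate the confusion; on x86-64 SysV
       the pointer argument is simply reinterpreted as a long. */
    #include <stdio.h>

    static void greet(const char *name)     { printf("hi %s\n", name); }
    static void set_priv_level(long level)  { printf("privilege level %ld\n", level); }

    struct handler {
        void (*fn)(const char *);  /* imagine this next to an overflowable buffer */
        const char *arg;
    };

    int main(void)
    {
        struct handler h = { greet, "world" };

        h.fn(h.arg);                                   /* intended call */

        /* Simulated memory corruption: the pointer now names a different,
           type-confused (but IBT-valid) target. */
        h.fn = (void (*)(const char *))set_priv_level;
        h.fn(h.arg);

        return 0;
    }

Something like Clang's -fsanitize=cfi-icall is one example of the software-side type check that catches this class of swap.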
CET is both Indirect Branch Tracking and Shadow Stacks; IBT is the coarse forward-edge CFI, while the shadow stack is a direct ROP mitigation, enforcing that any address returned to must have been pushed by a CALL.
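For intuition, here is the property the shadow stack enforces, written out as a purely illustrative software sketch (the real thing is maintained by CALL/RET against a protected shadow-stack region, with no instrumentation in your code):

    /* Sketch of the shadow-stack idea in software (illustrative only).
       A frame records its return address on entry and verifies it hasn't
       been tampered with before returning -- the same check CET's shadow
       stack performs in hardware on RET.  Uses GCC/Clang builtins. */
    #include <stdio.h>
    #include <stdlib.h>

    #define SHADOW_DEPTH 1024
    static void *shadow_stack[SHADOW_DEPTH];
    static int shadow_top;

    #define SHADOW_PUSH() \
        (shadow_stack[shadow_top++] = __builtin_return_address(0))

    #define SHADOW_POP_CHECK()                                                \
        do {                                                                  \
            if (shadow_stack[--shadow_top] != __builtin_return_address(0)) {  \
                fprintf(stderr, "return address tampered with\n");            \
                abort();                                                      \
            }                                                                 \
        } while (0)

    static long parse_input(const char *s)
    {
        long v = 0;

        SHADOW_PUSH();                  /* remember where we must return to */
        while (*s)
            v = v * 10 + (*s++ - '0');  /* imagine a bug smashing the stack here */
        SHADOW_POP_CHECK();             /* a ROP'd return address would mismatch */
        return v;
    }

    int main(void)
    {
        printf("%ld\n", parse_input("1234"));
        return 0;
    }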
Protecting JIT-compiled code, e.g. within browsers, is at least as important a concern. Some would argue much more important, especially as usage of services like Cloudflare Workers increases.
Techniques like CHERI could replace MMUs in some environments, potentially improving performance.
There's no way to express an out of bounds memory read/write in JavaScript. So you don't actually need an MMU, or CHERI, or pointer signing, or any of the stack smashing mitigations put in place for C.
You do have to generate correct code, but that's actually a solvable problem; WebAssembly did it (a toy illustration of the idea is below).
In theory this could unlock tons of performance (and free up some silicon) if we could ever get the rest of the system away from C.
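A toy version of the "generate correct code" part (a hypothetical mini-runtime, not any real engine's API): Wasm-style linear memory is isolated by a bounds check, or an equivalent guard-page trick, that the code generator emits around every access, so the generated code simply has no way to address anything outside its sandbox:

    /* Sketch: how a Wasm-style runtime keeps generated code inside its
       linear memory without relying on the MMU for isolation. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct linear_memory {
        uint8_t *base;
        size_t   size;   /* multiple of the 64 KiB Wasm page size */
    };

    static uint32_t checked_load32(const struct linear_memory *m, uint32_t addr)
    {
        uint32_t v;

        /* The compiler emits this check (or arranges guard pages with the
           same effect) before every load/store it generates. */
        if ((uint64_t)addr + sizeof(v) > m->size) {
            fprintf(stderr, "trap: out-of-bounds memory access\n");
            exit(1);
        }
        memcpy(&v, m->base + addr, sizeof(v));
        return v;
    }

    int main(void)
    {
        struct linear_memory mem = { calloc(1, 65536), 65536 };

        printf("%u\n", checked_load32(&mem, 16));      /* fine */
        printf("%u\n", checked_load32(&mem, 70000));   /* traps */
        free(mem.base);
        return 0;
    }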
> As a ROP mitigation measure some recent CPU models have a feature that can require code to branch to a special instruction signifying a valid target; jumps to an address that doesn't contain that instruction result in a CPU fault.
How much of a performance hit does this mitigation cause, 3-5%?
Compiler-based CFI has low single digits impact. Hardware with ENDBR will be hard to notice, "none" being a reasonably accurate estimate of the impact, unless your code is weird and unusually dense with valid indirect branch targets.
In fast paths almost all the code you're running is in icache, and even the decoded micro-ops are probably cached, so I bet it's even cheaper than that.
The article mentions both 64-bit ARM and x86. Intel's ENDBR64 (part of CET) and ARMv8.5-A's BTI work largely the same way. The instructions are NOPs on older processors, or when the feature isn't enabled by the OS.
BTW. A similar extension is also about to be approved for RISC-V: "Zicfilp". It also repurposes an instruction that was previously a NOP.
So, ROP itself is a (frequently very effective) workaround for W^X memory protection features, where executable code itself should be unmodifiable at runtime. I wonder if there's a "next" CFI attack that calls existing functions that are actually defined as such, but in a weird order, or with weird arguments.
The ability to do that effectively would depend on lots of stuff, including the calling convention (as the stack or heap are easier to corrupt than registers).
Maybe someone is already writing an advanced ROP gadget-search that looks for "real functions that can be combined to make ROP gadgets by calling them in a bizarre way". (This is already much less likely to occur in a real program ... I think ...! But maybe the return-to-libc phenomenon is still a rich source of vulnerability?)
Interestingly, RISC-V's Zicfilp proposal includes an optional, 20-bit label operand so that call sites and targets can be paired more strictly.
I'm not sure how or if it addresses trampolines and similar thunks that interpose callee and caller, but which don't necessarily touch arguments. Such thunks are quite common in dynamically linked code as ELF and Mach-O do lazy loading by default--dynamic function symbols are initially small thunks that load the dependency and then forward the call, restoring registers and the stack without having to know the call signature.