Return to abort() – Using code introspection to prevent stack-smashing (github.com/cjdelisle)
48 points by cjd on July 10, 2017 | hide | past | favorite | 20 comments



This isn't very useful, and has some pretty obvious design flaws.

Given any routine which ends with the very common pattern:

    return foo();
Which is assembled to the very common sequence:

    call foo
    leave
    ret
This project will instrument it to now look like:

    call foo
    jmp $+2
    .byte 0xDE, 0xAD
    leave
    ret
Whatever return address I hijack, I can now just point it at this valid return site, and begin my ROP stack as per normal.

What's especially great is that the project guarantees this pattern for us. Now, every function has a path that looks like:

    call __stack_chk_fail
    jmp $+2
    .byte 0xDE, 0xAD
    < function frame cleanup >
    ret
This is effectively a no-op for security.

I cleaned up the author's code, added a sane makefile, and an example exploit here: https://github.com/zachriggle/return-to-abort

(Pull Request: https://github.com/cjdelisle/return-to-abort/pull/1)


If you get to instrument codegen to insert countermeasures, you can do more interesting things than this; for instance, you can do return address protection by explicitly inserting cookies into functions and checking them at their returns.

Also, check out FSan:

http://www.pcc.me.uk/~peter/acad/usenix14.pdf


For a similar approach, look at grsecurity’s RAP:

https://pax.grsecurity.net/docs/PaXTeam-H2HC15-RAP-RIP-ROP.p...

(Not that you’ll be able to actually use RAP, as they’re only releasing it to commercial customers, but the description of it is worth checking out.)


So this limits the number of gadget options you have for ROP, but doesn't eliminate ROP entirely, right? It maybe increases the difficulty of ROP, if you can't find enough suitable gadgets that happen to start right after function call sites. Not a silver bullet, anyway.


Well, it means that, modulo hash collisions, a function can only return to one of the places which calls that function. So in the really tragic case that (for example) someone called a vulnerable function and then immediately afterwards called system() with a stack variable as the argument, the attacker can just return there and make the argument point to "bash". But in general, the whole business of knitting together assembly instructions in executable memory would pretty much be gone. Edit: typo, clarity


Is it really limited to only call sites of that function, or to all call sites? I can't tell if their return cookie is shared throughout the binary or unique to callees.


One approach is to assign a random 2-byte number to each function; all callers of that function must then follow the call with those 2 bytes (preceded by a jmp $+2 so the CPU doesn't try to execute them). Unfortunately this would require the linker to get involved, because we're not going to know these cookies at compile time.

Another approach is to take a hash of the types of the args and the return value (pointers obviously being opaque). This way we know the cookie value for any given function at compile time and we can stay out of the linker. However, in this case function a(int, char) can return to the call site of function b(int, char), because to the code they're identical.


The problem with per-function cookies is dynamic calls. The only feasible options I can think of are either a) a secondary cookie that is allowed from all functions or b) a shadow stack of cookies.


That hash approach would let you replace one varargs function with a similar one... :(

Though at least being forced to return to the start of a function, instead of somewhere randomly in the middle, seems pretty powerful to me.


I think none of the ROP gadgets I used in the last exploit I wrote were real instructions. Those seem to be the most interesting.


I don't believe there are any silver bullets in security.


Why isn't the return address stored in a register instead of on the stack to avoid this in the first place?


It is stored in a register on some architectures, like ARM. However, that register gets spilled to the stack when calling another level deeper, so the new return address can be stored there. It doesn't change much. It does make it easier to implement return-address control-flow integrity that's not vulnerable to a race window between the CFI check and the return.


There have been machines with a separate return address stack in on-chip hardware. Forth CPUs were built that way, as was a National Semiconductor part used for running embedded BASIC. Running out of return-stack space was a problem, since those 1980s machines were transistor-limited and came with small return stacks.


PICs are still popular and have hardware return stacks.

Modern high-end CPUs have hardware return stacks too, but only as a hint to the branch predictor of where a ret instruction will jump to (return stack buffer).

Separately... there are exploit mitigations that create a separate stack just for return addresses, making them impossible to reach through stack buffer overflows. For a recent implementation, see Clang's SafeStack:

https://clang.llvm.org/docs/SafeStack.html

Or for a hardware-assisted version, there's Intel CET (not yet implemented on shipping CPUs, AFAIK):

https://software.intel.com/en-us/blogs/2016/06/09/intel-rele...

There are serious limitations to this approach, though: there's a lot of important data on the stack other than return addresses, and overwriting it is often enough for an attacker to redirect control flow eventually, just more indirectly.


SafeStack is indeed very interesting; it's the only thing I see here that I'd consider to fully supersede the idea of return-to-abort.


It has to be put on the stack at some point so you can call more than one function deep. So why not always put it on the stack, so that you don't waste a valuable register?


The answer to "why not always put it on the stack" is "because a lot of functions are leaf functions and so always writing it to the stack is making every function pay the memory access hit rather than just the ones that need it". RISC-ish architectures tend to have enough registers that dedicating one to a link pointer isn't a big deal (and once you do spill it to the stack you can use the link register as a temporary register anyway).

Some very early CPU architectures didn't actually support either putting the return address in a register or on the stack. For instance, on the PDP-8 (https://en.wikipedia.org/wiki/PDP-8#Subroutines) the JMS instruction writes the return address to the first word of the subroutine it's about to call (and the actual subroutine entry point is just after that), which meant it didn't conveniently support recursion. It wasn't alone in that either -- I think that it just wasn't quite appreciated how important recursion/reentrancy was back in the early 60s when these ISAs were designed.


Sure, and SPARC has register windows, but also still has control-flow integrity attacks; overflows are just as bad there.


Registers aren't all that valuable on architectures with reasonable numbers of them, and a lot of architectures do "branch and link" instead of an x86-style call. Branch and link generally means that control flow jumps elsewhere and the address of the next instruction is stored in a register. You jump back to that register to return. Functions are responsible for saving the link register if they clobber it.

This has at least one benefit over x86-style calls: a function like this:

  void foo(void)
  {
      for (int i = 0; i < 10; i++)
          some_leaf_function();
  }
has to save its own return address to the stack, but it only needs to save it once, so all ten leaf calls can happen without stack access for the return address.

Of course, architectures like x86 have specialized hardware to optimize calls, so it's probably a wash in the end.



