Syscall Call-From Verification (marc.info)
69 points by cnst on Nov 27, 2019 | 23 comments



My only question: does making Go use libc stubs require Cgo? One property of Go that is really nice is that you can cross compile effortlessly without needing to build a cross compilation toolchain. Cgo negates this because it requires a C toolchain for the target...


IIRC macOS only supports syscalls via dynamically linked libc (libsystem); there's no guarantee of a stable ABI for direct syscalls. Assuming Go uses libsystem on macOS, it should be able to do something similar for BSD.


I guess it actually should be doable. In theory, all it has to do is link to the correct symbols in libc.so (or the respective library). That doesn't seem like it would require a C compiler, as long as the ABI is guaranteed to be stable and the Go linker is capable of handling it.

One unfortunate aspect, though: even so, once they DO flip the switch, all old Go binaries will stop working.

(Though really, Go binaries don't have a long shelf life, because you have to recompile them when new security patches are released anyway. So maybe that is a non-issue.)


OpenBSD binaries in general don't have a long shelf life. They will break ABI when they want to move forward.


The description says to program to the API instead of the ABI. I'm not very familiar with low level work and exploits, but aren't syscalls the API for the kernel itself?

As far as I'm able to understand, it seems like the proposed mitigation blocks access to the kernel unless your code is either preauthorized (msyscall) or goes through a layer of indirection (libc) that undergoes randomized re-linking at boot. That seems to make sense to me, since it significantly reduces an adversary's knowledge about internals they are presumably targeting.

A few questions:

* Is it possible to authorize arbitrary code, or is access to msyscall (via libc or otherwise) restricted outside of boot?

* It seems that the kernel itself also undergoes randomized re-linking at boot (https://www.openbsd.org/innovations.html). So what does forcing everything through libc gain us?

* Is equivalent hardening likely to make it to the Linux world in the foreseeable future?

* What sort of attacks is this likely to prevent in practice?

* What have I misunderstood, and would someone mind explaining?


> I'm not very familiar with low level work and exploits, but aren't syscalls the API for the kernel itself?

No. The kernel doesn't call syscalls itself. It provides them for applications.

Just to clarify a potential misunderstanding: the kernel and userspace run at different "hardware" privilege levels (ring 0 vs. ring 3). Userspace applications cannot simply call the kernel API directly; they must "trap" into the kernel to perform some pre-defined service request, a.k.a. a system call.

There are usually a few hundred or more of these, e.g. read(2)/write(2). They are generally exposed via libc wrappers (but not always) or via the indirect syscall(3) function. But it is also technically possible to encode the trap instruction directly, for example "int $0x80" on x86 or "syscall" on amd64, with the desired syscall index. This is the part OpenBSD is proposing to lock down. The low-level syscall ABI has always been unstable, meaning it can change incompatibly, and libc is the more stable interface applications should be using.
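To make the libc-wrapper route concrete (as opposed to encoding the trap instruction by hand), here is a minimal Python sketch that reaches getpid(2) through the C library's exported symbol. It assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's symbols to the current process; nothing in it is OpenBSD-specific.

```python
import ctypes
import os

# Assumption: on a POSIX system, CDLL(None) gives access to the symbols
# already loaded into this process, including libc's getpid wrapper.
libc = ctypes.CDLL(None)
libc.getpid.restype = ctypes.c_int
libc.getpid.argtypes = []

# The call goes through libc's stub; the caller never names a syscall
# number or executes a trap instruction itself.
pid_via_libc = libc.getpid()
print(pid_via_libc == os.getpid())
```

The caller only depends on the symbol name `getpid`, which is the API-level contract; the syscall number and trap instruction stay an implementation detail of libc.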

With that out of the way..

> Is it possible to authorize arbitrary code, or is access to msyscall (via libc or otherwise) restricted outside of boot?

msyscall(2) can only be called /once/ per process, e.g. by the ld.so dynamic linker. It tells the kernel where libc is mapped (libraries are mapped at random addresses) so the kernel can permit that region to make syscalls.
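A toy model of that check, in Python, may help. This is only a sketch of the idea as described in this comment, not OpenBSD's kernel code: the class, method names, and addresses are all hypothetical, and it deliberately models only the single libc region.

```python
class ToyProcess:
    """Toy model of the call-from check; not OpenBSD's implementation."""

    def __init__(self):
        self.syscall_region = None  # (start, end) registered via msyscall

    def msyscall(self, start, length):
        # Only the first registration succeeds, mirroring "once per process".
        if self.syscall_region is not None:
            raise PermissionError("msyscall already called for this process")
        self.syscall_region = (start, start + length)

    def syscall_allowed(self, pc):
        # A syscall is permitted only if the address of the calling
        # instruction falls inside the registered region.
        if self.syscall_region is None:
            return False
        start, end = self.syscall_region
        return start <= pc < end


proc = ToyProcess()
LIBC_BASE = 0x7F0000000000  # hypothetical randomized base picked by the loader
proc.msyscall(LIBC_BASE, 0x200000)

from_libc = proc.syscall_allowed(LIBC_BASE + 0x1234)  # caller inside libc
from_elsewhere = proc.syscall_allowed(0x400000)       # e.g. injected code
print(from_libc, from_elsewhere)
```

Because the registration is one-shot, code that gains control later cannot simply call msyscall again to bless its own region.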


> Is it possible to authorize arbitrary code, or is access to msyscall (via libc or otherwise) restricted outside of boot?

It looks like there are restrictions if the binary is statically-linked, but I didn’t see anything else. You could probably pledge out of it.

> It seems that the kernel itself also undergoes randomized re-linking at boot (https://www.openbsd.org/innovations.html). So what does forcing everything through libc gain us?

This doesn’t protect against attacks on the kernel (which randomizing the kernel object files protects against); exploits at this stage are trying to spawn a shell (or similar) in userspace. Going through libc means you need to 1. find it and 2. pass whatever checks it has in place to get it to make the syscall.

> Is equivalent hardening likely to make it to the Linux world in the foreseeable future?

Parts of it, maybe. But turning this on by default would break applications.

> What sort of attacks is this likely to prevent in practice?

JIT code that has been maliciously crafted to call execve("/bin/sh", …)


syscall numbers are constant (part of the ABI!) regardless of the kernel's address space layout.

Conversely, addresses of libc functions or any other code in the program are randomized at load time, so are unpredictable to an attacker.


I don't usually think of integer constants passed as an argument to a function as an ABI, but given that they're hard coded at kernel compile time, fair enough.

What I'm struggling to understand is why an additional layer of indirection is required to facilitate randomization in this case. It seems I must have a fundamental misunderstanding of how some part of the system works at this low level.

I'm also wondering what (if anything) is being lost to this mitigation - the syscall(2) manpage (http://man7.org/linux/man-pages/man2/syscall.2.html) seems to imply that not all system calls necessarily have matching wrapper functions in the platform's C library.


> What I'm struggling to understand is why an additional layer of indirection is required to facilitate randomization in this case.

You can't randomize the integer constants without rebuilding the kernel and everything that depends on those constants. It would be very inconvenient, and the amount of randomization would likely be limited by the number of syscalls (because you want to pack them tight in a lookup table instead of having sparse numbers that are expensive to look up). If the number of syscalls is known, randomization can at best make you call the wrong one.

By contrast, address randomization means the program will probably crash (instead of executing some random syscall) unless it can figure out the address of the function it wants. Randomizing addresses can be done on the fly in the runtime linker, without rebuilding binaries. Additional randomness can be introduced by relinking binaries with a randomizing linker, which is much simpler than running a full build.
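The contrast can be put in rough numbers. The sketch below uses made-up figures (the syscall count and the number of address slots are illustrative assumptions, not measurements of any real system) and assumes, as a simplification, that a wrong address guess always faults.

```python
from fractions import Fraction


def guess_outcomes(num_valid, space_size):
    """For one uniform blind guess, return the probabilities of
    (hitting the intended target, hitting some other valid target,
    hitting nothing valid)."""
    hit = Fraction(1, space_size)
    wrong_valid = Fraction(num_valid - 1, space_size)
    miss = 1 - hit - wrong_valid
    return hit, wrong_valid, miss


NUM_SYSCALLS = 330       # hypothetical: numbers packed tightly into a table
ADDRESS_SLOTS = 2 ** 28  # hypothetical: 28 bits of placement entropy

# Shuffled syscall numbers: every guess within the table is *some* syscall.
shuffled = guess_outcomes(NUM_SYSCALLS, NUM_SYSCALLS)

# Randomized addresses: assume exactly one slot holds the wanted function
# and every other guess crashes the program.
aslr = guess_outcomes(1, ADDRESS_SLOTS)

print("shuffled numbers:", shuffled)
print("random addresses:", aslr)
```

Under these assumptions a blind guess at a shuffled syscall number never fails outright, it just runs the wrong syscall, whereas a blind guess at a randomized address almost always crashes.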

> I'm also wondering what (if anything) is being lost to this mitigation - the syscall(2) manpage (http://man7.org/linux/man-pages/man2/syscall.2.html) seems to imply that not all system calls necessarily have matching wrapper functions in the platform's C library.

syscall(2) lives in libc.


After a bit more reading, it seems I hadn't realized just how low level direct system calls actually are (https://en.wikibooks.org/wiki/X86_Assembly/Interfacing_with_...). Apparently it's either a software interrupt or dedicated instruction, with the constant in a register. ASLR requires more abstraction than that (obviously), and there's no point to it if an exploit can just use the front door instead.


I don't know where you draw your lines, but a syscall (on Linux and amd64, but it should work pretty much the same on BSD and other architectures) looks like

    mov rax, <syscall number>
    ; Depending on the syscall, it expects its parameters
    ; in various other registers
    syscall  ; this is an actual instruction
So yes, it is very much part of the ABI. It selects which function to call.

The extra indirection, AFAIK, is needed because you couldn't just call a kernel function from userspace and have it run with kernel privileges. That's what the syscall instruction is for, to put the processor back in ring 0 so the syscall handler runs in kernel space.

As for what's lost, well, for example, a sane language-agnostic kernel ABI is lost to a C-centric libc API. I don't know if the syscall wrapper support in libc is as incomplete on BSD as it is on Linux, but they could add the missing wrappers, of course.

A cleaner approach might have been to have a separate libsyscall that only wraps syscalls and is mapped into the "blessed" address space that is allowed to issue syscalls. But then libc would have to wrap the wrappers, and they probably didn't want that extra indirection.


You are referencing a Linux manpage. But if it's also true for OpenBSD, those syscalls will get deleted or a libc wrapper will be introduced.


Does OpenBSD stabilize the system call API?


Not like Linux. In the BSDs it is stable for a few releases but not forever; for OpenBSD I think it's fairly short.


Damn, OpenBSD is good.


From my dupe submission: https://news.ycombinator.com/item?id=21653796

This new proposed mitigation builds upon other work, such as libc/ld.so random re-linking at boot, and opportunistic enforcement of syscalls from only un-writable pages by default.

https://www.openbsd.org/innovations.html


Looks like this may break some JITs? Or is the workaround to have the JIT'ed code call libc instead of making direct syscalls?


OpenBSD already prevents syscalls from writable pages. It did not break JITs; in fact, to quote Theo de Raadt, it brings over an aspect of W^X to JITs that haven't been adapted to work with mandatory W^X.

https://marc.info/?l=openbsd-cvs&m=155942895309114&w=2

This means JIT-generated code is forced to use the libc stubs for syscalls. Any direct syscall from a JIT is suspect.


Hmm. Do any JITs actually emit syscalls directly? I would’ve guessed most JITs won’t even emit libc calls directly, since there tend to be layers of abstraction between the JIT and the software.


QEMU does for its user-mode-only code, but not in JITted code. We have a little shim that directly executes a syscall instruction because we want to be able to wind the PC forward/backward across it to avoid a race condition with a signal arriving after the guest thinks it's made the syscall but before we've made the syscall in the host. But that's currently only done by linux-user; bsd-user should ideally have a similar mechanism but is basically dead/dying for lack of maintainers who care about the BSDs. In any case, that falls under "in the binary's text segment" rather than JIT output.

(code at https://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/host/x8... )


Indeed. This appears to be the case for stuff encountered thus far.


Defense in depth: this is not unbreakable, but it adds another layer of difficulty an attacker must overcome.



