OpenBSD's kernel gets W^X treatment

SwellJoe · on Jan 14, 2015

Does anyone have links that would explain this concept in greater detail? I get the meaning of W^X and I even understand the danger of executable memory, and why one wouldn't want it to be write-able (or at least not write-able by untrusted writers), but, beyond that I don't really understand the broader implications.

Also, it sounds like it would be a massive undertaking, even for a small(ish) kernel like OpenBSD, if the kernel wasn't always written with this goal in mind. Is that the case? (I don't see a patch referenced, so I'm not able to judge for myself.)

brynet · on Jan 14, 2015

OpenBSD's kernel virtual/physical memory system, uvm(9), already had the ability to set permissions on mappings. But the recent work done here is still substantial: APIs were cleaned up, and the W^X policy was applied in the places where memory was being mapped with both permissions (writable and executable).

I'm not overly familiar with all that went into this, but reading the commit logs by both Theo and Mike might help clarify some of it.

http://freshbsd.org/search?project=openbsd&q=file.name%3Aamd...

http://freshbsd.org/search?project=openbsd&q=file.name%3Aker...

http://freshbsd.org/search?project=openbsd&q=file.name%3Aamd...

http://freshbsd.org/search?project=openbsd&q=file.name%3Aker...

koenigdavidmj · on Jan 14, 2015

Stack smashing attacks are no longer possible, since:

1. ProPolice detects most attempts to clobber the return address.

2. You can't set the return address to memory you control the contents of, such as user input, since that memory is writable and therefore not executable.

3. The remaining way to get code into executable memory is to write a file and have the program mmap(2) it. Address space layout randomisation makes finding this code difficult, even if you have the ability to smash the stack in a way that bypasses ProPolice.

The difficult cases (like the trampoline case mentioned) are usually problematic because they are programmatically writing small functions in machine code, then executing them; basically this requires the discipline to write the function and then immediately flip the page from writable to executable. Implementing a JIT compiler like the JVM would encounter similar difficulties.
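
The write-then-flip discipline described above can be sketched in C with mmap(2) and mprotect(2). This is only an illustration, not code from OpenBSD or any JIT: `emit_code` and `run_generated` are hypothetical names, and the six machine-code bytes assume an x86-64 CPU (on other architectures the sketch returns -1 without generating anything).

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes of machine code into a fresh page
 * that is writable but not executable, then flip it to executable but not
 * writable. The page is never W and X at the same time.
 * Returns NULL on failure. */
static void *emit_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (len > page)
        return NULL;

    void *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);                     /* step 1: write */

    if (mprotect(buf, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, page);                      /* step 2 (flip) refused */
        return NULL;
    }
    return buf;
}

/* Generates and calls a tiny function. Returns its result, or -1 if the
 * platform is not x86-64 or the mapping could not be set up. */
static int run_generated(void)
{
#if defined(__x86_64__)
    /* mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    void *p = emit_code(code, sizeof code);
    if (p == NULL)
        return -1;
    int (*fn)(void) = (int (*)(void))p;         /* step 3: execute */
    int result = fn();
    munmap(p, (size_t)sysconf(_SC_PAGESIZE));
    return result;
#else
    return -1;
#endif
}
```

A real JIT would batch many functions per page and would need an instruction-cache flush on architectures that require one; the point here is only the order of the steps.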

yuubi · on Jan 14, 2015

> no longer possible

More difficult, but https://en.wikipedia.org/wiki/Return-oriented_programming is a thing.

brynet · on Jan 14, 2015

OpenBSD has switched its platforms to using PIE (position-independent executables) by default. The 5.7 release will also introduce self-relocating static PIE.

http://marc.info/?l=openbsd-cvs&m=141922027318727&w=2

chongli · on Jan 14, 2015

A cursory Googling turned up this:

http://flyer.sis.smu.edu.sg/trustcom11.pdf

I don't know whether it's practical or not but it's definitely possible.

comex · on Jan 14, 2015

I think you're confusing the kernel and userland, as the post mentions that OpenBSD's kernel ASLR support (i.e. position independence) is currently limited. When it comes to userland, W^X is a much older feature, which OpenBSD pioneered, but which by now is essentially ubiquitous (except when a JIT is in use, e.g. in web browsers), along with ASLR.

Such features have helped make exploits more difficult over time, but far from impossible - it all depends on the type of vulnerability, as well as things like how much interactivity exists between the attacker/the attacker's code and the target (potentially allowing em to gather data about ASLR, stack canaries, etc. before sending the final code execution bit). For example, web browsers are a very good case for the attacker, where not only is there a lot of interactivity in the form of JavaScript method calls, but a JIT usually ensures RWX pages exist; on the other side, an inetd server that spawns a new process for every request, with new ASLR offsets and stack canaries, would be pretty bad, since there is little interactivity.

When it comes to the kernel, an important attack source is userland programs (already compromised or run by a malicious user in a multiuser system) trying to abuse the system call interface. In this case, not only is there a lot of interactivity (many system calls + complex low-level device drivers, if applicable + weird CPU features + high level of control over multiple cores/threads and timing + sharing the same CPU caches etc. with the kernel), on pre-Haswell x86-64 processors, there is actually no performant way for the kernel to prevent the memory of the currently running user process from being directly accessible from it (not executable as of Ivy Bridge though), making any kind of ASLR much less useful. So while kernels can and do get pretty far by having well-written code that avoids vulnerabilities, they usually only need to give an inch for userland to take a mile. There are, however, other, less favorable attack scenarios, e.g. remote attacks on network stacks, and in any case W^X can't hurt.

pkaye · on Jan 14, 2015

It comes from the early x86 processors not having a "no-execute" permission bit for page table entries. The AMD64 architecture added an NX bit.

danieldk · on Jan 14, 2015

And Intel retroactively added it to x86 (requiring PAE). E.g.:

http://ark.intel.com/products/27460/Intel-Pentium-4-Processo...

brynet · on Jan 14, 2015

Mike Larkin made a follow-up post to the mailing lists mentioning some upcoming work.

http://marc.info/?l=openbsd-tech&m=142122093110713&w=2

OpenBSD/i386 still has to run on systems without PAE, but it should also take advantage of the capabilities of modern processors.

geofft · on Jan 14, 2015

It's fairly rare that you want to write and execute a page somewhat simultaneously, as a high-level goal. Most of the time you're loading some existing executable from disk, and once it's loaded you're executing it, so you can just have the kernel write to it, switch it to executable and nonwritable, and pass it back to the userspace process. Sometimes you have a JIT or a VM doing binary translation, but even there you typically JIT some code once and then call it one or more times, and if you JIT some other code it's a different page. You don't typically intersperse writing code to a page and executing that page. So you just want userspace to be a little rigorous about separating those two steps, and tell the kernel when it's switching between those two, instead of requesting a page that's simultaneously writable and executable.

As another commenter said, one of the reasons this isn't done at the outset is that some architectures (notably x86-32) don't implement W^X in hardware, so there's no pressure to be 100% clean about this. But it's rare that you have code that isn't straightforward, at least conceptually, to rework into W^X compatibility.

One of the more annoying things is fixing up relocations: if you have a call to a function in a dynamic library, and you don't know where that library is going to be loaded until it's loaded, the most obvious way to implement this is to map your program code writable and fill in addresses once you know what they are. Each place where an address needs to be filled in is called a "relocation". So when you load a dynamic library, you loop over all relocations and fill in any addresses for symbols contained in that dynamic library.

There are lots of reasons this is awful; one is that you have to go update every place in the program code. So you indirect that through a thing called the "procedure linkage table" (PLT), which contains a bunch of tiny functions that just go call your real dynamic functions, and you hard-code references to the PLT. The PLT still has to be writable, though. If you don't want that, you make a separate section of the program called the "global offset table" (GOT) that contains addresses, and you have each stub function in the PLT do an indirect function call to a matching entry in the GOT. So the PLT is executable and doesn't need to be writable, and the GOT is writable and doesn't need to be executable.
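
As a rough analogy in plain C (real PLT stubs are a few assembly instructions emitted by the linker, and a real resolver looks symbols up by name), the split between fixed code and a writable table of addresses looks something like this. All names here (`got_add`, `plt_add`, `lazy_resolver`, `real_add`) are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* Stands in for a function that lives in a shared library. */
static int real_add(int a, int b) { return a + b; }

static int resolve_count = 0;               /* how often we "resolved" */
static int lazy_resolver(int a, int b);

/* "GOT" entry: plain writable data holding an address. It starts out
 * pointing at the resolver, as with lazy binding. */
static add_fn got_add = lazy_resolver;

/* "PLT" stub: code that is never modified. It only makes an indirect
 * call through the GOT entry. */
static int plt_add(int a, int b)
{
    assert(got_add != NULL);
    return got_add(a, b);
}

/* First call lands here: patch the GOT entry (a data write, not a code
 * write), then forward the call to the real function. */
static int lazy_resolver(int a, int b)
{
    resolve_count++;
    got_add = real_add;
    return got_add(a, b);
}
```

Calling `plt_add(2, 3)` twice goes through the resolver only the first time; after that the stub jumps straight to `real_add`, and no executable byte was ever rewritten.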

If you want to be super paranoid, you resolve all your dynamic libraries at startup and mark the GOT as read-only ("bind now" and read-only relocations aka "relro", respectively), so that nothing is writable. But that's a different discussion from W^X.

(This telling is not very historically accurate about how the PLT and GOT came to exist, but hopefully the explanation of what they do is close enough to correct to convey the general ideas.)

hlieberman · on Jan 14, 2015

Correct me if I'm wrong, but aren't amd64 kernels always going to have this implemented in hardware? I mean, W^X is a replication of the NX bit, which is (as far as I know) mandatory for the x86 instruction set.

achernya · on Jan 14, 2015

You're correct, the protection is implemented in hardware, but the pages have to be marked appropriately. This message describes a patchset that correctly marks the kernel pages as writable xor executable.

k__ · on Jan 14, 2015

So they can but don't have to be marked?

taejo · on Jan 14, 2015

Yes. Many programs rely on W&X (I believe the JVM is a prominent example).

masklinn · on Jan 14, 2015

Any JIT-based system requires executable writable memory. This includes the JVM, LuaJIT, Pypy, HHVM and all modern browsers.

m_eiman · on Jan 14, 2015

It'd be quite possible for a JIT to have the memory first writable but not executable when creating the code, then the other way around when running it. No need to be both at the same time.

geofft · on Jan 14, 2015

As I understand it, the NX bit is just a new permission bit to say "Don't execute this page". The i386 architecture only had read and write bits, and assumed that read also meant execute.

W^X is a policy that a kernel can choose to implement, that if the W bit is set on a page table entry, so is the NX bit. You need the NX bit to be available in hardware for this to be useful, but hardware support for NX doesn't mean that you have to use it, let alone implement W^X.
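
Written against x86-64 page table entry bits (bit 1 is the writable bit, bit 63 is NX), that policy is a one-liner. This is a sketch of the rule only, not OpenBSD's pmap code; `wx_enforce` and `wx_ok` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_W  (UINT64_C(1) << 1)    /* x86-64 PTE read/write bit */
#define PTE_NX (UINT64_C(1) << 63)   /* x86-64 PTE no-execute bit */

/* Returns 1 if the entry satisfies W^X, i.e. it is not both writable
 * and executable (executable means NX is clear). */
static int wx_ok(uint64_t pte)
{
    return !((pte & PTE_W) && !(pte & PTE_NX));
}

/* The policy from the text: if W is set, make sure NX is set too. */
static uint64_t wx_enforce(uint64_t pte)
{
    if (pte & PTE_W)
        pte |= PTE_NX;
    assert(wx_ok(pte));
    return pte;
}
```

Nothing in the hardware forces a kernel to call something like `wx_enforce`; an entry with W set and NX clear is perfectly valid as far as the CPU is concerned.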

This means that amd64 processors are backwards-compatible with kernel and userspace designs that require W|X, even in long (64-bit) mode.

mrweasel · on Jan 14, 2015

I don't think the first versions of Intel's AMD64 chips had the NX bit, but those chips are so old at this point it might be unimportant.

haberman · on Jan 14, 2015

Not a lot of context here: I take it this means mapping pages as writable or executable, but never both? And this is being applied to the kernel itself and the pages it maps into kernel space?

brynet · on Jan 14, 2015

That's right. W^X is a policy that memory is either writable or executable, but not both. OpenBSD uses this model in userspace, now it's being taken a step further and applied to kernel space.

abecedarius · on Jan 14, 2015

I assume the policy is W nand X rather than W xor X. First I've heard the term, so I'm risking pedantry to make completely sure.

ghswa · on Jan 14, 2015

According to [1] it's W xor X. Interpreting that literally suggests that read only memory is disallowed as well. I'd be surprised if that's actually the case.

[1] http://www.openbsd.org/33.html

friendzis · on Jan 14, 2015

Well, I don't believe memory can be unreadable, so memory that is neither writable nor executable is by definition read-only.
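
The nand-versus-xor question can be checked mechanically with the PROT_* flags from `<sys/mman.h>`. The two predicates below are hypothetical helpers written for this comparison; the "nand" reading is the one that matches the "writable or executable, but never both" description given upthread.

```c
#include <assert.h>
#include <sys/mman.h>

/* "W nand X": anything goes except writable and executable together. */
static int wx_nand(int prot)
{
    return !((prot & PROT_WRITE) && (prot & PROT_EXEC));
}

/* Literal "W xor X": exactly one of writable/executable must be set. */
static int wx_xor(int prot)
{
    return !(prot & PROT_WRITE) != !(prot & PROT_EXEC);
}

/* Walks the interesting protection combinations. */
static void wx_demo(void)
{
    /* The two readings agree on these. */
    assert(wx_nand(PROT_READ | PROT_WRITE) && wx_xor(PROT_READ | PROT_WRITE));
    assert(wx_nand(PROT_READ | PROT_EXEC)  && wx_xor(PROT_READ | PROT_EXEC));
    assert(!wx_nand(PROT_READ | PROT_WRITE | PROT_EXEC));
    assert(!wx_xor(PROT_READ | PROT_WRITE | PROT_EXEC));

    /* They differ on read-only memory: a literal xor would forbid it. */
    assert(wx_nand(PROT_READ));
    assert(!wx_xor(PROT_READ));
}
```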

userbinator · on Jan 14, 2015

There is much mention of JITs and the workaround being to switch page permissions, but here is an example of an SMC pattern that W^X would really not work with; a function that does something the first time it is called, and collapses into a single RETurn instruction thereafter:

    once:
        mov byte [once], 195    ; 195 = 0xC3, the RET opcode: overwrite our own first byte
        ; ...do something here...
        ret

I have used this technique in applications-level code, where it is significantly more efficient (both smaller and faster) than the alternatives when this "once" function will be called many times. I think it is always important to remember that while W^X and other restrictions have security benefits, they also have downsides in limiting some interesting creativity and the potential to exploit the full abilities of the machine.

Sanddancer · on Jan 14, 2015

First thought offhand is to have the code be writeable and executable the first time through -- W^X allows for this for JITs and the like -- and then set the page the code lives on to be executable only after this instance. Alternatively, have the once function call a function in memory space that's already set to execute only, to minimize the space attackers can perform shenanigans in.

However, this does kinda ignore one of the big focuses of the OpenBSD project. They tend to shy away from such clever hacks in the name of readability and auditability. While it's definitely a neat way of ensuring your code is only executing once, it becomes a hassle when you have to port it to other platforms. Keep in mind that OpenBSD ports to as many platforms as possible because the subtle quirks of various platforms will often tickle out rare bugs and make them more repeatable. In this case, your replacing the once function with a return is dependent on x86, so wouldn't work on the many other platforms that OpenBSD runs on.
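
For comparison, a portable way to get roughly the same "runs once, then collapses" behaviour without touching executable memory is to keep the mutable part in data, as a function pointer. This is a sketch with made-up names (`once`, `once_first`, `once_noop`, `init_runs`); it costs an indirect call on every invocation, so it is not a claim about matching the size or speed of the self-modifying version.

```c
#include <assert.h>

static int init_runs = 0;            /* counts how often the real work ran */

static void once_noop(void) { }      /* what later calls collapse into */
static void once_first(void);

/* Callers invoke once(); only this pointer (data) ever changes. */
static void (*once)(void) = once_first;

static void once_first(void)
{
    assert(once == once_first);
    once = once_noop;                /* swap the pointer instead of patching code */
    init_runs++;                     /* ...do something here... */
}
```

This version is not thread-safe as written; POSIX `pthread_once` covers that case.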

carterschonwald · on Jan 14, 2015

This is a pretty neat idea and interesting change! I wonder how having a security model where each page of memory can only be one of writeable/executable impacts JITs though? (I guess that's perhaps why JITs often have those landing pad spots at the top of functions/methods?)

haberman · on Jan 14, 2015

JITs are fine as long as you can switch a page from being rw to rx. At least that's my experience writing a JIT in userspace.

carterschonwald · on Jan 14, 2015

fair enough, i guess i'm just imagining challenges around multi-threaded jits, though i guess those are relatively less common overall.

haberman · on Jan 14, 2015

You can have multiple threads generating code if necessary, you just need to ensure that each has its own pages to write to. Once the machine code is written the page can be flipped to rx and it's safe to share across multiple threads.

brynet · on Jan 14, 2015

For userland, JITs continue to work because you can still explicitly request mappings that are both writable and executable. W^X is a default policy.
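
Whether such an explicit request succeeds is something a program can probe at run time. A minimal sketch, assuming a POSIX system with anonymous mmap(2); `can_map` is a hypothetical helper, and the answer for a writable and executable mapping depends on the OS and its configuration.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if one anonymous page with protection `prot` can be mapped,
 * 0 if the kernel refuses. The page is unmapped again right away. */
static int can_map(int prot)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, prot, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    munmap(p, (size_t)page);
    return 1;
}
```

A JIT could call `can_map(PROT_READ | PROT_WRITE | PROT_EXEC)` at startup and fall back to the write-then-flip approach when it returns 0.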

nbe · on Jan 14, 2015

Does anyone know what 'MI' and 'MD layer' refer to in Theo's message? Machine Independent and Machine Dependent, maybe?

VLM · on Jan 14, 2015

Exactly, yes. It's a *BSD thing, not strictly just OpenBSD.

Perhaps it would help to see them "in action": here is a link to the Linux emulation layer in FreeBSD. The MD chapter four specifically describes i386 (this is how you put syscall parameters on, and off, the stack on an i386), and the MI chapter five is a pile of structs that would be used by any emulation layer (NPTL, TLS, the joy of futexes (a Linux thing that is kind of a mutex cache for speed, sorta kinda), and good luck with the ioctls).

https://www.freebsd.org/doc/en/articles/linux-emulation/inde...

nbe · on Jan 14, 2015

Very interesting link, thanks.

zx2c4 · on Jan 14, 2015

I'd like to see the OpenBSD kernel have Grsec's UDEREF and SMEP/SMAP support.

brynet · on Jan 14, 2015

OpenBSD/amd64 does have SMEP/SMAP support; this was committed by Jonathan Gray (jsg@) in 2012, using QEMU. It's still hard to come by in actual hardware.

http://freshbsd.org/search?project=openbsd&q=SMEP&committer=...

http://freshbsd.org/search?project=openbsd&q=SMAP&ommitter=j...

caf · on Jan 14, 2015

I believe SMAP makes UDEREF obsolete, at least on processors that support it.

Hello71 · on Jan 14, 2015

https://forums.grsecurity.net/viewtopic.php?f=7&t=3046

Aissen · on Jan 14, 2015

So this means you can't have features like ftrace (and kpatch), BPF, or a kernel that re-configures itself at runtime like x86 Linux does at boot once it detects the hardware features. Of course you can work around all that by switching the page W/X bits as appropriate, but it's a bit more complex.

Otherwise it's a seriously impressive feat.

vezzy-fnord · on Jan 14, 2015

OpenBSD does support BPF, but as a character device.

Aissen · on Jan 15, 2015

I'm not sure I see the link between it being a character device and the fact that memory pages are W^X. But you are correct that I was wrong when saying you can't have it; I also said it was more complex since you have to be careful when switching a page from W to X.