Copying my comment from the earlier submission that didn't gain much traction here:
What an absolutely amazing tour-de-force of a devastating design flaw in all versions of macOS and iOS and tvOS and watchOS!
The negotiations detailed in the bug report timeline about meetings between "senior apple and google leadership" for keeping this secret past the general deadline really underlines that.
Yeah - the failed mitigations followed by a "long term" fix were interesting as well. Apple literally had to change execve() this late in the OS's development cycle to allocate new task and thread structs (that's two extra allocations and copies in a hot path!) to fix it for good. That this design problem lingered for so long doesn't look good for Apple - it's one thing for a use-after-free bug in some obscure piece of code to linger, but a bad design affecting a ton of their own frequently used code standing for so long is somewhat brown-bag category!
I wonder what other fallout we might experience from this.
Business as usual with macOS. The other day I was browsing the ocspd source code. Turns out it calls openssl using system(). So openssl is officially deprecated on macOS and yet they're using it internally to handle certificates?! And there's an enlightening comment:
/* Given a path to a DER-encoded CRL file and a path to a PEM-encoded
* CA issuers file, use OpenSSL to validate the CRL. This is a hack,
* necessitated by performance issues with inserting extremely large
 * numbers of CRL entries into a CSSM DB (see <rdar://8934440>). */
ocspd was introduced with 10.4. A decade ago. And that's really the problem with macOS: There's no refactoring of old hacks, but rather just bolting on of ever more new stuff.
Apple needs to take a bit of those tens of billions of dollars they have sitting around and spend it on starting from scratch with something that's not horrifically crufty. The quality of their software is lagging so far behind the quality of their hardware right now. Realistically, I think we may just be at the point where operating systems and all the stuff the companies put on top of them are too complicated to keep developing in the traditional way with traditional tools. Formal verification might be the cheapest way forward at this point.
So far as the current state of the art in computer engineering goes, we don't know how to completely rewrite a system as complicated as XNU without creating fresh batches of implementation errors. So this is a little like suggesting Apple use its hundreds of billions of dollars to build an iPhone battery that only needs to be recharged once a month.
We may someday get an XNU rewrite, but probably not until software engineering produces a new approach to building complex systems reliably that works at the scale (here: number of developers and shipping schedule) Apple needs.
This is so, so true that I wish there were enough beer in this world to gift you with. There's a lot of cruft in XNU, and there's even more of it in the rest of the system, but all this heap of hacks isn't just useless cruft that we'll be better off without. That heap of code also contains almost twenty years' worth of bugfixes and optimizations from more smart engineers than Apple can hope to hire and get to work together in a productive and meaningful manner. All this unpleasant cruft is what keeps the system alive and well and the users happy enough to continue using it.
More often than not, systems that get boldly rewritten from scratch end up playing catch-up for years. Frankly, I can't remember a single case when a full rewrite with an ambitious timetable wasn't a full-scale disaster. The few success stories, like (what eventually became) Firefox, have taken a vastly different approach and taken a lot longer than users would have wanted.
A lot of idealistic (I was about to write naive) engineers think it's all a matter of throwing everything away. That's the easy part. Coming up with something better is the really hard part, and it's not achieved by just throwing the cruft away. If you innocent souls don't believe me, come on over to the Linux side, we have Gnome 3 cookies. You'll swear you're never going to touch anything that isn't xterm or macOS again.
A lot of macOS/iOS was written from scratch, though: Core Graphics (vs. Cairo and FreeType), Core Animation, Core Text (vs. pango), WindowServer (vs. X11), UIKit (vs. Cocoa), IOKit (vs. the native BSD driver framework), Cocoa Finder (vs. Carbon Finder), LLVM/clang/Swift (if you count Chris Lattner's work on it at UIUC)...
Of those, the last one is very impressive: it's a decade-long from-scratch project that has succeeded in competing with a very entrenched project (GCC) in a mature market.
Regarding GNOME 3, the delta between GNOME 2 and GNOME 3 is far less than the delta between NeXTSTEP+FreeBSD and the first version of Mac OS X.
> This is so, so true that I wish there were enough beer in this world to gift you with. There's a lot of cruft in XNU, and there's even more of it in the rest of the system, but all this heap of hacks isn't just useless cruft that we'll be better off without. That heap of code also contains almost twenty years' worth of bugfixes and optimizations from more smart engineers than Apple can hope to hire and get to work together in a productive and meaningful manner. All this unpleasant cruft is what keeps the system alive and well and the users happy enough to continue using it.
This whole premise is a false dichotomy. Apple does not have to throw away Mac OS X, and it does not have to keep piling crap on without fixing things. If you stop the excuses and rationalizations and commit to code quality you can ship an operating system with quality code and minimal bugs. The OpenBSD project has been doing this for two decades with minimal resources. There is no valid excuse other than "we are too lazy and incompetent."
Oh, "too much code", "bad code", "we inherited it", "throwing it away won't work", etc. are baloney excuses without meat. All it takes is the will to hire and commit the right resources with the objective of increasing code quality. I mean, take this bug itself - Apple did fix it, but only after GPZ was on their arse. There's no reason they couldn't have reviewed it themselves and fixed it.
Hasn't the vulnerable code been here for over a decade? Why do people think this was an easy bug to spot? There are dozens of extremely qualified people looking for these things. I think there's a reason there isn't a Nemo Phrack article about this bug: it was hard to spot, and required a flash of insight about the competing lifecycles of objects in two different domains (POSIX and Mach).
> Apple needs to take a bit of those tens of billions of dollars they have sitting around and spend it on starting from scratch with something that's not horrifically crufty.
They certainly don't have to throw everything away. Not having thrown everything away is one of the reasons why OpenBSD is a good example here. Remember all that quality code that was in place before Cranor's UVM? (Edit: actually, the fact that UVM is an improvement over it should say something, too...)
And, at the risk of sounding bitter, in my experience, very few companies have the capability to "commit to code quality", and I don't think Apple is one of them.
Edit: BTW, I really like your blog. You should write more often :-).
Hold on, because I'm pretty familiar with pre- and post-UVM OpenBSD: Arbor Networks shipped on OpenBSD (against medical advice) and ran into a number of really bad VM bugs that Theo couldn't fix because of the UVM rewrite!
> We may someday get an XNU rewrite, but probably not until software engineering produces a new approach to building complex systems reliably that works at the scale (here: number of developers and shipping schedule) Apple needs.
It's conceivable to perform a gradual transition away, though. They could demote Mach to a fast IPC system that just augments BSD, similar to the way the kdbus/bus1 proposal for Linux does. That would be difficult and a long-term project, but it would fix the underlying issue in a way that mostly retains userspace compatibility. Driver compatibility would be more difficult, of course…
That's true, but if you undertake a difficult and long-term project, you want the outcome to be decisive. Mach is ugly and a nest of bugs, but kernels implemented in C/C++ are bug magnets with several orders of magnitude more force.
My prediction is that we don't ever see an XNU refactor/redesign/rewrite so long as C/C++ is the kernel implementation language.
seL4 is an indicator of things to come. We can build complicated OSs with extreme reliability; the up-front cost is just higher than most companies are willing to spend right now, because customers don't yet realize that it's technically possible to avoid the huge costs associated with software failure in exchange for slightly higher amortized software costs.
Until we have a way to extend seL4 or something like it to full multiprocessor operation (without the multiple kernels with separate resources limitation that is currently the only way to use multiple processors with seL4) I'd disagree that we can build general-purpose OSes with verification. Our techniques for verifying concurrent programs are still very primitive and cumbersome, and I don't think many would take an OS where processes can't use multiple hardware threads seriously.
Also, seL4 (being a microkernel) leaves out a huge swath of kernel-level facilities that need to be implemented with the same standard of verification (resource management, network stack, drivers, etc.). Running on a verified microkernel provides a great foundation, but these still add a ton of code that needs to be verified. Plus the concurrency problem will strike again at this level.
L4 is incredibly simple. It is essentially (a word I chose carefully) the opposite of a complicated OS. It also doesn't really do anything.
If you have just a few extremely simple applications you'd like to run in an enclave, L4 is a good way to minimize the surface area between the applications themselves and the hardware.
If you'd like to host a complicated operating system on the simplest possible hosting layer: again, L4 is your huckleberry.
Otherwise: not so useful.
Note that if you just host XNU on top of L4, you might rule out a very small class of bugs, but the overwhelming majority of XNU bugs are contained entirely in the XNU layer itself; having XNU running on an adaptor layer doesn't do much to secure it.
This is kind of a "perfect storm" situation for Apple. At least three vectors are converging:
1. Apple inherited OSX from NeXT, and with it the Mach subsystem. Mach overcomplicates XNU.
2. XNU has become incredibly popular by dint of being shipped in the iPhone. Avie Tevanian probably did not see that coming when they designed the original BSD/Cocoa/XNU/whatever architecture. Regardless: it is now difficult to make sweeping architectural changes in XNU, because of the enormous installed base.
3. Ian Beer is simultaneously very clever and also willing to wade into the XNU Mach fire swamp.
I think it's fair to criticize Apple for designing the XNU frankenkernel. I think it's less legit to say that the presence of this bug class "looks bad" --- it's 2016 and this is just getting published. This is one of those bug classes that is sort of obvious in retrospect, and you wonder why people didn't catch it earlier.
Keep in mind that this is a dangling pointer problem rooted in a stupid design mistake, and that Apple's own kexts used the same vulnerable pattern for years until Project Zero had to disclose it to Apple - after which there were two failed mitigations and then a fix that changes core OS code!
That's not very justifiable no matter how many storms and twisters you throw into the mix ;) I'm not going to argue, but for OS development it looks bad that this wasn't even looked at, much less fixed, for so long - especially given that many of Apple's own kexts had the same issue.
[Also, this isn't an isolated problem, by the way - look up the thread and you'll find two other egregious errors, including one involving execve that was pointed out to Apple in vain. Not long ago I could just remove and insert a kmod and that would lock up the system - go ahead and argue that's not a hot path or that nobody does that, but it does speak to a general lack of code and testing quality. To their credit, Apple did fix that particular one when I reported it.]
The dangling pointer isn't the interesting part of the bug --- dangling pointers are, to a first approximation, the key component of all UAFs, which are a 15-year-old bug class. The interesting bit about this bug is how they arrived at the dangling pointer: they had to manipulate at least three different object lifecycles in the kernel, one of which involved a credential-passing trick that is not super common in normal code.
I disagree that this is a simple bug that anyone should have been able to find. If you'd like to put money on whether or not this is going to win a 2017 Pwnie Award, I would be happy to take your money.
When you take money for writing and maintaining an OS, you have to be competent enough to avoid fundamental design issues like this - especially when you're going to have a bunch of downstream users of the code!
Read the TLDR - exploitation of dangling pointers aside, you can't write code and/or design APIs that do (or make it easy to do) what the TLDR warns about: holding or using a task struct pointer and expecting the euid of that task to stay the same.
Many many places in the kernel do this and there are a great many very exploitable bugs as a result.
Ian Beer certainly is talented but that doesn't excuse Apple being so sloppy!
Aha! I think I understand what has you confused. You seem to think that the TLDR describes some basic rule of XNU programming that people were already aware of and expected to follow. No. Ian Beer invented that rule. In this post. That's why the bug is such a big deal; it's why we call it a "new bug class". It's also why it's the TLDR of the post.
Wow straight up conclusion - I had it confused, right!
It's not a, pardon the expression, fucking XNU-specific programming rule - it's a general rule that was invented long before Ian got to it! You don't hold a reference-counted pointer and operate on it without taking a :gasp: reference first - having that shit in your sample code is just, well, extra shitty!
Also, separately from the dangling pointer issue, the first sentence of the post is literally "This post discusses a design issue at the core of the XNU kernel"!
AFAIU, the dangling pointer problem wasn't a defect in the kernel; it was a problem introduced by some module authors who misunderstood the ownership semantics of the API. It might not even have been a pointer, per se, but I guess that's beside the point.
The larger problem was an inherent TOCTTOU bug in the interface semantics between the BSD subsystem and Mach. AFAIU that wasn't a dangling-anything problem; the reference was still valid. It was a logic and design problem that could happen in any language, even in Rust, and even without resorting to unsafe code.
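To see the shape in a setting everyone knows (this is just an analogue, not the XNU bug itself): the classic access()/open() race in C has exactly this structure, where the identity you check isn't the identity you use:

    #include <fcntl.h>
    #include <unistd.h>

    /* Time-of-check: access() consults the *real* uid against whatever
       'path' names right now. */
    int open_if_allowed(const char *path) {
        if (access(path, R_OK) != 0)
            return -1;
        /* Window: between these two calls, 'path' can be re-pointed
           (e.g. a symlink swap), so the object checked is not the
           object used. */
        return open(path, O_RDONLY);   /* time-of-use */
    }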
>(that's two extra allocations and copies in hot path!)
I've never really thought of process spawning as a hot-path (in the hundreds+ of calls per second sense). What software so heavily relies on spawning so many processes so quickly that the overhead of malloc would be noticeable?
On UNIX systems fork+execve is a very commonly used code path - build a piece of software with make, for example, and the compiler process gets forked/exec'ed a lot of times; web servers like Apache used a multiprocess model for a long time; etc. You could decide not to care about fork+exec* performance, but in the land I'm familiar with (Linux) a lot of optimization work goes into making fork/exec faster.
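If you want a rough feel for the rate, here's a toy spawn loop you can time yourself (my own sketch; it assumes /usr/bin/true exists, which it does on macOS and most Linux boxes):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        const int n = 1000;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < n; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                execl("/usr/bin/true", "true", (char *)NULL);
                _exit(127);                    /* exec failed */
            }
            waitpid(pid, NULL, 0);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d fork+exec pairs in %.2fs (%.0f/s)\n", n, s, n / s);
        return 0;
    }

A parallel make or a pre-fork server keeps a path like this busy all day, so per-spawn allocations aren't free.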
Also, my bigger point wasn't that it's in a hot code path (although I would prefer not to be in a position where I need to change execve to add two allocations and copies if I can avoid it) - it was that they had to change execve() this late in the OS's development cycle to fix this long-standing bug. Typically you get to a point where you don't really need to touch core OS code, and when you do, you risk adding new issues to a central piece of code.
Yeah, I don't see the hot path problem either. I think the bigger issue is doing deep brain surgery on XNU as a hotfix; it's the kind of thing you want to put off for a next major release.
Incidentally, the huge set of problems with execve was very old news - I know a developer who tried to get them fixed (because they affected a piece of Apple software) and failed. Presumably this was because execve bugs were considered unimportant.
Ever since installing 10.12.1, I've been having a bunch of processes randomly entering a quasi-paused, SIGSTOP-ish state (not closable, apps not "bouncing" (loading), just not responding). Running Instruments, correlating logs and such doesn't identify any clear cause. I'm having to `sudo kill -CONT -1` in order to get things moving again. I'm wondering if it's related to XNU mitigations or just some spurious "system configuration entropy" on my box.
I did exactly this when my Mac ran out of memory yesterday. Safari hung with a 'your computer is running out of memory' warning (168 tabs open!) and I didn't want to lose them all by force quitting. But the Safari process itself wasn't "Not Responding" and we were back to 0% CPU.
So I quit everything else, SIGCONT'd Safari, and it started responding again, so I tried unsuccessfully to close some tabs. Of course, Safari somewhat isolates pages in separate processes, so I ran `ps aux | grep WebContent | grep -v grep | cut -d' ' -f11 | xargs kill -SIGCONT` as well.
It all sprang back to life, and all the tabs I'd shut in vain zipped away. Got that one saved for later. It's probably easier just to use -1 now I've learned what that is!
I do wonder what's suspending these processes indefinitely. I should have done more inspection to see what state they were in. I'm not familiar with how WebKit content threads communicate though, so that's for another day.
I was going to say that it signals launchd's process group, i.e. every process spawned by launchd, which always runs as PID 1. However, the `kill` manpage confirms your hunch:
-1 If superuser, broadcast the signal to all processes; otherwise
broadcast to all processes belonging to the user.
(This is on macOS/iOS, Linux might have slightly different semantics.)
It's funny you mentioned processes entering a quasi-paused SIGSTOP-ish state. I swear I've been having tons of problems with Java/Tomcat the past week or so I've been on 10.12.1 (betas), and I keep thinking I broke my config by updating my Java version, changing my Tomcat config, or some other "system configuration entropy"! Nice to know I'm not alone and that I'm probably not crazy.
Tomcat 7.0.72 from Homebrew, Oracle Java 8u112, PostgreSQL 9.3 and 9.5.
It's funny, but I think it's written that way on purpose, not just as snark.
It's a little tricky to keep track of what happened here. There are 4 bugs in this post, and (I think) 2 different timelines: the UAF timeline for the first bug, and the TOCTTOU timeline for the 3 subsequent bugs. What's important to understand about the three TOCTTOU bugs is that there's a "right" fix for that bug, and a series of wrong fixes that delay the inevitable. Ian Beer and GPZ probably go into this whole process knowing what the right fix is, and with predictions on how they'll defeat any of the wrong fixes.
So it looks like GPZ reported a bug and then found flaws in the mitigations, but really all three of the flaws they found were known, at least conceptually, when GPZ reported the TOCTTOU race to Apple.
In the TOCTTOU timeline, Apple got an extension. Subtextually, it sounds like Tim Cook called Sundar Pichai. GPZ does not want to give extensions. They have a 90 day disclosure timeline, it's very well known, and probably the healthiest disclosure process in the industry. It's problematic for GPZ to give extensions because next time Tavis Ormandy finds a vulnerability in Norton Antivirus, Symantec is going to try to play chicken, and GPZ doesn't want to be at day 89 having to decide whether to drop zero-day versus being held hostage by a patch schedule.
But if a bug escalates all the way to Tim Cook, GPZ is probably pretty OK just with the degree to which that raises the profile of their bug --- it's hard to look at that and think Apple isn't taking your bug extremely seriously. So they'll trade the raised profile for the 5 week extension.
So they include a bunch of fuck-yous to Apple in the disclosure timeline, messaging to other vendors that GPZ is not going to budge even if your dumb original fix turns out to have a flaw that Ian Beer will notice and exploit. If you want the extension, you'd better have a Tim Cook.
Or maybe they're just having fun. Either way, a good read!
I'm still on 10.11. I don't plan to update soon, since the benefit of Siri, Photos and the other major features is quite small compared to the risk that I might lose working days if something goes wrong (I'm a freelancer).
As far as I read in the article, there will be a 10.12.1 (the final fix) which will have that part of the kernel refactored. I hope Apple will also support 10.11 and issue an update with the same fix.
I got an update to 10.11 El Capitan yesterday, which probably fixes the vulnerability. You can see the fix on Apple's support page; have a look at the bottom of the page, at the entry about "System Boot":
I would argue that the original underlying problem here is the idea that having execve() increase privilege is acceptable. It's necessary for legacy reasons (sudo, anyone?), but even then, it's barely necessary. "sudo foo" could be implemented by asking a privileged daemon to run foo and handing off access to the console to the daemon.
On Linux, you can do PR_SET_NO_NEW_PRIVS to turn off this type of privilege gain, and it's even required for certain purposes. I would love to see someone develop a distribution that enables no_new_privs for all processes.
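For the curious, opting in is a one-liner; here's a minimal sketch (Linux-specific; PR_SET_NO_NEW_PRIVS has been in the kernel since 3.5):

    #include <stdio.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(void) {
        /* Once set, this bit is inherited across fork/exec and can
           never be cleared. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
            perror("prctl");
            return 1;
        }
        /* From here on, execve() of a setuid/setgid or file-capability
           binary runs it *without* the privilege bump - sudo fails
           instead of escalating. */
        execl("/usr/bin/sudo", "sudo", "id", (char *)NULL);
        perror("execl");
        return 1;
    }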
It's a pattern of privilege escalation bugs. If you run untrusted code on your machine, that code can obtain root or alter the kernel, potentially even if it's running as nobody.
There is a relatively long sequence of attempts to band-aid the bug, all of which failed, because Ian Beer found a systemic flaw, not just a single point flaw. So, the other implication for users is a general sense of foreboding.
Bug 1: Many XNU drivers save task_t's on the heap without bumping their refcount.
1. Attacker creates process A and B
2. B->A send task port Bt
3. A->XNU request IOKit framebuffer client for Bt
4. A ditches Bt, retains client
5. Kill B; Bt in client now dangling
6. Trigger creation of privileged C, unrelated to A & B
7. C inherits memory once used by Bt
8. A use retained framebuffer client to write C's memory
What's important to understand is that this is not just a single UAF, but a pattern of UAFs scattered throughout XNU.
Fix: at step 3, check to make sure the task being given to IOKit is owned by the task making the IOKit request.
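The underlying pattern is simple enough to sketch. task_reference() and task_deallocate() are real XNU KPIs; the client struct and function names below are invented for illustration:

    /* Stand-in declarations so the shape is visible outside XNU. */
    typedef struct task *task_t;
    extern void task_reference(task_t t);
    extern void task_deallocate(task_t t);

    struct io_client {
        task_t owner;               /* cached across calls */
    };

    /* Vulnerable: stashes the task without taking a reference. If the
       task dies, 'owner' dangles, and a later task allocation (step 7
       above) can land in the freed memory. */
    void client_init_bad(struct io_client *c, task_t t) {
        c->owner = t;
    }

    /* Fixed: hold a reference for as long as the pointer is cached... */
    void client_init_good(struct io_client *c, task_t t) {
        task_reference(t);
        c->owner = t;
    }

    /* ...and drop it when the client goes away. */
    void client_destroy(struct io_client *c) {
        task_deallocate(c->owner);
    }

Note the reference only cures the dangling pointer. As bug 2 shows, a perfectly valid, properly referenced task_t can still change identity underneath you.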
Bug 2: IOKit drivers cache task details on the heap; the lifetime of that cached task_t is the lifetime of the IOKit kernel object, not of the program that made the request. In particular: if you execve() an SUID, the task_t is repurposed.
1. Attacker creates process A and B
2. B->XNU request IOKit framebuffer for Bt, Bc
3. B->A send client Bc
4. B execve /bin/su. B is now running as root.
5. A use retained framebuffer client to write B's memory
The tricky thing here is that this isn't just one bug, but a pattern of bugs: every place where a driver stashes a task_t on the heap and exposes functionality through a passable object is a place where colluding processes can potentially take advantage of SUIDs to raise privileges.
Fix: Lifetime of IOKit clients now tied to lifetime of creating process.
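A sketch of why the refcount alone couldn't save you here (names invented again):

    /* The reference is valid the whole time - nothing dangles. But an
       execve() of an SUID binary changes what the task *is*. */
    typedef struct task *task_t;
    extern int task_write_memory(task_t t, const void *buf, int len); /* invented */

    struct io_client {
        task_t owner;               /* properly reference-counted this time */
    };

    int client_write(struct io_client *c, const void *buf, int len) {
        /* Credentials were vetted once, when the client was created.
           If 'owner' has since exec'ed /bin/su, this now writes into a
           root-owned address space with no re-check. */
        return task_write_memory(c->owner, buf, len);
    }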
Bug 3: Even if a driver doesn't save a task_t on the heap, task_ts are held on the stack while system calls and kernel Mach message handlers are serviced, so there are race conditions.
1. Attacker creates process A and B
2. B->A send task port Bt
3a. A->XNU task_threads(Bt), retrieving thread ports for Bt
3b. (simultaneously) B execve /bin/su. B is now running as root.
4a. task_threads converts Bt to a task_t
4b. execve modifies the same task_t to replace thread ports
4c. task_threads retrieves the (now privileged) thread ports.
5. A uses thread ports to overwrite registers and take control of B.
Fix: Kernel objects now check to see if a task_t has been touched by execve before returning them to userland. Even if you win the race, that failsafe prevents the kernel from giving you privileged objects.
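My guess at the shape of that failsafe - the summary only says kernel objects check whether the task was touched by execve, so the field and function names below are invented:

    typedef struct task {
        unsigned exec_gen;              /* hypothetically bumped by execve() */
        /* ... */
    } *task_t;

    extern int collect_thread_ports(task_t t, void *out);

    int task_threads_checked(task_t t, void *out) {
        unsigned gen = t->exec_gen;     /* snapshot before the slow work */
        int err = collect_thread_ports(t, out);
        if (err)
            return err;
        if (t->exec_gen != gen)         /* an exec raced us: fail closed */
            return -1;                  /* don't hand back the ports */
        return 0;
    }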
Bug 4: You don't need the kernel to give you a privileged object directly; all you need is to be able to influence a privileged object.
1. Attacker creates process A and B
2. B->A send task port Bt
3a. A->XNU task_set_exception_port(Bt), wiring A to B's exceptions
3b. (simultaneously) B execve /bin/su with rlimited stack. B is now running as root, briefly.
4a. task_set_exception_port converts Bt to a task_t
4b. execve modifies the same task_t to replace thread ports
4c. task_set_exception_port rewrites the exception port.
5. stack access in B, running /bin/su as root, causes a SEGV
6. XNU generates an exception message, passing with it the thread ports, to A
7. A uses thread ports to overwrite registers and take control of B.
Fix: table flip. Rewrite execve so it generates entirely new task_ts when loading binaries, rather than repurposing the old ones.
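In sketch form (stand-in names, not XNU's real internals), the table flip looks something like:

    typedef struct task *task_t;
    extern task_t task_create_from(task_t old);   /* hypothetical */
    extern void   task_mark_inactive(task_t old); /* hypothetical */

    /* Called from execve(): instead of mutating 'old' in place, build a
       fresh task (the two extra allocations and copies complained about
       upthread) and retire the old one. Any stale task_t someone kept
       now denotes a dead, pre-exec identity instead of root. */
    task_t exec_replace_task(task_t old) {
        task_t fresh = task_create_from(old);
        task_mark_inactive(old);
        return fresh;
    }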
This is all pretty magnificent. What's best about it is that it totally justifies the title of the post: pretty much every place in XNU where they save a task_t creates a TOCTTOU bug.
In particular, about the "pattern of UAFs scattered throughout XNU": the memory management of task_t references was missing even in the sample code for kext drivers. So it wouldn't be enough to just add the missing retain calls in the Apple XNU kexts, because there may be an unknown number of third-party kexts out there. Perhaps not as many as Windows has device drivers, but it's still the same type of thing. Can you imagine if every Windows device driver turned out to have copy-pasted privesc bugs?
Because human brains are pattern-matching engines; parent comment saw the phrase "considered harmful", didn't read the article, and linked a previously-read article that they presumed was related based on the title alone.
GPZ finds an entirely new class of vulnerability; Apple takes 4 months to patch and resolve.
And you claim this has been exploited for years. There is 0 evidence of this, and such a claim demands proof.
I would be happy to apologise if you could find one example of exploitation prior to a few days ago when it became public.
The point of the 0-day black market is to not reveal these attacks publicly. If there were public proof of this in the past it would have been fixed in the past.
Take my word for it when I say there are upper echelons of black hats that are stockpiling unknown 0-day exploits like this and presently using them in the wild.
Or dismiss me as irrational and continue with the belief that all bugs are unknown until white hats share them with Apple.