Hacking the OS X Kernel for Fun and Profiles

spudlyo · on Aug 13, 2013

It's articles like this that keep me reading HN. Kernel bugs, POSIX thread details, code disassembly, binary patching, and I get to learn about a new signal. What's not to love?

SecretofMana · on Aug 14, 2013

The paper on KSplice attached at the end is also really neat if you're into this stuff. It describes the design and implementation of a tool for applying binary patches based on normal source code diffs.

lelf · on Aug 13, 2013

At least thank Apple for not stripping everything:

  Betty:~ lelf$ dsymutil -s /mach_kernel |head
  ----------------------------------------------------------------------
  Symbol table for: '/mach_kernel' (x86_64)
  ----------------------------------------------------------------------
  Index    n_strx   n_type             n_sect n_desc n_value
  ======== -------- ------------------ ------ ------ ----------------
  [     0] 00000004 0f (     SECT EXT) 08     0000   ffffff800084e79c '.constructors_used'
  [     1] 00000017 0f (     SECT EXT) 08     0000   ffffff800084e7a4 '.destructors_used'
  [     2] 00000029 0f (     SECT EXT) 01     0000   ffffff80005241b0 '_AddFileExtent'
  [     3] 00000038 0f (     SECT EXT) 01     0000   ffffff800051ef40 '_AllocateNode'
  [     4] 00000046 0f (     SECT EXT) 01     0000   ffffff800021d510 '_Assert'

klodolph · on Aug 13, 2013

> In order for the open file not to be inherited by the new program, we must introduce a new variant of open(2) that can open a file descriptor atomically marked "close on exec."

This is incorrect, because you can prevent fds from getting inherited even without "close on exec". Simply list the files in /dev/fd and you'll see all the file descriptors your program has open, and then you can close all of the ones that the exec'd program won't need. The same thing is done on Linux with /proc/self/fd. The whole facility gets wrapped in a library function (not a standard library function, sadly) since there is a minor trick involved in getting this right.

The problem with close on exec is that libraries would have to be modified to ensure they mark descriptors as close-on-exec. The current system is admittedly arcane and non-portable, but it does work.
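
A minimal sketch of the /dev/fd approach, in Python for brevity. The helper names `open_fds` and `close_all_except` are made up for illustration; this is not the library function mentioned above. One wrinkle it has to handle: reading the directory opens a descriptor of its own, which shows up in the listing even though it is closed again by the time the listing returns.

```python
import os

# /dev/fd exists on OS X and the BSDs (and usually on Linux, as a symlink);
# fall back to /proc/self/fd where it does not.
FD_DIR = "/dev/fd" if os.path.isdir("/dev/fd") else "/proc/self/fd"


def open_fds():
    """Return the descriptors currently open in this process, sorted."""
    fds = []
    for name in os.listdir(FD_DIR):
        if not name.isdigit():
            continue
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            # This was the temporary descriptor used to read FD_DIR;
            # it is already closed, so skip it.
            continue
        fds.append(fd)
    return sorted(fds)


def close_all_except(keep):
    """Close every open descriptor whose number is not in `keep`."""
    for fd in open_fds():
        if fd not in keep:
            os.close(fd)
```

In a child process this would be called as something like `close_all_except({0, 1, 2})` just before exec. It assumes no other thread is opening or closing descriptors while it runs.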

mikeash · on Aug 13, 2013

Last time I checked on this, there was no safe way (i.e. using only async-signal-safe calls) to list the contents of /dev/fd post-fork. Did I miss something?

klodolph · on Aug 13, 2013

You can call functions that aren't async-signal-safe after a fork(). The standard library has prefork handlers that fire ensuring that you can e.g. opendir() safely after fork(), even if another thread was halfway through malloc() at the time of the fork.

Forking + threads is very messy, but it's not quite that pathological. I'd like to see an interface similar to posix_spawn() become the norm for spawning processes, with a fallback to fork()+exec() for more difficult use cases, but I don't think posix_spawn() is good enough.

mikeash · on Aug 13, 2013

Well, the man page says otherwise, and I'd hate to rely on undocumented safety buried in the standard library.

Edit: upon re-reading the man page, this bit caught my eye:

"If you need to use these frameworks in the child process, you must exec. In this situation it is reasonable to exec yourself."

So that would be one (highly painful) way to handle this safely. Fork, then self-exec, passing in arguments that tell the newly execed process what you really want to run. The new process can then do the /dev/fd listing in peace, then call exec again. Eww.

klodolph · on Aug 13, 2013

Seems like you're right. Whoops, time to go fix some of my code, which is why I really want a non-broken posix_spawn().

Another silly way to do things is to have the parent process send a list of fds to close over a pipe, which at least doesn't require a second call to exec().

I think the problem with close-on-exec is always going to be simple though: you have to make sure every library you use sets the flag.
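
To make the flag concrete, here is a small Python sketch of the two ways to get it set. The function names are illustrative only. Setting the flag after the fact takes two fcntl() calls, so there is a window in which another thread could fork and exec; passing O_CLOEXEC to open() closes that window.

```python
import fcntl
import os


def set_cloexec(fd):
    """Mark an already-open descriptor close-on-exec (not atomic with open)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Report whether the close-on-exec flag is set on a descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def open_cloexec(path):
    """Open read-only with the flag set by the kernel as part of open()."""
    return os.open(path, os.O_RDONLY | os.O_CLOEXEC)
```

Python 3.4 and later already create descriptors non-inheritable by default, so the first helper matters mostly for descriptors handed over by code that did not set the flag.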

bdash · on Aug 13, 2013

OS X provides the POSIX_SPAWN_CLOEXEC_DEFAULT flag to posix_spawn, which results in it automatically closing all file descriptors that aren't described by the file actions passed to posix_spawn.

asveikau · on Aug 14, 2013

It's interesting that starting a new process in a signal handler is something you want to do. (I just searched and yes I understand fork() and exec() are spec'd by POSIX to be safe.) It seems like writing safe signal handlers is hard enough, making them multiprocess seems like a "now you have two problems" kind of thing.

I suppose if I am not mistaken you could go the pre-readdir() route and open a directory with open() and read it with read(). Probably a portability mess though.

mikeash · on Aug 14, 2013

I don't think it's so much that starting a new process in a signal handler is an intended use case, but rather that the post-fork environment is very similar to the signal handler environment, and so calls that work in one will work in the other.

(In both cases, you can't count on currently acquired locks ever being released. In a signal handler, this is because it could be acquired on the current thread, and post-fork, this is because all other threads have been killed from the perspective of the child process. As such, no lock-based code can safely be called.)

asveikau · on Aug 14, 2013

On some platforms there is a closefrom() call that will do this. (In *BSD it's a syscall, googling around it looks to be a library function on Solaris.) Best to use that if it's available, as hardcoding paths like /dev/fd or /proc/self/fd will not be portable. (Neither will closefrom() be really, but at least it's semantically clear what it does and you can bring-your-own one of those for platforms that lack it.)

mzs · on Aug 13, 2013

Cool, that was a bug, not too difficult to fix, and it must have felt great once figured out. But sometimes on Darwin you run into stuff in the BSD emulation that is so crufty that it's better to use the Mach stuff. In this case, this class of routines:

  http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/task_sample.html

rsc · on Aug 13, 2013

I don't think that has a way to get stacks. See http://research.swtch.com/pprof for why stacks are important.

mzs · on Aug 13, 2013

In that case, use another task that calls clock_alarm(), task_suspend(), task_resume(), and thread_get_state().

asveikau · on Aug 13, 2013

So you can't boot OS X by taking stock Darwin/XNU and recompiling it? If so, that is kind of disappointing. I understand they have private code that they don't want to open up, but it would be nice if you could still boot with a self-compiled kernel.

bdash · on Aug 14, 2013

You can. See http://shantonu.blogspot.com/2012/07/building-xnu-for-os-x-1...

asveikau · on Aug 14, 2013

Glad to hear. It was suspect. "I can't be bothered to figure out how to build so I guess we'll patch the binary."

I guess Apple doesn't exactly shout from the rooftops how to do this but I see now that this is the top Google hit for "building xnu". So it's not too hidden.

janus · on Aug 13, 2013

Can someone explain what this patch actually does? I have an old MacBook Air with a malfunctioning sensor that causes the CPU to always run in powersaving mode (capped at 800 MHz), so it's basically unusable under OS X.

I have worked around this issue by installing Linux, which allows me to tune the CPU governor manually and removes the artificial limit that the malfunctioning sensor imposes.

Would this patch allow me to do something similar in Mountain Lion?

Moto7451 · on Aug 13, 2013

You could try NullCPUPowerManagement.kext. It's used in the OS X x86 community to disable the built in Power management. If you don't mind getting your hands dirty you might be able to get it (or another related kext) to turn off frequency switching.

[1] http://www.osx86.net/downloads.php?do=file&id=16

janus · on Aug 21, 2013

I don't mind getting my hands dirty at all. I'll see if I can find any more info about this kext, thanks :)

mikeash · on Aug 13, 2013

The patch fixes a bug with how the SIGPROF signal is delivered to a process, ensuring that it gets delivered to the proper thread rather than an arbitrary thread. It's unrelated to anything in the area of powersaving or clock speed throttling.
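
For anyone unfamiliar with the mechanism, here is a rough Python sketch of SIGPROF-based sampling: a CPU-time interval timer fires SIGPROF, and the handler records the stack that was running. The names `profile` and `spin` are invented for this example. Python always runs signal handlers on its main thread, so this cannot reproduce the thread-delivery bug; it only shows why a profiler cares which thread receives the signal.

```python
import collections
import signal
import time


def profile(func, interval=0.005):
    """Run func() while sampling the running stack on every SIGPROF."""
    counts = collections.Counter()

    def on_sigprof(signum, frame):
        # Walk from the interrupted frame up to the outermost caller.
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        counts[tuple(reversed(stack))] += 1

    old = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time used by the process and delivers SIGPROF.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        # Disarm the timer before restoring the previous handler.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
        signal.signal(signal.SIGPROF, old if old is not None else signal.SIG_DFL)
    return counts


def spin(cpu_seconds=0.2):
    """Burn roughly cpu_seconds of CPU time so the timer has something to sample."""
    deadline = time.process_time() + cpu_seconds
    while time.process_time() < deadline:
        pass
```

Calling `profile(spin)` returns a counter keyed by call stack. In a multithreaded native program the samples are only meaningful if the signal lands on the thread that was actually consuming CPU, which is the behavior the patch restores.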

janus · on Aug 13, 2013

great, thanks for pointing it out.

Someone · on Aug 13, 2013

The patch? No, but the technique can be used to make any change to the kernel, as long as you don't need more bytes.

The change you want likely is such a change, as, to me, it looks like a 'replace an if by a NOP or a jump always' patch.
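
A small Python sketch of that kind of same-size patch, working on an in-memory copy of the code. `patch_in_place` is a hypothetical helper and the byte sequence in the usage note is a toy x86 snippet, not anything taken from the kernel. It refuses to grow the code and checks the original bytes first, so it fails loudly on an unexpected kernel version.

```python
NOP = 0x90  # single-byte x86 no-op


def patch_in_place(image, offset, expected, replacement):
    """Overwrite `expected` at `offset` with `replacement`, NOP-padded to the same length."""
    if len(replacement) > len(expected):
        raise ValueError("replacement needs more bytes than the original code")
    end = offset + len(expected)
    if bytes(image[offset:end]) != bytes(expected):
        raise ValueError("unexpected bytes at offset; different kernel build?")
    padding = bytes([NOP]) * (len(expected) - len(replacement))
    # Same-length slice assignment: the image never changes size.
    image[offset:end] = bytes(replacement) + padding
    return image
```

For example, with `code = bytearray.fromhex("85c0 7405 e800000000 c3")` (test, je, call, ret), `patch_in_place(code, 2, b"\x74", b"\xeb")` turns the conditional short jump (0x74) into an unconditional one (0xEB), and replacing the whole two-byte jump with an empty replacement turns it into two NOPs.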

kineticfocus · on Aug 14, 2013

You could also try this out... https://github.com/hholtmann/smcFanControl/tree/master/smc-c...

janus · on Aug 21, 2013

Will this also allow me to scale the cpu? I was already able to change the fan's speed .... thanks

f2f · on Aug 13, 2013

I'm afraid the title may scare people off the article. i assure you, "hacking" here is used in the canonical sense: "An incredibly good, and perhaps very time-consuming, piece of work that produces exactly what is needed."

ot · on Aug 13, 2013

Have you noticed you are posting this on Hacker News? :)

f2f · on Aug 13, 2013

... where it will compete with articles about the Star Wars Weather Forecast and discussions on TV programmes. yes, the humour is not lost on me ;)

endlessvoid94 · on Aug 13, 2013

I think the "hacking X for fun and profit" is a pretty well-known trope.

tonyarkles · on Aug 13, 2013

For me, the origin of this was from "Smashing the stack for fun and profit (1996)"[1], which does actually refer to somewhat malicious purposes. Apparently it has existed in literature for a long time before that though[2].

[1] http://www.phrack.org/issues.html?id=14&issue=49 [2] http://english.stackexchange.com/questions/25205/what-is-the...

endlessvoid94 · on Aug 13, 2013

Hah, indeed. I had that Aleph One article printed out and stapled, and would carry it with my books in high school!

f2f · on Aug 13, 2013

i agree, but the context nowadays is different. you don't often see articles about modifying the binaries of major operating system kernels.

what impressed me most is that the hack worked flawlessly on one of my 10.8.5 pre-release kernels even though it had not been tested there yet.

emacsitor · on Aug 13, 2013

Can anyone provide an explanation of how the binary patch is applied?

f2f · on Aug 13, 2013

If you're familiar with Go you can read the source of the apply function here:

the loadKernel function reads in the kernel and builds a symbol table looking for the bsd_ast and current_thread symbols here:

https://code.google.com/p/rsc/source/browse/cmd/pprof_mac_fi...

then the "apply" func writes the modified instruction bytes:

https://code.google.com/p/rsc/source/browse/cmd/pprof_mac_fi...
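
This is not the tool's actual code, but the lookup step can be sketched in Python: a symbol gives a virtual address, and the segment that contains it says where that address lives in the file. The `Segment` fields mirror what a Mach-O segment load command records, and the segment values in the usage note are made up for illustration.

```python
from collections import namedtuple

# One loadable segment: where it sits in memory and where it sits in the file.
Segment = namedtuple("Segment", "name vmaddr vmsize fileoff")


def file_offset(segments, addr):
    """Translate a virtual address into an offset within the kernel file."""
    for seg in segments:
        if seg.vmaddr <= addr < seg.vmaddr + seg.vmsize:
            return addr - seg.vmaddr + seg.fileoff
    raise LookupError("address %#x is not inside any segment" % addr)


def locate(symbols, segments, name):
    """Look up a symbol by name and return its file offset."""
    return file_offset(segments, symbols[name])
```

With a hypothetical `Segment("__TEXT", 0xffffff8000200000, 0x600000, 0)` and the `_Assert` value from the dsymutil listing earlier in the thread (0xffffff800021d510), `locate` returns 0x1d510, which is where a patcher would seek to before writing.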