Hacking the OS X Kernel for Fun and Profiles (swtch.com)
174 points by f2f on Aug 13, 2013 | 35 comments



It's articles like this that keep me reading HN. Kernel bugs, POSIX thread details, code disassembly, binary patching, and I get to learn about a new signal. What's not to love?


The paper on KSplice attached at the end is also really neat if you're into this stuff. It describes the design and implementation of a tool for applying binary patches based on normal source code diffs.


At least, thanks to Apple for not stripping everything:

  Betty:~ lelf$ dsymutil -s /mach_kernel |head
  ----------------------------------------------------------------------
  Symbol table for: '/mach_kernel' (x86_64)
  ----------------------------------------------------------------------
  Index    n_strx   n_type             n_sect n_desc n_value
  ======== -------- ------------------ ------ ------ ----------------
  [     0] 00000004 0f (     SECT EXT) 08     0000   ffffff800084e79c '.constructors_used'
  [     1] 00000017 0f (     SECT EXT) 08     0000   ffffff800084e7a4 '.destructors_used'
  [     2] 00000029 0f (     SECT EXT) 01     0000   ffffff80005241b0 '_AddFileExtent'
  [     3] 00000038 0f (     SECT EXT) 01     0000   ffffff800051ef40 '_AllocateNode'
  [     4] 00000046 0f (     SECT EXT) 01     0000   ffffff800021d510 '_Assert'


> In order for the open file not to be inherited by the new program, we must introduce a new variant of open(2) that can open a file descriptor atomically marked "close on exec."

This is incorrect, because you can prevent fds from getting inherited even without "close on exec". Simply list the files in /dev/fd and you'll see all the file descriptors your program has open, and then you can close all of the ones that the exec'd program won't need. The same thing is done on Linux with /proc/self/fd. The whole facility gets wrapped in a library function (not a standard library function, sadly) since there is a minor trick involved in getting this right.
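A minimal sketch of the listing step in Python, assuming a single-threaded process on a system that exposes /dev/fd or /proc/self/fd. The name `list_open_fds` is made up for illustration. The "minor trick" is presumably that reading the directory itself opens a descriptor, which then shows up in the listing; here it is filtered out by re-checking each entry with fstat once the listing is done.

```python
import os

def list_open_fds():
    """Return the sorted descriptors currently open in this process."""
    for path in ("/dev/fd", "/proc/self/fd"):
        try:
            names = os.listdir(path)
        except OSError:
            continue  # this platform does not expose that directory
        fds = []
        for name in names:
            if not name.isdigit():
                continue
            fd = int(name)
            try:
                os.fstat(fd)  # fails for the fd that listdir() used and closed
            except OSError:
                continue
            fds.append(fd)
        return sorted(fds)
    raise OSError("no per-process fd directory found")
```

Anything in the returned list that the exec'd program does not need can then be closed before calling exec.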

The problem with close on exec is that libraries would have to be modified to ensure they mark descriptors as close-on-exec. The current system is admittedly arcane and non-portable, but it does work.
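To make the two ways of getting close-on-exec concrete, here is a small Python sketch; the helper names are hypothetical. `mark_cloexec` is the after-the-fact fcntl route a library would have to remember to take, and `open_cloexec` is the atomic route the quoted passage asks for. Note that Python already sets the flag on descriptors it creates, so a test of `mark_cloexec` has to clear it first to imitate a library that did not.

```python
import fcntl
import os

def is_cloexec(fd):
    """True if the descriptor will be closed automatically on exec."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

def mark_cloexec(fd):
    """Two-step variant: set FD_CLOEXEC after the descriptor already exists."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

def open_cloexec(path):
    """Atomic variant: open(2) itself sets the flag, leaving no window."""
    return os.open(path, os.O_RDONLY | os.O_CLOEXEC)
```

With the two-step variant, another thread can fork and exec between the open and the fcntl call, which is the window the atomic variant closes.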


Last time I checked on this, there was no safe way (i.e. using only async-signal-safe calls) to list the contents of /dev/fd post-fork. Did I miss something?


You can call functions that aren't async-signal-safe after a fork(). The standard library has prefork handlers that fire, ensuring that you can e.g. opendir() safely after fork(), even if another thread was halfway through malloc() at the time of the fork.

Forking + threads is very messy, but it's not quite that pathological. I'd like to see an interface similar to posix_spawn() become the norm for spawning processes, with a fallback to fork()+exec() for more difficult use cases, but I don't think posix_spawn() is good enough.


Well, the man page says otherwise, and I'd hate to rely on undocumented safety buried in the standard library.

Edit: upon re-reading the man page, this bit caught my eye:

"If you need to use these frameworks in the child process, you must exec. In this situation it is reasonable to exec yourself."

So that would be one (highly painful) way to handle this safely. Fork, then self-exec, passing in arguments that tell the newly execed process what you really want to run. The new process can then do the /dev/fd listing in peace, then call exec again. Eww.


Seems like you're right. Whoops, time to go fix some of my code, which is why I really want a non-broken posix_spawn().

Another silly way to do things is to have the parent process send a list of fds to close over a pipe, which at least doesn't require a second call to exec().

I think the problem with close-on-exec is always going to be simple though: you have to make sure every library you use sets the flag.


OS X provides the POSIX_SPAWN_CLOEXEC_DEFAULT flag to posix_spawn, which results in it automatically closing all file descriptors that aren't described by the file actions passed to posix_spawn.
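For a feel of what "close everything I did not name" looks like from the caller's side, Python's subprocess module has a comparable model in `close_fds=True` plus `pass_fds`. The sketch below is an analogy only: it does not call posix_spawn or use the OS X flag, and the helper name `child_sees_fd` is made up.

```python
import os
import subprocess
import sys

# Child program: report whether the descriptor number in argv[1] is open.
_PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "    print('open')\n"
    "except OSError:\n"
    "    print('closed')\n"
)

def child_sees_fd(fd, keep):
    """Spawn a child and return True if `fd` is open inside it."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, str(fd)],
        close_fds=True,                  # close everything above stderr...
        pass_fds=(fd,) if keep else (),  # ...except descriptors listed here
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip() == "open"
```

The appeal is the same in both cases: the default is "closed", so a library that forgot to set close-on-exec cannot leak a descriptor into the child.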


It's interesting that starting a new process in a signal handler is something you want to do. (I just searched and yes I understand fork() and exec() are spec'd by POSIX to be safe.) It seems like writing safe signal handlers is hard enough; making them multiprocess seems like a "now you have two problems" kind of thing.

I suppose if I am not mistaken you could go the pre-readdir() route and open a directory with open() and read it with read(). Probably a portability mess though.


I don't think it's so much that starting a new process in a signal handler is an intended use case, but rather that the post-fork environment is very similar to the signal handler environment, and so calls that work in one will work in the other.

(In both cases, you can't count on currently acquired locks ever being released. In a signal handler, this is because it could be acquired on the current thread, and post-fork, this is because all other threads have been killed from the perspective of the child process. As such, no lock-based code can safely be called.)
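The post-fork half of that can be shown with a short Python sketch (the function name is hypothetical). A helper thread takes a lock, the process forks, and the child, which contains only the forking thread, tries to take the same lock with a timeout. This uses Python-level locks rather than the C library's internal ones, so treat it as an illustration of the mechanism, not of malloc() itself.

```python
import os
import threading

def child_can_take_lock(hold_in_other_thread):
    """Fork and report whether the child managed to acquire the lock."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            held.set()
            release.wait()  # keep holding the lock across the fork

    thread = None
    if hold_in_other_thread:
        thread = threading.Thread(target=holder)
        thread.start()
        held.wait()

    # Python 3.12+ warns when a multi-threaded process forks, for this reason.
    pid = os.fork()
    if pid == 0:
        # Child: the holder thread does not exist here, so nobody will
        # ever release the copied lock if it was held at fork time.
        acquired = lock.acquire(timeout=0.2)
        os._exit(0 if acquired else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    if thread is not None:
        thread.join()
    return os.WEXITSTATUS(status) == 0
```

Without the timeout, the child in the second case would simply hang, which is what a lock-taking library call after fork() risks.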


On some platforms there is a closefrom() call that will do this. (In *BSD it's a syscall, googling around it looks to be a library function on Solaris.) Best to use that if it's available, as hardcoding paths like /dev/fd or /proc/self/fd will not be portable. (Neither will closefrom() be really, but at least it's semantically clear what it does and you can bring-your-own one of those for platforms that lack it.)
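A bring-your-own closefrom() could look roughly like this in Python. It is a sketch under assumptions: the function mirrors the name of the BSD call but is not a binding to it, it prefers the per-process fd directory when one exists, and the 1024 used when the limit is indeterminate is an arbitrary guess.

```python
import os

def closefrom(lowfd):
    """Close every open descriptor numbered lowfd or higher."""
    for path in ("/dev/fd", "/proc/self/fd"):
        try:
            names = os.listdir(path)
        except OSError:
            continue  # not available here, try the next location
        for name in names:
            if name.isdigit() and int(name) >= lowfd:
                try:
                    os.close(int(name))
                except OSError:
                    pass  # already gone, e.g. the fd listdir() itself used
        return
    # Fallback: walk every possible descriptor number up to the limit.
    limit = os.sysconf("SC_OPEN_MAX")
    os.closerange(lowfd, limit if limit > 0 else 1024)  # 1024 is an assumed default
```

Callers only ever see `closefrom(3)`, so the non-portable paths stay in one place.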


Cool, that was a bug, not too difficult to fix, and it must have felt great once figured out. But sometimes on Darwin you run into stuff in the BSD emulation that is so crufty that it's better to use the Mach stuff. In this case, this class of routines:

  http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/task_sample.html


I don't think that has a way to get stacks. See http://research.swtch.com/pprof for why stacks are important.


In that case, use another task that calls clock_alarm(), task_suspend(), task_resume(), and thread_get_state().


So you can't boot OS X by taking stock Darwin/XNU and recompiling it? If so, that is kind of disappointing. I understand they have private code that they don't want to open up, but it would be nice if you could still boot with a self-compiled kernel.



Glad to hear. It was suspect. "I can't be bothered to figure out how to build so I guess we'll patch the binary."

I guess Apple doesn't exactly shout from the rooftops how to do this but I see now that this is the top Google hit for "building xnu". So it's not too hidden.


Can someone explain what this patch actually does? I have an old MacBook Air with a malfunctioning sensor that causes the CPU to always run in power-saving mode (capped at 800 MHz), so it's basically unusable under OS X.

I have worked around this issue by installing Linux, which allows me to tune the CPU governor manually and removes the artificial limit that the malfunctioning sensor imposes.

Would this patch allow me to do something similar in Mountain Lion?


You could try NullCPUPowerManagement.kext [1]. It's used in the OS X x86 community to disable the built-in power management. If you don't mind getting your hands dirty you might be able to get it (or another related kext) to turn off frequency switching.

[1] http://www.osx86.net/downloads.php?do=file&id=16


I don't mind getting my hands dirty at all. I'll see if I can find any more info about this kext, thanks :)


The patch fixes a bug with how the SIGPROF signal is delivered to a process, ensuring that it gets delivered to the proper thread rather than an arbitrary thread. It's unrelated to anything in the area of powersaving or clock speed throttling.
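For anyone who has not met SIGPROF before, here is a small Python sketch of the mechanism a sampling profiler builds on: a profiling interval timer that fires while the process burns CPU. The function name is made up, and it must run on the main thread. Python always runs signal handlers on the main thread, so this cannot show the per-thread delivery that the kernel bug affected; it only shows where the signal comes from.

```python
import signal
import time

def count_sigprof(cpu_seconds=0.2, interval=0.01):
    """Burn CPU and count how many SIGPROF signals the timer delivers."""
    ticks = 0

    def on_sigprof(signum, frame):
        nonlocal ticks
        ticks += 1  # a real profiler would record the stack of `frame` here

    previous = signal.signal(signal.SIGPROF, on_sigprof)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        deadline = time.process_time() + cpu_seconds
        while time.process_time() < deadline:
            pass  # ITIMER_PROF only advances while the process uses CPU
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)  # stop the timer
        signal.signal(
            signal.SIGPROF,
            previous if previous is not None else signal.SIG_DFL,
        )
    return ticks
```

In a multi-threaded C or Go program the interesting question is which thread receives each of those signals, since the profiler attributes the sample to whatever that thread was running.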


great, thanks for pointing it out.


The patch? No, but the technique can be used to make any change to the kernel, as long as you don't need more bytes.

The change you want likely is such a change, as, to me, it looks like a 'replace an if by a NOP or a jump always' patch.
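As an illustration of what such a same-size patch looks like at the byte level, here is a Python sketch for x86 short conditional jumps (opcodes 0x70 to 0x7F followed by a one-byte displacement). The function name is hypothetical, the longer 0x0F 0x8x form is not handled, and whether the power-management check in question actually compiles to this shape is an assumption.

```python
JMP_REL8 = 0xEB  # x86 "jmp rel8": unconditional short jump
NOP = 0x90       # x86 one-byte no-op

def force_branch(code, offset, always_taken):
    """Rewrite the 2-byte short conditional jump at `offset`, in place."""
    opcode = code[offset]
    if not 0x70 <= opcode <= 0x7F:  # 0x70..0x7F are the short Jcc opcodes
        raise ValueError("no short conditional jump at offset %d" % offset)
    if always_taken:
        code[offset] = JMP_REL8     # keep the displacement byte unchanged
    else:
        code[offset] = NOP          # never taken: fall through instead
        code[offset + 1] = NOP

# Example bytes: test eax,eax ; je +5 ; ret
code = bytearray([0x85, 0xC0, 0x74, 0x05, 0xC3])
force_branch(code, 2, always_taken=True)
```

Either rewrite occupies exactly the two bytes of the original instruction, which is why no other code has to move.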



Will this also allow me to scale the CPU? I was already able to change the fan's speed... thanks


I'm afraid the title may scare people off the article. I assure you, "hacking" here is used in the canonical sense: "An incredibly good, and perhaps very time-consuming, piece of work that produces exactly what is needed."


Have you noticed you are posting this on Hacker News? :)


... where it will compete with articles about the Star Wars Weather Forecast and discussions on TV programmes. Yes, the humour is not lost on me ;)


I think the "hacking X for fun and profit" is a pretty well-known trope.


For me, the origin of this was from "Smashing the stack for fun and profit (1996)"[1], which does actually refer to somewhat malicious purposes. Apparently it has existed in literature for a long time before that though[2].

[1] http://www.phrack.org/issues.html?id=14&issue=49 [2] http://english.stackexchange.com/questions/25205/what-is-the...


Hah, indeed. I had that Aleph One article printed out and stapled, and would carry it with my books in high school!


I agree, but the context nowadays is different. You don't often see articles about modifying the binaries of major operating system kernels.

What impressed me most is that the hack worked flawlessly on one of my 10.8.5 pre-release kernels even though it had not been tested there yet.


Can anyone provide an explanation of how the binary patch is applied?


If you're familiar with Go you can read the source of the apply function here:

The loadKernel function reads in the kernel and builds a symbol table, looking for the bsd_ast and current_thread symbols, here:

https://code.google.com/p/rsc/source/browse/cmd/pprof_mac_fi...

Then the "apply" func writes the modified instruction bytes:

https://code.google.com/p/rsc/source/browse/cmd/pprof_mac_fi...
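For readers who don't know Go, the general shape of "look up a symbol, then overwrite bytes there" can be sketched in a few lines of Python. This is not the tool's code: the symbol table is a plain dict standing in for a parsed Mach-O table, the offsets and byte values are invented, and checking the old bytes before writing is a precaution assumed here rather than something confirmed about the real tool.

```python
def apply_patch(image, symbols, name, delta, expected, replacement):
    """Overwrite bytes at symbols[name] + delta after checking the old bytes."""
    if len(replacement) != len(expected):
        raise ValueError("replacement must be exactly as long as the original")
    start = symbols[name] + delta
    end = start + len(expected)
    if bytes(image[start:end]) != bytes(expected):
        raise ValueError("unexpected bytes at %s+%d, refusing to patch" % (name, delta))
    image[start:end] = replacement

# Hypothetical 8-byte "kernel" with bsd_ast at offset 2.
image = bytearray(b"\x00\x01\x02\x03\x04\x05\x06\x07")
symbols = {"bsd_ast": 2}
apply_patch(image, symbols, "bsd_ast", 1, b"\x03\x04", b"\x90\x90")
```

Refusing to write when the existing bytes differ keeps the patch from corrupting a kernel build it was not written for.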



