Signalfd is useless (ldpreload.com)
105 points by tptacek on May 18, 2015 | 43 comments



Another weirdness about signalfd: read() from a signalfd returns signals for the calling process, regardless of what process created the signalfd (e.g. it could have been inherited through fork()). That's arguably usually what you want, but is inconsistent with usual file descriptor semantics, which say that it doesn't matter who read()s.

One place where the inconsistency gets weird is when you use signalfd with epoll. The epoll will flag events on the signalfd based on the process where the signalfd was registered with epoll, not the process where the epoll is being used. One case where this can be surprising is if you set up a signalfd and an epoll and then fork() for the purpose of daemonizing -- now you will find that your epoll mysteriously doesn't deliver any events for the signalfd despite the signalfd otherwise appearing to function as expected. That took me a day or two to debug. :(

With all that said, at the end of the day I disagree with Geoff. I would rather use signalfd than signal handlers. The "self-pipe trick" is ugly, involves a lot of unnecessary overhead, and runs the risk of deadlocking if you receive enough signals to fill the pipe buffer before you read them back (which can be solved with additional synchronization, but ick). In fact, in my own code, on systems that don't have signalfd or any similar mechanism, I tend to block signals except when I'm about to call poll(), and then siglongjmp() out of the signal handler to avoid the usual race condition. (See pselect(2) for discussion of said race condition.)
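
A minimal sketch of that pattern (fds, nfds, and block_these are assumed to be set up elsewhere; the handler is installed with sigaction() for every signal of interest):

    static sigjmp_buf poll_jmp;
    static volatile sig_atomic_t poll_armed;

    static void on_signal(int sig) {
        /* record which signal arrived, then escape if we're inside the window */
        if (poll_armed) {
            poll_armed = 0;
            siglongjmp(poll_jmp, 1);  /* restores the mask saved by sigsetjmp() */
        }
    }

    /* signals stay blocked during normal work; around poll(): */
    if (sigsetjmp(poll_jmp, 1) == 0) {  /* second arg 1 = save the current mask */
        poll_armed = 1;
        sigset_t empty;
        sigemptyset(&empty);
        sigprocmask(SIG_SETMASK, &empty, NULL);  /* open the window */
        poll(fds, nfds, -1);  /* a signal here (or just before) jumps out */
    }
    poll_armed = 0;
    sigprocmask(SIG_BLOCK, &block_these, NULL);  /* close the window */

If a signal lands after poll() returns but before the window closes, the jump discards poll()'s results; that's fine in a loop, since fd readiness is level-triggered and the next poll() reports it again.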

I think it's just a fact of life that you need to clear your signal mask between fork() and exec(), and yeah no one does this, whoops.
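
In sketch form, the fix in the child is just this (cmd being an illustrative NULL-terminated argv; sigprocmask() is async-signal-safe, so it's fine after fork()):

    pid_t pid = fork();
    if (pid == 0) {
        sigset_t empty;
        sigemptyset(&empty);
        sigprocmask(SIG_SETMASK, &empty, NULL);  /* undo the parent's mask */
        /* reset any handlers back to SIG_DFL here too, if that matters */
        execvp(cmd[0], cmd);
        _exit(127);  /* exec failed */
    }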

BTW, for the specific problem of dealing with child processes, I really hope Linux adopts the Capsicum interface as FreeBSD has:

https://www.freebsd.org/cgi/man.cgi?query=pdfork&sektion=2

Until then, you simply can't expect to reap children via signals. You use the signal to let you know that it's time to call wait().


> The "self-pipe trick" is ugly, involves a lot of unnecessary overhead, and runs the risk of deadlocking if you receive enough signals to fill the pipe buffer before you read them back

The unfortunate terseness of the original "self-pipe trick" description [1] makes the solution to this difficult to see. As far as I've figured out, there are two things to notice:

1) You're supposed to set the pipe to be non-blocking. Presumably you also then don't check the return code of the write(2) call in the signal handler. While this solves the case of a signal handler blocking forever, it does mean you might have dropped writes that correspond to signal receptions. That leads us to:

2) The self-pipe trick specifically calls out handling SIGCHLD (probably because it's one signal that you don't want to ignore!) But given the chances of dropping a byte as described in 1) and the fact that SIGCHLD and fork are explicitly called out, I can only assume that the lesson here is: only have one pipe per signal you intend to handle. Since multiple signals sent to a process may result in a single signal being delivered, your real signal handling code (the stuff that's watching the other end of the pipe) already has to deal with this situation.

As for Capsicum, I can't wait til they implement pdwait(2)! Until then, at least pdfork(2) ensures that the parent process' death kills the child process...

[1] http://cr.yp.to/docs/selfpipe.html


> 1) You're supposed to set the pipe to be non-blocking. Presumably you also then don't check the return code of the write(2) call in the signal handler. While this solves the case of a signal handler blocking forever, it does mean you might have dropped writes that correspond to signal receptions.

That doesn't matter. You're not supposed to have a byte in the pipe for every signal. What matters is having at least one byte any time there are unprocessed signals. The only function of the pipe is to wake the select(2) up. You still need bookkeeping elsewhere.

> That leads us to:

> 2) The self-pipe trick specifically calls out handling SIGCHLD (probably because it's one signal that you don't want to ignore!) But given the chances of dropping a byte as described in 1) and the fact that SIGCHLD and fork are explicitly called out, I can only assume that the lesson here is: only have one pipe per signal you intend to handle. Since multiple signals sent to a process may result in a single signal being delivered, your real signal handling code (the stuff that's watching the other end of the pipe) already has to deal with this situation.

Meh. Just have one pipe and a sig_atomic_t for each different type of signal you're interested in.
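
Roughly like this (assuming both pipe ends have been set O_NONBLOCK with fcntl()):

    static int wake_pipe[2];  /* from pipe(); both ends O_NONBLOCK */
    static volatile sig_atomic_t got_chld, got_term;

    static void on_signal(int sig) {
        int saved_errno = errno;
        if (sig == SIGCHLD) got_chld = 1;
        if (sig == SIGTERM) got_term = 1;
        char b = 0;
        (void)write(wake_pipe[1], &b, 1);  /* EAGAIN just means a wakeup is already queued */
        errno = saved_errno;
    }

    /* main loop: include wake_pipe[0] in the select()/poll() set, then: */
    char buf[64];
    while (read(wake_pipe[0], buf, sizeof buf) > 0)
        ;  /* drain the wakeup bytes */
    if (got_chld) { got_chld = 0; /* reap children here */ }
    if (got_term) { got_term = 0; /* begin shutdown here */ }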


> BTW, for the specific problem of dealing with child processes, I really hope Linux adopts the Capsicum interface as FreeBSD has:

> https://www.freebsd.org/cgi/man.cgi?query=pdfork&sektion=2

Whoops, I mentioned this in another comment and missed this, but see http://lwn.net/Articles/638613/


> One place where the inconsistency gets weird is when you use signalfd with epoll. The epoll will flag events on the signalfd based on the process where the signalfd was registered with epoll, not the process where the epoll is being used. One case where this can be surprising is if you set up a signalfd and an epoll and then fork() for the purpose of daemonizing -- now you will find that your epoll mysteriously doesn't deliver any events for the signalfd despite the signalfd otherwise appearing to function as expected. That took me a day or two to debug.

Is this what libuv does? I'm pretty sure it reads signals using epoll on Linux, so in theory - if it does it this way - this bug could be underlying all of node.js.


Does running the self-pipe trick on a separate thread solve that issue? It seems like it's basically equivalent to signalfd (neither worse nor better, unless you're worried about platform-specific thread bugs): you end up with a signal mask on your main thread, but you also avoid EINTR on your main thread. Any possible pipe lockup just happens on the signal-handling thread, so the mainloop can keep running and eventually dequeue signals.


> Does running the self-pipe trick on a separate thread solve that issue? It seems like it's basically equivalent to signalfd (neither worse nor better, unless you're worried about platform-specific thread bugs): you end up with a signal mask on your main thread, but you also avoid EINTR on your main thread. Any possible pipe lockup just happens on the signal-handling thread, so the mainloop can keep running and eventually dequeue signals.

There's no need for threads. Set the pipe to non-blocking and ignore the write() error if it's EAGAIN/EWOULDBLOCK. See my response above for why dropping writes if a byte already exists in the pipe is okay.


Sure, but that doesn't solve the EINTR problem. If you accept signal-handler interruptions on threads where you actually do work, then you risk interrupting system calls on those threads, and even SA_RESTART isn't guaranteed to work all of the time. That's what a separate thread (or signalfd) wins you.

(But yes, setting it non-blocking is correct.)


> Does running the self-pipe trick on a separate thread solve that issue?

Perhaps but then you're mixing signals and threads and you're in for a whole new world of hurt. :)

E.g. I've found that OS X does not always behave correctly when delivering signals to a process where one thread has blocked the signal but another hasn't, though I cannot remember the exact details. And of course on any system there is such a thing as signals addressed to a specific thread rather than a whole process (pthread_kill()).


I've heard claims of badness with signals and threads, but I've failed to track down concrete problems -- I would really like to know what they are. Meanwhile I've heard of people successfully using dedicated signal-handling threads in production, at least on Linux.

I'm not really sure that thread-directed signals are in scope for the sorts of things where you must use signals (SIGINT, SIGTSTP, etc. from a terminal, SIGCHLD from child termination, etc.) Those should all be process-directed. If you design your own API that involves signals, then sure, but that's a problem of your own making.


If you're going to create a dedicated signal-handling thread as the author recommends (which is one of the best ways to handle signals in a pthreads application), you don't need to use signal handlers at all; you should just mask the signal(s) and have the signal-handling thread loop around sigwaitinfo().

To his broader point, the mistake is to assume you will be able to get one signal delivered per signal raised. That's just not how (classic) UNIX signals work (POSIX realtime signals are different, and are queued) - they fundamentally need to be treated as level-triggered, not edge-triggered. For the SIGCHLD example, when a SIGCHLD is received (no matter whether through signal handler, self-pipe trick, signalfd() or sigwaitinfo()) you need to loop around waitpid() with the WNOHANG flag until it stops returning child PID statuses.
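
In sketch form (the mask has to be blocked before any other threads are created, so they all inherit it):

    static void *signal_thread(void *arg) {
        sigset_t *mask = arg;
        for (;;) {
            siginfo_t si;
            if (sigwaitinfo(mask, &si) < 0)
                continue;  /* EINTR */
            if (si.si_signo == SIGCHLD) {
                pid_t pid;
                int status;
                while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
                    ;  /* reap every child that has exited so far */
            }
            /* hand anything else off to the main loop as needed */
        }
        return NULL;
    }

    /* in main(), before spawning any other threads: */
    static sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigaddset(&mask, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &mask, NULL);
    pthread_t tid;
    pthread_create(&tid, NULL, signal_thread, &mask);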


Money quote seems to be:

> So you have to be very careful to reset any masked signals before starting a child process

I don't see what this has to do with signalfd. That statement is true generically. Non-default Unix signal handling and subprocess management have never cooperated cleanly. The point to signalfd is to provide a simpler mechanism to integrate signals (which are a legacy API in almost all cases) with existing synchronization and event handling architectures, not to magically make them not suck.


Yup. A lot of people, myself included, were under the impression that signalfd takes over from the normal signal-handling pathway: it doesn't.

It's particularly bad because the only way to get notified on a child exiting, in an event-handling architecture, is to wait for SIGCHLD notifications. (You can't call wait/waitpid because that's blocking; at best you can call it in a separate thread.) So even if all you're trying to do is write a program that runs a handful of children asynchronously, you have to incorporate signals into your architecture. And signalfd taunts you by providing siginfo with each notification, so you think you know which child exited -- but in fact, those siginfos could have coalesced, so this data is useless.

A friend claimed to me that siginfo is only useful for so-called synchronous signals (SIGSEGV, SIGILL, etc. -- stuff that you can't handle in an event loop anyway), which I'm inclined to believe, the more I think about it. So there's no reason for signalfd to have included siginfo.


waitpid() (and the newer waitid()) has a non-blocking mode (WNOHANG). So use SIGCHLD to decide when to check the status of the child processes, and waitpid() to actually do it.
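
That is, whenever the SIGCHLD notification arrives (on_child_exit being an illustrative callback):

    pid_t pid;
    int status;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        on_child_exit(pid, status);  /* several exits may hide behind one coalesced SIGCHLD */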


There is a not-so-nice but very effective trick: send the process you wish to check signal 0 (using kill(pid, 0)); if that fails with ESRCH, the process no longer exists.

This is kind of nasty if your pids tick over very fast: you'd have to do the check with a fairly high frequency to make sure you don't hit the same pid twice.

Fortunately this is hardly ever a problem but it is something worth thinking about when using the trick.

See also:

http://unixhelp.ed.ac.uk/CGI/man-cgi?kill+2

No actual signal is sent, it's just asking the kernel to check if the signal could have been sent.
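
A sketch of the check (note that kill() fails with EPERM when the process exists but isn't signalable by you, which still means it exists):

    #include <errno.h>
    #include <signal.h>

    /* 1 if pid currently exists, 0 if it doesn't. */
    int pid_exists(pid_t pid) {
        if (kill(pid, 0) == 0)
            return 1;
        return errno == EPERM;  /* exists, just owned by someone else */
    }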


There are plans for CLONE_FD which allows listening to child process exits via file descriptors: http://lwn.net/Articles/638613/


That's why kqueue on the BSDs is nicer: it can listen for child exits directly. Plus, process descriptors for children also make this easier; Linux should get these with the Capsicum port eventually.
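
For reference, watching a child's exit with kqueue looks roughly like this (child_pid assumed; error handling omitted):

    #include <sys/types.h>
    #include <sys/event.h>

    int kq = kqueue();
    struct kevent ev;
    EV_SET(&ev, child_pid, EVFILT_PROC, EV_ADD | EV_ONESHOT, NOTE_EXIT, 0, NULL);
    kevent(kq, &ev, 1, NULL, 0, NULL);  /* register interest in the exit */

    struct kevent out;
    kevent(kq, NULL, 0, &out, 1, NULL);  /* blocks; out.data carries the exit status */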


I'm now wondering how it would be if we just gave up on the whole thing and stole the Windows WaitForMultipleObjects() semantics.


I am at a loss as to how you decided that was the "money quote". I saw it as an incidental aside as long as he was in the area, in a large post about signalfd, as evidenced by the, you know, 22 instances of the term "signalfd", including one in the title.


I've used signalfd before and consider it an improvement over normal signal-handling. Signal coalescing happens regardless of signalfd.

The only major thing you have to remember when using signalfd is to mask the signals you want to receive only via signalfd, and then to unmask those signals in any child processes before calling the exec*() functions.
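
The basic shape, as a sketch:

    #include <sys/signalfd.h>
    #include <signal.h>
    #include <unistd.h>

    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);  /* without this, default dispositions still fire */
    int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);

    struct signalfd_siginfo si;
    while (read(sfd, &si, sizeof si) == sizeof si) {
        /* si.ssi_signo etc.; remember that deliveries may have coalesced */
    }

    /* and in any child, before exec*(): */
    sigprocmask(SIG_UNBLOCK, &mask, NULL);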


Question from someone not knowing much about low-level programming and dealing with signals:

If signals are so problematic, why rely on them? Is the functionality useful for things other than dealing with 'emergencies'?

One thing I can see that is useful is that it allows a program to gracefully deal with a kill, but many applications seem to have a 'graceful stop' mechanism that doesn't need signals.


A lot of functionality is only available via signals. For instance, there's no way other than SIGCHLD to be asynchronously notified when a process exits (unless you want to dedicate a thread to running wait()). There's no way other than SIGWINCH to be notified when your terminal gets resized.

You could certainly imagine some kernel extensions that take all of this useful functionality and make it available in ways other than signals, leaving just signals for things you have to deal with immediately like SIGSEGV (so you can print a nice error message before quitting), but they don't exist yet. I imagine some of the intent behind signalfd was to do this all at once for all signals, but it didn't quite work.


> You could certainly imagine some kernel extensions that take all of this useful functionality and make it available in ways other than signals, leaving just signals for things you have to deal with immediately like SIGSEGV (so you can print a nice error message before quitting), but they don't exist yet.

In the SIGCHLD case, there's a proposed CLONE_FD flag to clone() which would return a file descriptor instead of a PID. This fd could be poll()'d on and read from, which is much nicer than dealing with SIGCHLD. See http://lwn.net/Articles/638613/

So those kernel extensions are happening :)


CLONE_FD is a very limited solution. What we really need is the ability to open a file descriptor handle to any process. That ability solves all sorts of race conditions. Conveniently, we already have an interface to open file descriptors for processes: /proc. We just need to extend its semantics slightly.


If you get a file descriptor that refers to a child process upon its creation, then that file descriptor should behave like other file descriptors.

That means you ought to be able to transfer it to other processes via file descriptor passing (the SCM_RIGHTS ancillary message; see man unix).

The identity of a process would thus be local to its parent or to a process with which the parent has agreed to share that identity. Not only does this avoid race conditions, it also enables a completely unrelated process to reap a child, which can be terrifically useful.

This is exactly the approach the Capsicum sandboxing framework (mentioned elsewhere) is taking. The goal there, though, is to eliminate globally shared identifiers as much as possible -- which makes sense for sandboxing!


Maybe I'm misunderstanding, but wouldn't opening a file descriptor to a process via /proc have the same race condition issues with process id wraparound? After all, processes in /proc are opened by process ID (the only exception I can think of is /proc/self... maybe I missed some other exceptions?)

Overall, it seems easier to avoid process ID wraparound attacks by using the full 32-bit number space for PIDs. There may be a few programs that need to be changed because they did something silly like cast pid_t to short, but I think overall most programs would work just fine. As far as I can remember, the reason for using low numbers was that people didn't want to type longer ones at the shell. Internally the kernel and libraries store everything as 32-bit, at least on Linux.


> Maybe I'm misunderstanding, but wouldn't opening a file descriptor to a process via /proc have the same race condition issues with process id wraparound?

Absolutely. But once you've opened the file descriptor, the kernel would guarantee that its corresponding process ID would remain unused until you closed the file descriptor. (For example, it could keep the process a zombie if it exits.)

This way, it's possible to write a reliable killall: walk /proc, call openpid() on each entry, and with the PID FD open, examine the process's user, command line, or whatever else, kill the process if necessary, and close the process file descriptor.

No race.
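
In sketch form (openpid() and process_matches() are hypothetical; no such calls exist today):

    #include <dirent.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    DIR *d = opendir("/proc");
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        pid_t pid = (pid_t)atoi(e->d_name);
        if (pid <= 0)
            continue;  /* skip the non-PID entries in /proc */
        int pfd = openpid(pid);  /* hypothetical: pins this PID while open */
        if (pfd < 0)
            continue;  /* lost a race with exit; that's fine */
        if (process_matches(pfd))  /* hypothetical: inspect user/cmdline via the fd */
            kill(pid, SIGTERM);  /* safe: the PID can't be recycled while pfd is open */
        close(pfd);
    }
    closedir(d);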


> But once you've opened the file descriptor, the kernel would guarantee that its corresponding process ID would remain unused until you closed the file descriptor. (For example, it could keep the process a zombie if it exits.)

That seems like it would open you up to a trivial denial-of-service attack where some attacker just spawns a bunch of processes and never closes the /proc handles. Then you can't start any more processes because there are no more process IDs available. The only workaround is to have a larger PID space, which poses the question... why not just have a larger PID space in the first place and skip the new, non-portable API?


It works out all right on Windows, which uses exactly the approach I advocate. And you can already DoS the system in myriad ways. If you're still worried: we have ulimits for other resources. We can have a ulimit for this one too.


I agree that there are already many ways to DoS the system-- for example, the age-old fork bomb. But that is not a good reason to add more flaws. People are working on ways to fix the old flaws, such as cgroups.

I don't think a ulimit would be very effective here at preventing denial-of-service. Let's say I set it to 100... I can just have my 100 children each spawn and hold on to 100 children of their own, and so on and so forth. If I just go with a bigger process ID space all these headaches go away, plus existing software works without modification.


32 bits is still too small. I wouldn't be comfortable relying on the size of the PID space to avoid collisions until we made it 128 bits or so. I think you're still seriously overestimating the danger of a DoS here: whatever limits apply to forked processes can apply to process handles. Whatever mitigates fork bombs will also mitigate handle-based attacks.

The advantages of process handles outweigh this small risk.


In what scenario would you run out of 64 bit PIDs? How many per second for how many centuries?


It's not a matter of running out of PIDs: it's about the probability of accidental collision.


A workable limit is trivial: how about 100 zombie process IDs per user?


I think geofft's answer is a good one, but I wanted to add a different point: you can't not deal with signals. They're going to happen, and even if you block or ignore them, that doesn't change the fact that something happened that may be important to your process. That's just the rules of the game.


> One thing I can see that is useful is that it allows a program to gracefully deal with a kill, but many applications seem to have a 'graceful stop' mechanism that doesn't need signals.

I don't see how that's possible. You need to listen to at least SIGTERM, SIGINT and SIGHUP if you're going to gracefully stop.


What I'd like to see is a write signalfd, so I can send signals without a race condition that might lead to my signalling the wrong process.


I'd like to see other concepts also covered by file descriptors. Example: semaphores, mutexes, condition variables, timers.


Timers are available via timerfd (man timerfd_create).

There used to be a FUTEX_FD, but it got removed. I think you can mostly achieve the effect with eventfd, though.
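
For example, a periodic timer as a pollable fd:

    #include <sys/timerfd.h>
    #include <stdint.h>
    #include <unistd.h>

    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    struct itimerspec spec = {
        .it_value    = { .tv_sec = 1 },  /* first expiry after one second */
        .it_interval = { .tv_sec = 1 },  /* then every second */
    };
    timerfd_settime(tfd, 0, &spec, NULL);

    uint64_t expirations;  /* each read() yields the expiry count since the last read */
    read(tfd, &expirations, sizeof expirations);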


If you misuse signals.

I am rusty on low level programming but I have done enough to know that this poster is whining a bit too much.

Signals should only be used in the general case for exceptional circumstances, like killing a program. A signal handler's job is to deal with the crisis, e.g., gracefully exit.

In lower level cases signals mean there is an urgent event, something that must be done now or it is useless to bother.

If you try to use signals for general purpose IPC then you get what you deserve - chaos.


As I mentioned in another comment, there are cases (like SIGWINCH) where the only interface the kernel gives you for general-purpose IPC is signals. In any case, if you restrict yourself to using signals for urgent respond-immediately events, then signalfd is still useless, since you want to handle those synchronously. :)

(That said, I would definitely agree that the kernel is misusing signals -- SIGWINCH should just be some form of metadata on the terminal fd, not a process-wide signal.)


This would probably be because by the time resizeable terminals became common, the semantics of the tty device were long settled. It probably would have made sense to create some kind of 'ttyaux' device at this point, though.


Except you're left with cases for which there is no alternative but signals (you name one; a sibling mentions WINCH, and I'll add SIGCHLD), and for which the only reliable way to handle the case is to use the self-pipe trick, or maybe a signalfd.

It's not a case of "misuse": the API is so truly atrociously bad that any programmer's attempt is going to be wrong. I'm aware of the pitfalls, and I do not feel comfortable stating that I would get it right; someone who is not aware of the pitfalls is hopelessly screwed.



