The first time I encountered signals and EINTR, I was baffled. It felt like I wa...

unnah · on Oct 16, 2023

The existence of EINTR is famously discussed in The UNIX-HATERS Handbook [0] and Richard Gabriel's "worse is better" essay [1]; I'm not sure which one told the story first. Paraphrasing: in the "MIT philosophy" the kernel should obviously continue the system call automatically because that would be simpler for the caller, and in the "New Jersey philosophy" the Unix implementation is obviously better because the kernel is simpler and functions more transparently.

[0] https://web.mit.edu/~simsong/www/ugh.pdf, page 313

[1] https://www.dreamsongs.com/WIB.html

pcwalton · on Oct 16, 2023

Though now there's `SA_RESTART`, so Unix ended up doing the "right thing" eventually.

There's a lesson here, I think: "worse is better" is good advice when it lets you ship and get software into the hands of customers quicker, but it doesn't change the fact that you should do the "right thing" at some point.

jart · on Oct 16, 2023

I wouldn't be so sure that SA_RESTART is the right thing, because using SA_RESTART means you have to do actual work inside your signal handlers. The nice thing about EINTR is your signal handlers can be dumb, and just set a "got_signal = true" variable, so you know 100% for sure your signal handler is signal safe. Then your read() loop just ignores EINTR and checks that variable, to know when it needs to do work.
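
A minimal sketch of that pattern, assuming plain POSIX. The names `got_signal`, `on_signal`, `install_handler` and `read_until_signal` are made up for illustration. The handler only sets a flag, the handler is installed without `SA_RESTART`, and the read loop treats EINTR as a cue to re-check the flag:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag name: the handler does nothing except set it. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo) {
    (void)signo;
    got_signal = 1;
}

/* Install the handler WITHOUT SA_RESTART, so a blocked read() fails with EINTR. */
static int install_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART */
    return sigaction(signo, &sa, NULL);
}

/* Read until the buffer is full, EOF, an error, or a signal is noticed.
 * Returns the number of bytes read so far, or -1 on a real error.
 * *interrupted is set to 1 when the loop stopped because of the flag. */
static ssize_t read_until_signal(int fd, char *buf, size_t cap, int *interrupted) {
    size_t total = 0;
    *interrupted = 0;
    while (total < cap) {
        if (got_signal) {          /* the actual work happens out here, not in the handler */
            *interrupted = 1;
            break;
        }
        ssize_t n = read(fd, buf + total, cap - total);
        if (n > 0) {
            total += (size_t)n;
        } else if (n == 0) {
            break;                 /* EOF */
        } else if (errno != EINTR) {
            return -1;             /* real error */
        }
        /* on EINTR: fall through and re-check the flag at the top of the loop */
    }
    return (ssize_t)total;
}
```

There is still a small window between checking the flag and entering `read()`, where a signal sets the flag but the read blocks anyway.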

o11c · on Oct 16, 2023

Unfortunately, due to the delay, a lot of people reimplemented it badly.

For example, signal handling is completely broken in Python since blocking syscalls (think `select`, but see `signal(7)` for a complete list) will not be interrupted.

`SA_RESTART` correctly excludes such syscalls so you can properly handle the EINTR instead of hanging.
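
A rough sketch of that exclusion, assuming Linux behaviour as described in `signal(7)`. The helper names here are hypothetical. The handler is installed with `SA_RESTART`, yet a `select()` used as a five-second sleep should still come back early with EINTR when SIGALRM fires:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t signal_seen = 0;

static void note_signal(int signo) {
    (void)signo;
    signal_seen = 1;
}

/* Install a flag-setting handler WITH SA_RESTART. */
static int install_restarting_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if select() failed with EINTR after the handler ran,
 * 0 if it ran to its timeout, -1 on any other outcome. */
static int select_interrupted_by_alarm(void) {
    if (install_restarting_handler(SIGALRM) < 0)
        return -1;
    alarm(1);                                  /* SIGALRM in about one second */
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
    int rc = select(0, NULL, NULL, NULL, &tv); /* watches no fds, just sleeps */
    int saved_errno = errno;
    alarm(0);                                  /* cancel the alarm if still pending */
    if (rc == 0)
        return 0;
    return (rc < 0 && saved_errno == EINTR && signal_seen) ? 1 : -1;
}
```

A blocking `read()` on a pipe under the same handler would instead be restarted and keep waiting.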

Sesse__ · on Oct 16, 2023

The big problem is; if you have a thread reading from a socket, how can you ever interrupt it if you wish to do so? EINTR on signals is really not so bad when you think of it from that angle. (If you don't want to deal with EINTR everywhere, you can block the signals and use e.g. epoll_pwait() to temporarily unblock them at opportune moments. Or you can set SA_RESTART when setting up the signal handler if you do not ever wish this behavior.)
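
A sketch of the block-then-unblock approach, assuming Linux and a single-threaded process (a threaded program would use `pthread_sigmask` instead of `sigprocmask`). The function and variable names are invented for the example. The signal stays blocked everywhere except inside `epoll_pwait()`, so a signal that arrived earlier is delivered as soon as the wait starts:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested = 0;

static void on_stop(int signo) {
    (void)signo;
    stop_requested = 1;
}

/* Install a flag-setting handler and block signo for normal execution.
 * *wait_mask receives the mask to use while waiting (signo unblocked). */
static int setup_blocked_signal(int signo, sigset_t *wait_mask) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, wait_mask) < 0)
        return -1;
    return sigdelset(wait_mask, signo);
}

/* Returns 1 when fd is readable, 0 when the signal arrived, -1 on error. */
static int wait_readable_or_signal(int fd, const sigset_t *wait_mask) {
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;

    int result = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0) {
        for (;;) {
            struct epoll_event out;
            /* The signal is unblocked only for the duration of this call. */
            int n = epoll_pwait(ep, &out, 1, -1, wait_mask);
            if (n > 0) { result = 1; break; }
            if (n < 0 && errno == EINTR) {
                if (stop_requested) { result = 0; break; }
                continue;
            }
            if (n < 0)
                break;
        }
    }
    close(ep);
    return result;
}
```

A real program would create the epoll instance once rather than per call; it is done inline here only to keep the sketch short.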

For regular files, Linux has a simple solution for you: You'll never get EINTR. It's only a thing for sockets, pipes and other things that can block indefinitely, which a file cannot.

(Except if you're on NFS, in which case you have to choose between two evils depending on whether your file system is mounted intr or nointr :-) )

pjc50 · on Oct 16, 2023

"nointr", last time I used it, meant your program could never be interrupted by NFS errors under any circumstances. Therefore, if the NFS master went away, your program couldn't be killed either and would be stuck in D state forever until you rebooted the whole NFS client machine.

Sesse__ · on Oct 16, 2023

Yes, that's the big evil on that side. :-)

(You don't need to reboot, though; you can “mount -o remount,intr …” to switch states and then kill. I wonder if you can also now actually do kill -9 specifically even on nointr, but I haven't checked.)

prewett · on Oct 16, 2023

In my experience (admittedly from some years ago), you have to be pretty careful about using the terminal when NFS fails, since you can pretty easily lock up your terminal, of which you only have a "fixed" supply. If you're using NFS, chances are good your home directory is on NFS so you need to make sure you don't accidentally stat() something in the current path (which is probably in your home directory). Obviously, starting new terminal sessions once you've locked one up isn't going to happen. But you'll probably waste a couple trying to figure out what's going on. You'll also want to have that mount command written on a piece of paper somewhere (and remember that you have it), since you can't read your notes.txt file in your home directory. You probably can't do a web search, since the browser reads/writes a cache to your home directory. (Maybe lynx or links would work?) Hopefully you didn't add ~/bin or an NFS-mounted /opt or something before the system paths or you'll need to run everything with the full path. But... a bunch of tool installers like to prepend their directory to your path if you let them fix everything automatically. I have a ~/.gem/ruby/2.3.0/bin early in my path right now, completely didn't realize.

vincent-manis · on Oct 16, 2023

I haven't used NFS in decades; back in the day (we had Sun-3's running SunOS), when an NFS server hung, we got a message like `NFS server foo not responding, still trying' over and over again. In the absence of an admin, all you could do was to login again on a different machine. We called it the `Notwork File System'.

pjmlp · on Oct 17, 2023

The joy of trying to log in to UNIX thin terminals with the home directory mounted via NFS and the network cable having a broken terminator, before ethernet became a thing.

crabbone · on Oct 16, 2023

You kind of put the cart before the horse... Threads are the coping mechanism, that has to cope with signals, not the other way around. Threads exist in the way they are because of the original bad design (which included signals).

Potentially, there could be other ways of dealing with communication, some of them already exist in popular operating systems, s.a. sockets. It's actually funny that you mention one in your problem statement. Erlang-style ports are another possible solution.

Sesse__ · on Oct 16, 2023

Replace “thread” with “process”, then. The basic fact doesn't really change; if you want a clean shutdown on Ctrl-C, you'll need to allow an EINTR-like return from a blocking syscall.

dannymi · on Oct 16, 2023

>The big problem is; if you have a thread reading from a socket, how can you ever interrupt it if you wish to do so? EINTR on signals is really not so bad when you think of it from that angle.

Yeah, but what if the thread is not currently blocked and you want to interrupt it / stop the thread? Unfortunately, it doesn't queue the EINTR for the next blocking function if the thread was NOT blocked at the moment the signal was sent.

meindnoch · on Oct 17, 2023

So what is the proper solution for this in POSIX?

spacechild1 · on Oct 16, 2023

> The big problem is; if you have a thread reading from a socket, how can you ever interrupt it if you wish to do so?

Poll + an eventfd (or another socket). IMO, signals create more problems than they solve.
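
Roughly what that looks like, as a sketch assuming Linux (`eventfd` is Linux-specific). The names `make_cancel_fd`, `request_cancel` and `wait_readable_or_cancel` are hypothetical. The reader blocks in `poll()` on both the data fd and the eventfd, and any other thread can wake it by writing to the eventfd:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum wait_result { WAIT_ERROR = -1, WAIT_CANCELLED = 0, WAIT_READABLE = 1 };

/* Create the wake-up descriptor; its counter starts at zero (not readable). */
static int make_cancel_fd(void) {
    return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/* Called from another thread to interrupt the waiter. */
static int request_cancel(int cancel_fd) {
    uint64_t one = 1;
    return write(cancel_fd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Block until data_fd is readable or a cancel was requested. */
static enum wait_result wait_readable_or_cancel(int data_fd, int cancel_fd) {
    struct pollfd fds[2] = {
        { .fd = data_fd,   .events = POLLIN },
        { .fd = cancel_fd, .events = POLLIN },
    };
    for (;;) {
        int n = poll(fds, 2, -1);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* unrelated signal: just wait again */
            return WAIT_ERROR;
        }
        if (fds[1].revents & POLLIN)
            return WAIT_CANCELLED;     /* cancellation takes priority over data */
        if (fds[0].revents & (POLLIN | POLLHUP))
            return WAIT_READABLE;      /* caller can now read() without blocking */
        if ((fds[0].revents | fds[1].revents) & (POLLERR | POLLNVAL))
            return WAIT_ERROR;
    }
}
```

Because the eventfd counter stays non-zero until someone reads it, a cancel request made while the thread is busy elsewhere is still seen on its next wait, unlike an EINTR that arrives when nothing is blocked.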

Sesse__ · on Oct 16, 2023

Well, that's outlawing blocking I/O. Replacing every blocking I/O call with a poll() loop creates significantly more boilerplate than EINTR does…

spacechild1 · on Oct 16, 2023

> Well, that's outlawing blocking I/O

Not really. A blocking poll() call on a single socket (+ eventfd) is still blocking I/O.

BobbyTables2 · on Oct 16, 2023

Hehe, consider something like wanting to log an error message from a signal handler catching SIGSEGV...

Except the standard logger routine isn't signal safe, nor are many of the basic library routines one would want to use!

The only sane solution is to block/mask all signals, perhaps polling for them if really needed. Anything else is on the path to insanity.
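
One way to sketch "block everything and poll", assuming asynchronous signals such as SIGINT or SIGTERM (a synchronous fault like a segfault can't usefully be deferred this way). `block_signals` and `poll_signal` are made-up helper names. No handler runs at all; the main loop asks for a pending signal when it is convenient, using `sigtimedwait()` with a zero timeout:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Block the listed signals and remember the set for later polling. */
static int block_signals(const int *signos, int count, sigset_t *set) {
    sigemptyset(set);
    for (int i = 0; i < count; i++)
        sigaddset(set, signos[i]);
    return sigprocmask(SIG_BLOCK, set, NULL);
}

/* Non-blocking check: returns the number of a pending signal from *set
 * (consuming it), 0 if none is pending, or -1 on error. */
static int poll_signal(const sigset_t *set) {
    const struct timespec zero = { 0, 0 };
    for (;;) {
        int signo = sigtimedwait(set, NULL, &zero);
        if (signo > 0)
            return signo;
        if (errno == EAGAIN)
            return 0;              /* nothing pending right now */
        if (errno != EINTR)
            return -1;
        /* interrupted by some other, unblocked signal: try again */
    }
}
```

Since the signal is consumed in ordinary code, the logging that follows can use any library routine, signal safe or not.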

CodesInChaos · on Oct 16, 2023

In typical server applications you can use signalfd to handle signals like receiving data from a client instead of polling. I think on older versions of Linux you could use a trick where you write a single byte to a pipe in the signal handler to achieve a similar effect.

https://man7.org/linux/man-pages/man2/signalfd.2.html
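
A small sketch of the signalfd route, assuming Linux and a single-threaded process. `open_signal_fd` and `read_signal` are hypothetical names. The signal has to be blocked first, otherwise it is delivered the normal way and never shows up on the descriptor:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block signo and return a descriptor that becomes readable when it is pending. */
static int open_signal_fd(int signo) {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Consume one pending signal from the descriptor.
 * Returns the signal number, or -1 on error or short read. */
static int read_signal(int sfd) {
    struct signalfd_siginfo info;
    ssize_t n = read(sfd, &info, sizeof info);
    if (n != (ssize_t)sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

In a server, the returned descriptor would sit in the same poll/epoll set as the client sockets, so a SIGTERM is handled like any other readable fd.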

Spivak · on Oct 16, 2023

Right which is why you don't actually do much of anything in the signal handler except save that it happened somewhere to be processed by your application later.

https://docs.python.org/3/library/signal.html#execution-of-p...

Here's how Python does it.

skitter · on Oct 16, 2023

Uh, that Examples section calls print inside the signal handler. Python's IO stack is non-reentrant. I hope nobody follows these examples and gets exceptions at runtime because of it.

eesmith · on Oct 16, 2023

"A Python signal handler does not get executed inside the low-level (C) signal handler. Instead, the low-level signal handler sets a flag which tells the virtual machine to execute the corresponding Python signal handler at a later point(for example at the next bytecode instruction)"

The low-level C handler sets the flag, then returns. Only some time later does the Python run-time call the associated Python handler that you see in the Examples section.

skitter · on Oct 16, 2023

In this case that's before the next bytecode executes. The write syscall gets interrupted and returns EINTR, then cpython checks what signal was caught and executes the signal handler, before trying to do the remaining write:

https://github.com/python/cpython/blob/5f7aba938cf5007b6f954...

Here's even a test for it: https://github.com/python/cpython/blob/5f7aba938cf5007b6f954...

gue5t · on Oct 16, 2023

There are a few ways to handle this design question ("what happens if something needs to interact with a process while it's blocked in a system call?") more or less sanely, and UNIX chooses the least sane one (making userspace deal with the complexity of system calls possibly doing part or none of the work that was requested). Other operating systems might fully transparently guarantee that the system call completes before the process is notified of said interaction, but this isn't compatible with IPC as "lightweight" (unbuffered and without backpressure) as signals.

The Right Thing is to make system call submission atomic and asynchronous, only waiting on completion by explicit choice, and remove signals entirely in favor of buffered message-passing IPC. This is basically the world we're approaching with io_uring and signalfd, except for ugly interaction with coalescing and signal dispositions (see https://ldpreload.com/blog/signalfd-is-useless), and the fact that many syscalls still can't be performed through io_uring.

If UNIX had a better API for process management, people wouldn't see signals as necessary, but that's its own can of worms with its own Linux-specific partial fix (pidfd) and genre of gripe article (e.g. https://news.ycombinator.com/item?id=35264487).

dale_glass · on Oct 16, 2023

Yeah, it's using a modern framework like Qt that wraps around all this nonsense and provides something more comfortable to work with.