> Everything will probably be just fine if you kill-9 something. If a program fails permanently and dramatically when it's kill-9'd, you should "remove it from your filesystem", because it can't handle other, unavoidable abrupt failures either.
There is a lot of middle ground between failing "permanently and dramatically" and shutting down cleanly.
What about a program that can recover after an abrupt termination, but only after a time-consuming recovery from a journal? What about a program that can recover after an abrupt termination, but only after you manually remove a lock file? These are cases where it's not "just fine" to use SIGKILL, but not so bad as to warrant removing the program.
Generally your life will be easier if you don't take the sledgehammer approach to killing processes, and at least try a non-SIGKILL first.
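That "try the gentle signal first" approach can be sketched in a few lines of Python; the grace period and polling interval here are illustrative choices, not anything prescribed:

```python
import os
import signal
import time

def terminate(pid: int, grace_seconds: float = 10.0) -> None:
    """Ask politely with SIGTERM; escalate to SIGKILL only if ignored."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)   # signal 0 checks existence, delivers nothing
        except ProcessLookupError:
            return            # process exited on its own terms
        time.sleep(0.1)
    os.kill(pid, signal.SIGKILL)  # the sledgehammer, as a last resort
```

One caveat: if `pid` is a child of the calling process, the existence check keeps succeeding on the zombie until someone calls `waitpid`, so the loop would run out the full grace period; for your own children, poll with `os.waitpid(pid, os.WNOHANG)` instead.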
Your reply is the real answer to the linked question. SIGTERM serves a real purpose, and almost always it works to end the process cleanly and promptly.
If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem.
"If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem."
That's an overreaction. I've seen programs fail to respond even to SIGKILL for reasons outside their control, like being stuck in an uninterruptible operation on NFS or some other "unusual" filesystem, where the kernel can't act on the SIGKILL. "Listing a directory" is not exactly a crazy thing to do.
And no wiggling out by saying "well, that's the kernel, not the process", the process never gets a chance to "handle" SIGKILL, so it can't really screw it up, either. Arguably, all such failures are the kernel; the kernel should not expose any sequence of calls that causes SIGKILL to fail, and over time they tend to be fixed (I haven't seen this on my modern Linux machine in a long time, even playing with some funny stuff), but it has happened and will probably continue to happen as new stuff comes out.
>And no wiggling out by saying "well, that's the kernel, not the process", the process never gets a chance to "handle" SIGKILL, so it can't really screw it up, either.
Technically, you are correct: the process doesn't even know it's been SIGKILL'd. However, there are other things it could do to gracefully handle the scenario on its next start.
Is there an already-existing lock file? Prompt for its removal.
An already-existing PID file? Again, ask what to do. Include tools to fix records that may have been left in an inconsistent state.
So on and so forth.
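A startup check along those lines might look like the following sketch (the pid-file path and messages are made up for illustration); signal 0 is the standard trick for probing whether a pid is still alive:

```python
import os
import sys

PIDFILE = "/tmp/myapp.pid"  # hypothetical location for this example

def pid_is_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)   # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False      # no such process
    except PermissionError:
        return True       # exists, but owned by another user
    return True

def acquire_pidfile() -> None:
    if os.path.exists(PIDFILE):
        text = open(PIDFILE).read().strip()
        stale_pid = int(text) if text.isdigit() else None
        if stale_pid is not None and pid_is_alive(stale_pid):
            sys.exit(f"another instance is already running (pid {stale_pid})")
        # The previous instance was SIGKILL'd, OOM-killed, or lost power:
        # clean up its leftovers instead of refusing to start.
        print(f"removing stale pid file (pid {stale_pid} is gone)")
        os.remove(PIDFILE)
    with open(PIDFILE, "w") as f:
        f.write(str(os.getpid()))
```

A real daemon would also want to guard against pid reuse and take the file with an atomic create, but the shape of the recovery logic is the same.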
That said, I've never had to do a SIGKILL. However, I still expect my programs to sanely recover from power outages, clumsy interns, RAID failures, and other "acts of god" that may suddenly cause a program to end before it has a chance to clean up after itself. It's part of making robust programs.
It's even worse than that -- kill is just a command that sends signals to a process, some of which happen to (gracefully or not) stop it. You can even define your own handling for some of them!
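For example, SIGUSR1 and SIGUSR2 are reserved for whatever meaning an application wants to give them; a common convention (assumed here, not mandated by anything) is "reload your config" rather than "stop":

```python
import os
import signal

reload_count = 0

def on_usr1(signum, frame):
    # Instead of exiting, re-read configuration (here just a counter).
    global reload_count
    reload_count += 1

signal.signal(signal.SIGUSR1, on_usr1)

# `kill -USR1 <pid>` from a shell would do the same thing --
# a "kill" that kills nothing.
os.kill(os.getpid(), signal.SIGUSR1)
```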
> If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem.
What's "reasonable"? MySQL/InnoDB will start consolidating the journal and flushing the buffer pool in preparation for a shutdown. On machines with a large amount of RAM allocated to the buffer pool, that will take quite some time. Is that still reasonable?
Absolutely. Those are healthy parts of a clean shutdown, which is what SIGTERM was really asking for, isn't it?
InnoDB is designed to recover gracefully from a SIGKILL, too. The journal task is "saved" for later. That's why they tell you in DBA school, "don't use MyISAM tables. They're not safe." Because if they received a SIGKILL or power outage at the wrong moment, writes could have been lost. Amiright?
What you don't expect is for a process that receives SIGTERM to fail hard, and require an expensive journal recovery or suffer some unrecoverable data loss as a result.
Well, actually: recovering from a SIGKILL will (or at least used to) take much longer at startup than a clean shutdown on SIGTERM takes, so shutting down properly handles that better. I still agree with you that it's reasonable, but others may hold a different opinion.
Heh - you're saying that the SIGKILL recovery on start is cheaper than the SIGTERM cleanup on quit. OK. I'll agree, that's strange. One would probably have expected to have to dot and cross the same number of I's and T's either way, and not for SIGKILL recovery to actually be faster.
This paper measures the time difference between clean and unclean reboot across various systems. Another important point is that many servers never shut down intentionally, making unclean shutdown the norm in lots of deployments.
Thanks for the hard numbers. That also makes sense.
It should be easy to rationalize that SIGKILL recovery is slower than SIGTERM shutdown for such a database. If your in-memory cache is empty, it won't be providing any speedups, right? Thus you'll need to go back to disk for everything.
No, actually I wanted to say the opposite: Killing the process with SIGKILL and then doing recovery at startup is much more expensive than letting the process shut down properly with SIGTERM.
> If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem.
I currently have a vim process that is running in the background and not responding to SIGTERM (so I'll probably SIGKILL it). Does this mean I need to find a new editor?
I was also thinking that criterion might prune the filesystem a bit too much. Alternatively, if you're a vim hater: "Another good reason to get rid of vim!"
Vim responds to SIGTERM while in the foreground, but if I suspend it (e.g. C-z), it does not respond. If I send a SIGTERM to a suspended Vim process, it does respond immediately after I foreground the process.
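That matches how job control works: a stopped process can't run its signal handlers, so a caught SIGTERM stays pending until the process is continued. (A SIGTERM left at its default disposition can kill even a stopped process on Linux, which is why this sketch uses a child that installs a handler, like vim does.)

```python
import os
import signal
import subprocess
import sys
import time

# Child that, like vim, catches SIGTERM instead of dying by default.
child_code = r"""
import signal, sys, time
signal.signal(signal.SIGTERM, lambda s, f: sys.exit(3))
print("ready", flush=True)
while True:
    time.sleep(1)
"""

child = subprocess.Popen([sys.executable, "-c", child_code],
                         stdout=subprocess.PIPE)
child.stdout.readline()              # wait until the handler is installed

os.kill(child.pid, signal.SIGSTOP)   # roughly what C-z does
os.kill(child.pid, signal.SIGTERM)   # handler can't run while stopped
time.sleep(0.5)
assert child.poll() is None          # still alive; SIGTERM merely pending

os.kill(child.pid, signal.SIGCONT)   # roughly what `fg` does
assert child.wait(timeout=5) == 3    # pending SIGTERM finally handled
```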
It might not be that simple, as ops folks often have to support specific versions of stuff on apps that use db-specific features. In that case, it might be worth a prof services contract with a db shop.
(I've supported clustered Oracle on AWS handling massive numbers of micropayment transactions and seen weird shit in prod where it "kind-of failed" according to our ops dba. Classes of survivable bugs range from "apply a vendor hotfix" to edge cases not worth the downtime.)