Everything will probably be just fine if you kill-9 something. If a program fails permanently and dramatically when it's kill-9'd, you should "remove it from your filesystem", because it can't handle other, unavoidable abrupt failures either.
> Everything will probably be just fine if you kill-9 something. If a program fails permanently and dramatically when it's kill-9'd, you should "remove it from your filesystem", because it can't handle other, unavoidable abrupt failures either.
There is a lot of middle ground between failing "permanently and dramatically" and shutting down cleanly.
What about a program that can recover after an abrupt termination, but only after a time-consuming recovery from a journal? What about a program that can recover after an abrupt termination, but only after you manually remove a lock file? These are cases where it's not "just fine" to use SIGKILL, but not so bad as to warrant removing the program.
Generally your life will be easier if you don't take the sledgehammer approach to killing processes, and at least try a non-SIGKILL first.
Your reply is the real answer to the linked question. SIGTERM serves a real purpose, and almost always it works to end the process cleanly and promptly.
If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem.
"If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem."
That's an overreaction. I've seen programs fail to die on SIGKILL for reasons out of their control, like getting stuck in an NFS transaction or some other "unusual" filesystem operation, leaving the kernel unable to deliver the SIGKILL. "Listing a directory" is not exactly a crazy thing to do.
And no wiggling out by saying "well, that's the kernel, not the process": the process never gets a chance to "handle" SIGKILL, so it can't really screw it up, either. Arguably, all such failures are the kernel's fault; the kernel should not expose any sequence of calls that causes SIGKILL to fail, and over time such bugs tend to be fixed (I haven't seen this on my modern Linux machine in a long time, even playing with some funny stuff), but it has happened and will probably continue to happen as new stuff comes out.
>And no wiggling out by saying "well, that's the kernel, not the process", the process never gets a chance to "handle" SIGKILL, so it can't really screw it up, either.
Technically, you are correct: the process doesn't even know it's been SIGKILL'd. However, there are other things it could do to gracefully handle the scenario upon the next start.
Is there an already-existing lock file? Prompt for its removal.
An already existing PID file? Again, ask what to do. Include tools to fix records that may have been left in an inconsistent state.
So on and so forth. For instance, something like the sketch below.
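A rough sketch of that kind of startup check in shell (the lock file path and messages are made up, and a real tool might prompt for removal instead of deleting the stale file automatically):

    #!/bin/sh
    # On startup, deal with a PID/lock file a previous instance may have left behind.
    LOCKFILE=/var/run/myapp.pid        # hypothetical path
    if [ -e "$LOCKFILE" ]; then
        oldpid=$(cat "$LOCKFILE")
        if kill -0 "$oldpid" 2>/dev/null; then
            echo "another instance (pid $oldpid) seems to be running, refusing to start" >&2
            exit 1
        fi
        echo "removing stale lock file left behind by pid $oldpid"
        rm -f "$LOCKFILE"
    fi
    echo $$ > "$LOCKFILE"
    # ... normal startup continues here ...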
That said, I've never had to do a SIGKILL. However, I still expect my programs to sanely recover from power outages, clumsy interns, RAID failures, and other "acts of god" that may suddenly cause a program to end before it has a chance to clean up after itself. It's part of making robust programs.
It's even worse than that -- kill is just a command that sends signals to a program, some of which happen to (gracefully or not) stop the program. You can even define your own!
> If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem.
What counts as "reasonably"? MySQL/InnoDB will start consolidating the journal and buffer pool in preparation for a shutdown. On machines with a large amount of RAM allocated to the buffer pool that will take quite some time. Is that still reasonable?
Absolutely. Those are healthy parts of a clean shutdown, which is what SIGTERM was really asking for, isn't it?
InnoDB is designed to recover gracefully from a SIGKILL, too. The journal task is "saved" for later. That's why they tell you in DBA school, "don't use MyISAM tables. They're not safe." Because if they received a SIGKILL or power outage at the wrong moment, writes could have been lost. Amiright?
What you don't expect is for a process that receives SIGTERM to fail hard, and require an expensive journal recovery or suffer some unrecoverable data loss as a result.
Well, actually: Recovering from a SIGKILL will (or at least used to) take much longer at startup than shutting down with SIGTERM, so it could handle that better. I still agree with you that it's reasonable, but others may hold a different opinion.
Heh - you're saying that the SIGKILL recovery on start is cheaper than the SIGTERM cleanup on quit. OK. I'll agree, that's strange. One would probably have expected to have to dot and cross the same number of I's and T's either way, and not for SIGKILL recovery to actually be faster.
This paper measures the time difference between clean and unclean reboot across various systems. Another important point is that many servers never shut down intentionally, making unclean shutdown the norm in lots of deployments.
Thanks for the hard numbers. That also makes sense.
It should be easy to rationalize that SIGKILL recovery is slower than SIGTERM shutdown for such a database. If your in-memory cache is empty, it won't be providing any speedups, right? Thus you'll need to go back to disk for everything.
No, actually I wanted to say the opposite: Killing the process with SIGKILL and then doing recovery at startup is much more expensive than letting the process shut down properly with SIGTERM.
> If a process doesn't respond reasonably to SIGTERM, then you should consider removing it from your filesystem.
I currently have a vim process that is running in the background and not responding to SIGTERM (so I'll probably SIGKILL it). Does this mean I need to find a new editor?
I was also thinking that criteria might prune the filesystem a bit too much. Alternately, if you're a vim hater, "Another good reason to get rid of vim!"
Vim responds to SIGTERM while in the foreground, but if I suspend it (e.g. C-z), it does not respond. If I send a SIGTERM to a suspended Vim process, it does respond immediately after I foreground the process.
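You can reproduce it with something along these lines (the %1 job spec assumes this is the only background job in your shell):

    vim somefile          # then press Ctrl-Z to suspend it
    kill -TERM %1         # vim is stopped, so nothing visible happens yet
    fg                    # the pending SIGTERM is handled as soon as vim resumes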
It might not be that simple, as ops folks often have to support specific versions of stuff on apps that use db-specific features. In that case, it might be worth a professional services contract with a db shop.
(I've supported clustered Oracle on AWS handling massive numbers of micropayment transactions and seen weird shit in prod where it "kind-of failed" according to our ops dba. Classes of survivable bugs range from "apply a vendor hotfix" to edge cases not worth the downtime.)
I agree for the most part. There are a few exceptions, however, and no-one seems to have listed them. Most notably, when a process is 'kill -9'ed, some shared system resources stay open, named semaphores in particular. This will sometimes cause the process to fail when it is restarted. You can use ipcs and ipcrm to remove those.
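For example (ipcs/ipcrm are the standard tools for this; the semaphore id below is made up):

    ipcs -s               # list System V semaphores, including ones the dead process left behind
    ipcrm -s 98304        # remove a stale one by its id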
One thing I always liked about couchdb, aside from its other merits or faults, is that it was designed to be a crash only service. Meaning that the normal, correct way to shut it down is just an abrupt halt of the process. This means that functionally there's no difference between a normal shutdown and a crash or a killed process. This is quite valuable for a service where data integrity is important.
> If a program fails permanently and dramatically when it's kill-9'd
There are several kinds of bad state that can be left over:
(1) space leaks (disk, memory (sysv shared memory))
(2) inconsistent data (data / config files)
(3) locks/semaphores (used to protect (2) from happening in normal operation)
(We're talking about those that are under the control of the program itself, not the control of another program or the kernel (the latter can and generally will be cleaned up automatically).)
Programs can be written to clean those up when they are restarted, which is the reason for the saying that if some program does not do that, then it's not worth being left installed.
But it takes a certain amount of work to write code that recovers from an abandoned lock and the possibly inconsistent state left behind, so much so that even some widespread system libraries don't do it. Of course you're free to remove those system libraries from your system, but you may have to write a replacement.
Also, in the case of such libraries, you may end up locking out programs other than the one you SIGKILL'ed (any that happen to use the same libraries), without being aware of the source of the issue.
In particular I'm thinking of ALSA. If you SIGKILL a program while it uses [a "hw:" device in] ALSA, it leaves behind a semaphore that prevents other programs from accessing the same device indefinitely (it seems there's no recovery code in the ALSA libraries). It took me quite some time to figure this out; when I did, I wrote a utility to clean up the semaphores explicitly[1]. Every now and then I still need to reach for it.
Well, since you can't catch -9 (SIGKILL), any task in your program that shouldn't be interrupted... will be interrupted. That's a good rule in general, but programs aren't usually SIGKILLed; SIGTERM is the one you're supposed to plan for.
Permanent and dramatic failure caused by SIGKILL presents a candidate for deletion, but it depends on your definitions of permanent and dramatic.
It wouldn't be unexpected for a SIGKILLed process to fail to flush a cache, or to exit leaving persistent state inconsistent. That can be permanent (changes lost) and dramatic.
You can wrap operations in transactional overhead to greatly reduce the chance of data loss, but you can't eliminate it entirely without assistance from peers.
kill -9 is the equivalent of unplugging the machine in the middle of operations. It might be fine, and with careful design it might be fine almost every time.
That doesn't change when you add peers, though, so it's kinda beside the point. Oh, and a normal shutdown doesn't prevent flipped bits due to cosmic rays either ...
Unless you're talking about filesystems with specific support for atomicity, and databases with custom storage drivers for those filesystems, that's really not fair.
It would be far too easy to kill any program mid-write().
That is what journaling is for. A database that corrupts data when you kill -9 it is garbage. A database has to survive without corruption when power fails unexpectedly, and that's even harsher than kill -9 in terms of what can go wrong if your code is sloppy.
> Unless you're talking about filesystems with specific support for atomicity, and databases with custom storage drivers for those filesystems, that's really not fair.
Unless you're talking about a database not intended to be used in production, that's really not fair.
What if the power goes out? The UPS blows up? There's an earthquake and the ceiling comes down? Is it OK for a DBMS to corrupt its data then, too?
> Unless you're talking about filesystems with specific support for atomicity, and databases with custom storage drivers for those filesystems, that's really not fair.
Yes it is. The DB should not unrecoverably corrupt itself in case of hard power loss, and a kill -9 is significantly less traumatic than a hard power loss (which includes PSU melting/explosions or the UPS going down).
Isn't that the whole point of journaling? Of course databases can handle kill -9; they are designed to. That doesn't mean it's friendly to do that, because you'll pay a price replaying the journal.
Also, -9 can leave IPC objects hanging around, so some ipcs/ipcrm cleanup might be needed.
A database should not lose any committed data on `kill -9` in the default configuration. This is why PostgreSQL waits on fsync on the write-ahead log before completing a commit. This can be disabled with synchronous_commit off, in which case you will indeed lose data on a crash.
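For reference, this is easy to check from a shell; synchronous_commit is a real PostgreSQL setting and 'on' is its default, as described above:

    psql -c "SHOW synchronous_commit;"    # 'on' means COMMIT waits for the WAL to be fsync'd
    # Setting it to 'off' (e.g. in postgresql.conf) makes commits faster, but a crash or
    # kill -9 can then lose the most recent transactions, which is exactly the trade-off above.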
I said it already elsewhere in this thread, but either the writer has other persistent storage, in which case it should just keep the data, retry later, and nothing will be lost; or the writer doesn't have other storage, in which case you would also lose uncommitted data with a normal shutdown, since you can't even try to commit data while the database is down.
Actually, it can. Killing it with -9 prevents the process from writing the buffer pool etc. to disk, so the server needs to recover that state from the journal/log files. No data will be lost, though. So you're basically trading time at shutdown vs. time at start-up.
> So you're basically trading time at shutdown vs. time at start-up.
I'm being a bit pedantic, but trading time at shutdown vs. time at start-up + potential data loss (if the journal was in the middle of being written while the -9 was sent, you probably lose the last journal entry).
If that counts as data loss, then a "normal shutdown" also causes data loss. When the journal write gets interrupted, the database client won't get a success return, so it should keep the data it was trying to commit and retry the transaction at a later time; nothing is lost at all. When the client is some entity without reliable persistent state that cannot store the data in order to commit it later, that entity will be unable to connect to the database once the "normal shutdown" is complete, and thus will face the exact same problem, as the journal entry is "lost" before the write even starts.
> If that counts as data loss, then a "normal shutdown" also causes data loss.
I disagree. If a journal write was in progress on SIGTERM, you either fail or complete the write, returning the result to the client. In either event, you should end up with a clean journal on SIGTERM, assuming your implementation was sane (i.e. how I'd write it). If the client chooses to ignore the failure, that's not on the server and that's the whole point.
A normal shutdown should never require data loss. Somehow you're conflating SIGTERM with SIGKILL and I'm not understanding why.
A client that relies on receiving an explicit failure response is broken. When the server is killed by the OOM killer or when power of the server fails or the network connection between the client and the server fails (for too long) while the commit is in progress, the client also doesn't get an explicit response, and thus also doesn't know whether the commit succeeded or not, so it has to deal with that case anyhow - and if it doesn't lose data in that case, it won't lose data on SIGKILL either.
And the whole point of a journal is that it doesn't need to be clean. If your database requires the journal to be clean on startup as a condition for its correctness, then the journal is useless, it could just as well just require the data itself to be consistent. The only reason why there is a journal is so that correctness is not affected when writes are interrupted as any point.
Also, for that matter, a SIGTERM might be able to guarantee that the journal is clean, but it would be highly broken if it tried to guarantee that the client gets to know about the result, as that could take an arbitrary amount of time that might be dependent on the behaviour of remote systems, which would be terrible shutdown behaviour indeed.
Really, the only reason why a database should even try and catch SIGTERM is when checkpointing on shutdown is a lot cheaper than log recovery on startup - other than that, it only makes the code more complicated without providing any benefits.
I remember learning about kill as a kid, then kill -9 in college, and after college re-learning it via Monzy[1].
I had sort of forgotten the importance of the signal argument. Such an incredibly powerful command that is nothing more than a representation of a very simple decision or architecture [2]:
Some of the more commonly used signals:
1 HUP (hang up)
2 INT (interrupt)
3 QUIT (quit)
6 ABRT (abort)
9 KILL (non-catchable, non-ignorable kill)
14 ALRM (alarm clock)
15 TERM (software termination signal)
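And all of these are just different spellings of the same request (the pid is made up):

    kill -15 1234        # by number
    kill -TERM 1234      # by name
    kill -s TERM 1234    # POSIX-style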
Heh, I went to try it and sure enough it was in the manpage, but it didn't work on the command line because... the manpage was for /bin/kill, while on the command line I was using the builtin kill. WOW!
% kill -L
kill: unknown signal: SIGV
kill: type kill -l for a list of signals
% which kill
kill: shell built-in command
% /bin/kill -L
1 HUP 2 INT 3 QUIT 4 ILL 5 TRAP 6 ABRT 7 BUS
8 FPE 9 KILL 10 USR1 11 SEGV 12 USR2 13 PIPE 14 ALRM
15 TERM 16 STKFLT 17 CHLD 18 CONT 19 STOP 20 TSTP 21 TTIN
22 TTOU 23 URG 24 XCPU 25 XFSZ 26 VTALRM 27 PROF 28 WINCH
29 POLL 30 PWR 31 SYS
Several of the answers on stackexchange and comments here are absurd.
SIGKILL shouldn't be your first resort but sometimes is necessary as a last resort. If the sky falls and you can't deal with it, there's something wrong in a lot of places which have nothing to do with signals.
But 'nuclear option' attaches too much meaning to it; if you're in that position, you'll run into plenty of circumstances where SIGKILL is necessary. It's a perfectly fine tool to use and deserves no extremist opinions.
Right... In Windows and OSX you have to do the same thing fairly often. It's called "force quit" or something like that. Otherwise you'd sometimes be waiting an eternity for programs to get themselves unstuck. If a program leaves a messy state behind when you `kill -9` it, a decently written one will automatically clean up the mess the next time it runs. If it doesn't, then don't bother using it, because it's extremely poorly written. (If there are orphaned child processes left behind, I kill them manually.)
Usually it's just an obviously good idea to send a milder signal first, because it's less likely to leave orphaned child processes and `kill 345` is just plain easier to type than `kill -9 345`. Also, trying a TERM signal first gives you some feedback on how fubar the process really is.
Yeah, Windows has "Force Quit" baked into their regular shutdown process now - if any program is blocking the shutdown for more than a second or three, the user will get the option to kill everything. Under windows, there is now no excuse not to properly handle a forced abrupt termination, because the layman users will do it.
I always looked at it like: I have tried to stop/shut down this program in the documented way, then I tried a kill -15, but this thing won't go, so kill -9. I then think very hard about why I allow such a thing on a system.
Sometimes you have to kill -9, but you shouldn't do it unless you tried other signals and they didn't work, and you know what the consequences are.
For example, postgresql forks a process for every connection. What you may not know is, if you kill one of these processes, it needs to clean up its use of the shared memory pool. If you kill -9 any of postgresql's child processes, the other processes will see that a peer died uncleanly, and the postmaster will terminate every other backend and go through crash recovery rather than risk shared-memory corruption.
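If you just need one PostgreSQL backend gone, a gentler option is to ask the server to do it for you; pg_cancel_backend and pg_terminate_backend are real functions, the pid below is illustrative:

    psql -c "SELECT pg_cancel_backend(12345);"     # cancels that backend's current query (like SIGINT)
    psql -c "SELECT pg_terminate_backend(12345);"  # terminates the backend cleanly (like SIGTERM)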
This is a Postgres thing tho', you can kill Oracle shadow processes willy-nilly with no consequences. Oracle has another process PMON that will clear up after them. If you kill PMON (or SMON) however the DB will shut down. However no data will be lost; at one company I worked at kill -9 on SMON was the normal way to shut the production DB down!
Some databases allow local shared memory connections. So, if you kill -9 a regular client that connects to the database that way, the whole database goes down. Fun times!
Android's OOM killer won't run until after an activity's onPause() or onStop() method has been called, which gives applications a chance to save their state. A foreground activity is typically considered unkillable.
Background services have weaker lifecycle guarantees, but the system can be asked to automatically restart your service and re-deliver any Intents that were being processed.
It's not like the system is regularly killing apps without any recovery options. Dealing with these lifecycle events is a key part of Android development, so apps are designed to deal with them.
The process is considered killable after onPause(); there's no requirement that onStop() be called. And, in practice, onStop()/onDestroy() are rarely called outside of an activity calling finish() on itself.
But that call only gives that activity a chance to save state, not the process as a whole. You don't run around stopping all your services and such in onPause(). There is no SIGTERM equivalent where you can go around doing actual cleanup work for the entire process.
As for Services, you'll note they're considered killable at any given point; there are largely no guarantees other than that they won't be killed in the middle of executing code in onStart()/onStop(). But during the bulk of their actual work they're totally up for random killing.
And fwiw foreground processes are totally considered killable. They are the last in the queue, yes, but they are still in the queue.
The difference between SIGKILL (is Android actually using SIGKILL?) and onPause() then SIGKILL is that the process still has time to save state (the most important part) and the code itself is not resumed until onResume() is called.
On standard UNIX there's no onPause() or anything similar, so the process cannot react to this in any way.
The state save is important in terms of being able to rebuild the UI quickly. It's not important in the context of "heavy" resources like files, sockets, etc... None of those are ever cleaned up in onPause. The worst thing that will happen if you SIGKILL an Android app without calling onPause first is that the next time it's launched it won't resume from where you left off, it will be as if you rebooted the device.
Also to be clear the onPause and SIGKILL are not tied together. You could get an onPause and it be minutes or hours or even days before you get SIGKILL'd, during which you are completely free to keep running code in the background.
And depending on how your process was started there might not even have been an Activity to be onPause'd in the first place. Consider an app that started doing work in response to a broadcast or content provider query.
iOS may use SIGKILL when it's out of memory or your app is killed in the background (in most cases it sends a memory warning or an applicationWillTerminate: message to the app delegate first, and gives you up to 5 seconds before the SIGKILL).
You do need to be a little careful as an app writer. All your file writes should be atomic (either SQLite/CoreData or using the atomic write methods in CoreFoundation and Objective-C) or you must be able to detect and delete partially written files and recover in situ.
> All your file writes should be atomic (either SQLite/CoreData or using the atomic write methods in CoreFoundation and Objective-C) or you must be able to detect and delete partially written files and recover in situ.
You should be doing that regardless, there's nothing special about SIGKILL in this regard. Dirty pages will still get flushed to disk, so SIGKILL is safer than, say, sudden power loss. Which isn't exactly unheard of on battery powered devices, after all.
I wouldn't say it's totally fine when the OOM killer kills mysqld and corrupts tables, for example, when a clean shutdown would have been the better alternative. Unfortunately the oom-killer is not that smart.
I really hope you are wrong about killing mysqld corrupting tables, because if it does then I would not want mysql near my data. Databases should use a journal for crash safety.
"I use kill -9 in much the same way that I throw kitchen implements in the dishwasher: if a kitchen implement is ruined by the dishwasher then I don't want it."
Somebody please write a bash script `murder' that sends 15, 2, 1 and then 9, with a slight delay between each signal. I'd do it myself, but I'm not very bash-proficient and have to run to work now ;).
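Something along these lines ought to do it (untested sketch; the delay and the exact signal order are easy to tweak):

    #!/bin/bash
    # murder: escalate through progressively less polite signals: 15, 2, 1, then 9.
    # Usage: murder <pid>
    pid="$1"
    for sig in TERM INT HUP KILL; do
        kill -s "$sig" "$pid" 2>/dev/null || exit 0   # if kill fails, assume it's already gone
        sleep 2                                       # give it a moment to react
        kill -0 "$pid" 2>/dev/null || exit 0          # gone now? then we're done
    done
    echo "murder: $pid survived even SIGKILL (probably stuck in the kernel)" >&2
    exit 1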
So there are two answers, one saying to try without -9 first (which is obvious), and the other suggesting you uninstall any program where -9 is necessary.
Was this submitted to HN for more opinions? So people could see the disagreeing answers? Something else?
When you care enough to send the very best, kill -15. When you want evidence (core dump), kill -11. For all other (most) purposes, kill -9 never hurt anyone but a process you wanted shot down anyway.
I can give a real practical reason from experience - I was writing a script that needed to pull data from the database. I noticed I was doing something stupid in the SQL query, so I wanted to kill it and rerun it with the change.
I kill'd the script running it, and then kill -9'd it when that didn't work. Two weeks later someone asked about my query that was still running on the database.
And now I'm the one who warns people not to kill -9 scripts without understanding why it's stuck and how to clean it up properly.
You will want to SIGKILL when you need to regain control, and/or when the process is not fundamental. E.g. some ancillary script stuck at 100% CPU or making the server swap crazy that isn't responding to SIGTERM. Anything more fundamental is likely to respond well to SIGTERM or not have the CPU/swapping problem in the first place.
If you're not a server admin, kill -9 probably won't mess up your own machine in a significant way. I'm by no means a veteran, but I've used Linux as my primary workstation for 4 years now and haven't yet had reason to regret using kill -9. And given the number of times I have used it, I prefer a single kill -9 as opposed to 2-3 more kills beforehand.
This just about sums it up:
"Don't use kill -9. Don't bring out the combine harvester just to tidy up the flower pot."
(from this answer http://unix.stackexchange.com/a/8927)
Not always. The kill system call can be used to send any signal, it's just got a name that implies you're sending something like SIGKILL or SIGTERM. I have written C programs that use kill for harmless inter-process communication.
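You can do the same kind of harmless signalling from the shell, too; a toy sketch where SIGUSR1 is just a "poke" and nothing dies:

    #!/bin/bash
    # receiver: treat SIGUSR1 as a notification instead of a reason to exit
    trap 'echo "got SIGUSR1, doing some work"' USR1
    echo "pid $$ waiting to be poked"
    while true; do sleep 1; done

    # from another terminal:  kill -USR1 <that pid>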
It's SIGINFO on the BSD's, and works with just about any process (though obviously not everything has a customized handler). Also handily mapped to Ctrl-T.
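dd on the BSDs / OS X is the classic example; it reports its progress when it gets SIGINFO:

    dd if=/dev/zero of=/dev/null bs=1m &
    kill -INFO %1         # or just press Ctrl-T in dd's terminal; prints records in/out so far
    kill %1               # then end it with a plain SIGTERM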
No because only SIGQUIT, SIGABRT, SIGKILL, SIGTERM and often SIGHUP are supposed to do that. All of the other signals have wildly varying meanings. See man 7 signal: http://unixhelp.ed.ac.uk/CGI/man-cgi?signal+7
If your application's child processes are orphaned when the parent dies unexpectedly (for more than a few milliseconds), that is a bug in your program.
That's not what is normally meant by "multiprocessing" - multiprocessing is when your application forks multiple processes of itself in order to get some concurrency/parallelism, and those forked processes should monitor their parent and exit immediately when the parent disappears.
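A crude sketch of that monitoring in shell (a real implementation would more likely use something like prctl(PR_SET_PDEATHSIG) on Linux, but the idea is the same):

    #!/bin/bash
    # worker: exit as soon as the parent that started us disappears
    parent=$PPID
    while kill -0 "$parent" 2>/dev/null; do
        # do_some_work     # placeholder for the worker's real job
        sleep 1
    done
    echo "parent $parent is gone, exiting" >&2
    exit 0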