Though QA is Quality Assurance, when really it's QC (Quality Control) that should have caught the error. QA puts processes in place so that there is a QC step that can catch this.
Sorry, going off-topic but as a tester I dislike being told to 'QA this'
I would bet on the issue being in the init script itself rather than squid. (I'm assuming squid doesn't run as root by default in rhel) If that's true then it's another point for more sane process managers (upstart/supervisord/systemd/...)
set -o pipefail makes common idioms a pain. Consider head, which simply exits after it has read a few lines. When it does, the process feeding the pipe gets a SIGPIPE and exits with a non-zero exit code:
Consider /tmp/test.sh:
set -o pipefail
yes foo | head
$ bash /tmp/test.sh >/dev/null
$ echo $?
141
From the same page: "rking's personal recommendation is to go ahead and use set -e, but beware of possible gotchas. It has useful semantics, so to exclude it from the toolbox is to give in to FUD."
You can use set -e and turn it off (set +e) for code blocks that are problematic. You can also add '|| true', or use the ':' no-op builtin, to avoid specific problem points without turning everything off. These are edge cases, and you can easily work around them if you are an advanced user.
If you are not an advanced user then you should certainly use -e.
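A minimal sketch of those escape hatches (the file path and commands here are placeholders, not from the thread):

```shell
#!/bin/bash
set -e                    # abort on any command failure...

# ...except where we explicitly opt out:
grep -q needle /nonexistent-file 2>/dev/null || true   # '|| true' swallows the failure

# Or disable errexit around a problematic block:
set +e
false
status=$?                 # inspect the exit code ourselves
set -e

echo "still running; last status was $status"
```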
or check the variable before using it, like any other programming language:
[[ "$VAR" ]] && rm -rf "$VAR/*"
I think most of these issues stem from the fact that most developers that write shell scripts don't actually understand what they're doing, treating the script as a necessary annoyance rather than a component of the software.
If anyone understands shell scripts, it would be people writing init scripts at Red Hat :)
Anyway, that is not anything like other programming languages. Checking in that way is error-prone and not really an improvement (nor equivalent to set -u).
[[ "$DAEMON_PATH" ]] && rm -rf "$DEAMON_PATH/*"
See what I did there? It's an rm -rf /* bug because "checking variables" is not the answer.
In other programming languages, if an identifier is mis-typed things will blow up. E.g., in ruby if I write:
daemon_path=1; if daemon_path; puts deamon_path; end
I get "NameError: undefined local variable or method `deamon_path`"
These issues do not always stem from bad developers. Bash's defaults are not safe in many ways and saying "people should just check the variable" isn't helpful here.
Shameless plug for my language "bish" (compiles to bash) which aims to solve many of these annoyances with shell scripting: https://github.com/tdenniston/bish
Bash also has the ability to flag use of an undefined variable as an error; it is just not on by default.
set -u
Man page quote: "Treat unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion. If expansion is attempted on an unset variable or parameter, the shell prints an error message, and, if not interactive, exits with a non-zero status."
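A quick illustration, running a child bash so the failing expansion doesn't kill the current shell ($DEMO_UNSET_VAR is a made-up name assumed to be unset):

```shell
# Without -u, a typo'd variable silently expands to the empty string:
bash -c 'echo "path is: $DEMO_UNSET_VAR"'     # prints "path is: "

# With -u, the same expansion is a fatal error:
if ! bash -uc 'echo "path is: $DEMO_UNSET_VAR"'; then
    echo "with -u: unbound variable is fatal"
fi
```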
Yeah, everyone always loves to shit on BAT (which is fair, it is terrible) and VBS (which is slightly less fair), but in spite of how many problems Bash has (not least the massive security issue last year), it gets off almost scot-free.
These bugs are indicative of Bash's design problems. Why is it used for init scripts? And don't even get me started on how filenames end up interpreted as part of the argument list when using * (e.g. a file named "-rf").
Say what you will about Powershell, but having a typed language that can throw a null exception is useful for bugs like these. The filename isn't relevant, and a null name on a delete won't try to clear out the OS (it will just throw).
Not just scot-free - during the Great systemd War of 2014 it was a talking point for the antis that using anything other than the pure, reliable simplicity of shell for service management was MADNESS!
I don't think that was the argument, as much as it was that if a shell script fouled up it was easier to get in and do field repairs because it was interpreted rather than compiled.
It could be the default for non-interactive shells without causing this problem. Or we could have a more nuanced rule, where -e means "stop executing the current sequence of commands as soon as there is an error", where a "sequence of commands" is a single line in an interactive shell (so "false; whoami" would print nothing), or the entire file in a script.
The real answer is that this has not been the default in the time between shells being invented and this comment being posted, and so the squillions of lines of shell script out there in the wild keeping the world turning have not been written with this in mind. Making it the default now would break a lot of things.
With the benefit of hindsight, though, I would say that yes, this should have been the default in scripts. Oh well.
That's not completely true. At least with the GNU tools, 'rm' won't delete the root directory unless you specifically give it the '--no-preserve-root' flag. Since that flag has no use other than deleting root, it's unlikely the script passes it. With that in mind, the script must be doing some kind of manual deletion for some reason.
I believe that "--preserve-root" applies only to / itself. That means `rm -rf /*` will expand to `rm -rf /bin /dev /etc /lib ...` and delete everything anyway.
That's accurate. `rm -rf /*` will still delete everything. But that said, `rm -rf "$STREAMROOT/"` can't ever expand to that, and moreover, since the expansion is in double quotes, it won't be subject to path expansion by bash. So even "/*/", which would normally expand into "/bin/ /dev/ /etc/ ...", won't. You can see what I mean yourself, just use echo:
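Something like this (a quick sketch; the expanded directory names will vary by system):

```shell
STREAMROOT=""

echo "$STREAMROOT/*"     # glob inside the quotes: printed literally
# -> /*

echo $STREAMROOT/*       # unquoted: the shell expands it
# -> /bin /boot /dev /etc ...   (names will vary)
```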
This just happened to my coworker today. I'm sitting behind him telling him which commands to type (he's new to Linux...) when suddenly he jumps the gun and pushes enter just as I say "slash". My heart nearly stopped. I didn't even know preserve-root existed (plus I always reiterate: don't log in as root). It was a snapshotted VM, but we still would have lost the day's work.
I feel like it would be a frighteningly common bug. I remember one like this from 2011 [1]. Install/packaging/utility scripts usually do not get as much attention and testing as the application code itself.
I'd say the fact that these bugs only very occasionally happen - relative to the huge number of shell scripts out there being executed every day - means it's not really "frighteningly common". You only hear about the ones that fail.
By the same logic, memory safety issues only happen rarely, right? Most programs/scripts are going to be tested if they're part of a distribution, and such errors removed. But without polling people it'd be hard to know of the many times this kind of thing messed things up. I personally wiped out a production DB due to expanding an unset variable (fortunately immediately after taking a backup).
This is, as the bug notes, a regression, and I'm guessing you're right about it being in the initscript (I'm pretty sure). I used to be a very heavy Squid user and Squid developer and I remember a very similar bug many years ago. It was in the cache_dir initialization code. It would read the configuration file, parse out the cache_dir lines, and if the directories didn't exist it would create them as part of the startup.
There were some circumstances where, if there was no cache_dir line configured, or if the cache_dir was a link or something (the details are very sketchy in my mind after so much time), it would end up destroying /.
No, but if an analogous bug happened (systemd forgot to set an internal squidroot variable before clearing the squidroot, for instance), it would be much, much harder to figure out what was going on. Which is really what everybody's complaints boil down to.
"systemd" and "sane" only ever go in the same sentence as "sane people don't use systemd".
It looks like a bug in the init script; running it as squid's user wouldn't have triggered destroying the whole filesystem - likely just squid's config and anything under its /var.
I'll be the first to call out systemd for a lot of things, but not its core init idea. It's the same as daemontools, upstart, supervisord, and others do. Implementation is very different of course, but the idea is common - you run/kill services, not start/stop them. That's the reason we can leave the ugly and error-prone init scripts behind.
Which is what happens when you have every daemon writing their own PID handling code, running as root, in a language whose interpolation rules nobody really understands.
While this is a really nice hack, stuff like this is also the reason I feel really uneasy when writing shell scripts. What works now may suddenly break in the future due to inadequate escaping of filenames.
No, it doesn't work. "./*" expands to "./-@" as a single field, which rm has no problem with. (Note, however, that this is still the shell's globbing, as far as I understand.)
How would changing the filename fix it? It's a hack relying on the shell's globbing. If you're not using the globbing, the hack can never help you.
Very handy trick, more so if you have no root access. Myself, I prefer to rename rm and drop a shell wrapper in its place. It can be as simple as changing every "*" passed in into "Sorry_Dave_I_can_not_allow_you_to_do_that" (or words to that effect) and every "AsteRISKdeleteALL" into "*", then passing the modified input on to the real rm. You can adjust how it works and add rules to taste.
That way the pain of having to type AsteRISKdeleteALL instead of * for rm invocations more than offsets any anxiety.
You can also catch the rm and mv the files to a directory with quotas that you can call a recycle bin; some low-end attached storage can be fine for this as well, since there aren't many situations where your wildcard deleting is time-critical. You can accommodate this in your own skulker to clean up in a more organised way overall, in a timely manner, and in scripts you can path to the real rm command if need be. Last time I called it P45Generator, but that's not the finest for readability in any such scripts.
The * is expanded by the shell into a list of filenames, but nothing escapes or marks filenames that can be misinterpreted as options to 'rm'.
Yes. There was an article linked from HN ages ago (at least a year) that went into mitigation techniques for these issues. As you expect, it basically became fractal, and even then still had bugs. I wish I still had the URL.
I think it may be more scary for code that allows arbitrary execution via command-line arguments. Commands like find or xargs used without a defense against this would be a problem. For example, a site that does something precious with your uploaded pet pictures.
Defending against this being the use of -- to signal an end of command line arguments.
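To make that concrete, a small sketch of both the problem and the "--" defense, run in a throwaway directory (GNU rm; the filenames are made up):

```shell
cd "$(mktemp -d)"            # scratch directory
touch -- -rf important.txt   # a file literally named "-rf", plus a victim

# The glob expands to something like:  rm -rf important.txt
# GNU rm parses "-rf" as options, so important.txt is deleted forcibly:
rm *
ls                           # only "-rf" remains; rm never saw it as an operand

# The defense: "--" marks the end of options, so "-rf" becomes a filename:
rm -- *
ls                           # empty
```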
I've not tested it, but it should expand just like anything else. The effect would broadly be that running "rm *" in the directory would recurse into subfolders without warning.
You can spin up a droplet and use the online shell tool or ssh in (very easy when you've set up a cert as the droplet can have the cert setup automatically).
Then you can mess about with a droplet as much as you like, virtually speaking. Once you're done then use the control panel to destroy the droplet - it costs a few ¢ a day and if you don't have a droplet in use (which means active or paused; preserving images is cheaper but non-zero) then you don't pay anything.
Basically, sign up and have a year of uptime to mess with full installs of various OSes at no charge.
Make sure you don't write "rm -rf /*" in the wrong terminal!
Hm, that's a cool trick. IIRC some distributions (SuSE?) had a 'bash' clause where you couldn't do "rm -rf /" without confirming with 'y'.
My zsh has this too: when I 'rm -rf /some/dir/*' it always asks if I'm "sure". Truth be told, I'm not even expecting the text on stdout anymore; my finger goes to 'y' automatically, which means that if I do something stupid it won't be able to protect me :-P
The last couple of years I stopped doing 'stupid things' by not working on the shell when I'm very tired. That was the cause of my rm-related incidents in the past :-)
Neat idea! Just tested that in an OS X 10.8 virtual machine; while it works nicely against "rm -rf *", sadly it does not help stop an accidental sudo rm -rf / or ~/. Also, "touch ~-@" created a file in the home directory called "~-@"; in order to set the correct filename, I cd'd into ~ and then ran "touch ./-@".
Would zsh still protect me if the script explicitly uses Bash (i.e. #!/bin/bash)? Sorry if this is a dumb question, I'm unsure how shells work when calling other kinds of shell scripts.
If you uninstalled our software it deleted a major chunk of your Windows registry, crippling your computer. It was a one character error in our script. The first ticket read "Uninstalling [Product] destroys your computer".
I was responsible for customer support. Good times! Was a rough week. We managed to not get sued.
A sandboxed iTunes would also prevent syncing your iPod and importing existing music collections, because those both require access to files outside the sandbox, which is probably why Apple hasn't done that.
A sandbox isn't a totally isolated prison. It's a permissions system. Programs can read specific files and folders outside the sandbox and can even ask the user to add new files/folders to their whitelist.
About a year ago or so I tried to fix a computer of an OS X user where the Dropbox installer somehow deleted the home directory and replaced it with the dropbox application... That was an odd experience (and one where sandbox didn't help...)
Thanks OP, this actually made me laugh uproariously. Anyway, I'd be willing to bet 100 push-ups that (unless it was malicious and not a bug) this thing is caused by some cleanup code somewhere that originally intended to do "rm -rf /path/to/squid/socket", but the function that was supposed to generate the "/path/to/squid/socket" string instead generated a null, which was then parseString'd onto a "" via some + function that was trying to do "/" + null.
But I'm neither a Red Hat user nor an OS dev, so I might be completely wrong.
That's almost exactly how I blew up a test server once. (rsync --delete in place of rm) Taught me to be extremely careful when dealing with absolute directory paths.
Hmm, there are two possible candidates in the init.d script for the RHEL 6 package of an older version of squid (doesn't look like bug submitter is using a current version).
In stop():
rm -rf $SQUID_PIDFILE_DIR/*
and in restart():
rm -rf $SQUID_PIDFILE_DIR/*
SQUID_PIDFILE_DIR is hardcoded to "/var/run/squid" at the top of my copy of the init script. But, neither of those rm commands check first to make sure that SQUID_PIDFILE_DIR isn't empty (or, better yet, is in /var and doesn't contain ".."), and either the submitter's copy of the script is mangled or something else somewhere is stomping on SQUID_PIDFILE_DIR in the shell environment.
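A guard like the following would have been cheap insurance (a hypothetical sketch, not the actual Red Hat fix; clean_pidfile_dir is a made-up name):

```shell
# Refuse to clean anything that is empty, contains "..",
# or lies outside the expected /var/run prefix.
clean_pidfile_dir() {
    local dir="$1"
    case "$dir" in
        ""|*..*)
            echo "refusing: unsafe pidfile dir '$dir'" >&2
            return 1 ;;
        /var/run/*)
            rm -rf "$dir"/* ;;     # variable quoted; glob stays outside the quotes
        *)
            echo "refusing: unexpected pidfile dir '$dir'" >&2
            return 1 ;;
    esac
}

# An unset/empty variable is now an error instead of 'rm -rf /*':
clean_pidfile_dir "" || echo "blocked"
```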
squid-3.1.10-29.el6.src.rpm (from ftp.redhat.com, buried where they keep SRPMs) has squid.init within, and that file has no mention of SQUID_PIDFILE_DIR. A few other spot-checked versions are the same way.
I guess they applied that change which was obviously written against a very different init script where the variable is actually defined, got QA to test it and immediately backed it out.
That would be my guess too, although the package maintainer confirmed it on (presumably) a fresh install. I can't find a candidate rm command anywhere in the SRPM, so maybe an upstream file got merged into distrib somehow? I don't have access to an RHEL system to try it out, and can't find the distrib RPM yet to check.
"it's hard to trust anyone that writes a dumb bug".
Listen to yourself. We've all written dumb bugs. We've all had that one line of code that was an obvious mistake. I still trust people who write a bug here and there because if I didn't I would have to forgo trusting everyone for everyone makes dumb mistakes sometimes.
At this point I refuse to trust anyone who writes shell scripts. Bugs happen, but a shell script is practically guaranteed to have zero automated tests and variable-related bugs are all too common.
10 bucks says the lesson learned will be "remember kids, always set -u (and other good ideas, like set -e, set -o pipefail, and personally I like set -o posix but you'll need to give up process substitution for that)."
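i.e. the boilerplate header people will be taught to paste at the top of every script:

```shell
#!/bin/bash
set -euo pipefail
# -e          : exit as soon as any command fails
# -u          : expanding an unset variable is a fatal error
# -o pipefail : a pipeline fails if any stage fails, not just the last
```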
This bug is an internal QA bug. The reporter is a Red Hat tester. The buggy RPM was never released outside Red Hat, and so there's no requirement to release the source. When RHEL 6.7 comes out the source will go up on ftp.redhat.com.
Scary. Even more scary is the fact that the bug has been open for a week, one person has confirmed 100% reproducibility, and no one seems to care at Red Hat. Isn't deleting people's hard drives a big no-no?
It already has a "fixed in version" listed. Fixing it within a day doesn't sound like no one caring :)
Does it really matter if no one commented "Oops, we screwed up"? It's kinda self-evident that there was a mistake and there's not really much to say; it's clear from the description how bad it is and marking it "Fixed in version" already says it all pretty much.
When you're a QA engineer at RHEL working on an UNRELEASED PRODUCT then no, I don't think you need an apology. Maybe a thanks for finding the bug, but this is the whole point of QA and the whole point of QA-ing before it's released.
The context matters a lot. This title is linkbait. It omits mentioning that this was not publicly released and the reporter is a QA for Red Hat.
Given that context I doubt you'll still agree an apology is so necessary or that this was not handled well.
This being said, the bugtracker isn't very clear when it comes to ticket status. Currently the status is 'on_qa'. I guess if you use it a lot you're aware of it, but it strikes me as iffy UI.
It's exceptionally clear for anyone even remotely familiar with RHEL or Bugzilla. It's not meant for any average joe's consumption anyway; this is a work tool used by developers and QE...
'Exceptionally clear'? How do you detect that bug status from across the room? If that bug page was up on a big screen, how would you detect its status? How do you tell at a glance, from a meter away, what the status is? All of those things should be doable for an 'exceptionally clear' status. Developers don't have mysteriously different visual capabilities from 'joes'.
this is a work tool used by developers and QE
Because as we all know, developers don't benefit from good UIs. One wonders why they would use a webpage at all, rather than simply connect to an ncurses-based bugtracker that only supports an 80x24 terminal.
I don't understand why I'm getting pounded by downvoters. First someone tells me that a small-typeface 'status' is 'exceptionally clear' and UI doesn't really matter for developers. Next I'm told that in order to detect something from across the room, I have to click through a link.
I realise my comment was snarky, but both of these responses were in a patronising tone themselves. Does bugzilla have some sort of rabid fanbase akin to the vim/emacs wars?
Keep in mind that I did not say anything like 'terrible' or 'worst ever'. I said 'iffy'. I also obviously know where to find the state, because I mentioned it in my original comment.
In return I got that it was 'exceptionally clear' (which it clearly isn't, given there's a few people in this thread that missed it); that UI doesn't matter for developers; a possible insinuation that I don't know the kind of tools devs use; and a follow-up comment that tells me I need to know my tools but then proceeds to completely ignore the use-case it explicitly quoted when telling me what I should do. I don't know how that second one can be seen as anything but patronising.
Continuing on with the theme, your point still doesn't answer any of the issues I had in my downmodded comment regarding what is 'exceptionally clear'. Can you tell at a glance from a meter away? My point isn't about the existence of a field, but its presentation.
If you don't think the UI is iffy, that's fine, we disagree. But let's not make up nonsense about things like devs not benefiting from good UI or offering solutions that don't match the quoted use-case.
> (which it clearly isn't, given there's a few people in this thread that missed it)
I feel this is more due to many people wanting to look at the bug itself, instead of its metadata (due to the title of the link)
> Can you tell at a glance from a meter away?
I seriously can't tell anything that isn't colored; I have no real concept of distance vision (short-sighted, and glasses still make things fuzzy).
I wonder whether the status in any bugzilla installation would warrant being viewable at a meter. I personally find the presentation fine, though: it is the first thing I see other than the title, unless I'm specifically not looking for it.
You are being downvoted because you are complaining about something which isn't relevant. Complaining about Bugzilla's UI not being pretty and clear from across the room really isn't relevant to the issue, or to Bugzilla's purpose, whatsoever.
If anyone is using an unreleased product on anything other than a test environment with anything other than test data, then the nicest thing anyone could do for them at this point is to simply point out the bug has been fixed.
Red Hat is pleased to announce the general availability of our new Red Hat Enterprise Data Recovery service. Please contact your account manager for details.
It's already fixed, plus there are many private comments which you cannot see (but I think you see "missing" comment numbers). Also the product in question is not released yet. It's good this was found, but no customer would have been affected unless they were using an alpha.
RH doesn't have a great reputation here. Unlike Debian, which does proper triage and practices "zero release-critical bugs", RH threw RHEL 7 out the door with loads of critical issues still open.
All high-severity bugs against 7.1, which was released 16 days ago. Check the dates on half of them. They're from before the release date, and half of them haven't even been assigned or triaged.
When 7.0 came out, timedatectl and systemd didn't even work properly. Enabling NTP threw dbus errors galore. On some kit it didn't even boot. Total lemon.
RHEL doesn't generally work properly until the .2 releases. I've been using it for 10 years so I've got plenty of experience on the matter.
I would go into detail about the CIFS/smb kernel hangs I've had on 6.x but I've had enough of it by now.
The priority fields are set by developers so they know which bugs they should work on first. The two bugs of mine which appear on that list are both new features for RHEL 7.2. I set the priority of those so I know to work on them first. I really think you need a better query than that one.
Update: I think if you wanted to find out which critical bugs affected RHEL 7.0 on release, you'd probably want to look at the list of z-stream packages (RHEL 7.0.z) which subscribers have access to. These are bugs which didn't affect the installer or first boot, but were important enough to need fixing in RHEL 7.0 after it went out. (If a bug was critical enough to affect installation or first boot, it would have delayed the release).
And this is why every bug tracking system should have a triage SLA.
Or, even if you don't want to declare a threshold, publish the current stats on its front page: "Over the last 30 days, our 99-percentile triage wait time was: XX hours."
Similarly, open tickets with priority=urgent should never go 24 hours without a new comment from the owner.
"Or, even if you don't want to declare a threshold, publish the current stats on its front page: "Over the last 30 days, our 99-percentile triage wait time was: XX hours."
Now there's a good idea. Major open source projects should have software quality dashboards tracking things like that.
We're now seeing hospital emergency rooms displaying their current wait time in minutes on billboards.
I find the /etc/httpd/logs symlink more annoying. If you want to grep through your Apache configuration, you have to explicitly grep through conf and conf.d; otherwise, just going to /etc/httpd and doing a grep -r, you're searching through gigs of Apache logs.
grep -r shouldn't follow symlinks; -R does, however:

  -d, --directories=ACTION  how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
  -r, --recursive           like --directories=recurse
  -R, --dereference-recursive
                            likewise, but follow all symlinks
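A quick check with GNU grep, mimicking the /etc/httpd layout with temp directories (all paths and file contents here are made up for the demo):

```shell
# Fake "log" directory living outside the config tree:
logdir=$(mktemp -d)
echo "GET /index.html" > "$logdir/access_log"

# Fake config tree with a symlink to the logs, like /etc/httpd/logs:
confdir=$(mktemp -d)
echo "ServerName example" > "$confdir/httpd.conf"
ln -s "$logdir" "$confdir/logs"

cd "$confdir"
grep -r "GET" . || echo "-r: symlink not followed, no match"
grep -R "GET" .      # -R dereferences: ./logs/access_log:GET /index.html
```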
As opposed to Debian, the distro which chooses to break tomcat (a program which unzips into a single folder and is thereby completely self-contained) up into a million different pieces and scatter them randomly all over your hard drive?
To be fair to Debian, that's how it's supposed to work. You untar into /opt, self contained, while your apt install puts things in the Filesystem Hierarchy Standard, which means config goes into /etc.
You'll find most distros follow some FHS standard, although there's some differences in interpretation.
Yes, it's a distro where you can find EVERY setting in /etc, even if the software creator decided you should have to look somewhere else.
(That said, making the package self contained is the most sensible way for the developer to release it. It's just not a good option for a distro package.)
I have no direct information about this specific case, but in ye olden Unix™ days, there was no /sbin, so all those binaries instead lived in /etc. The Red Hat symlinks could be a backwards-compatibility thing.
[0] https://github.com/valvesoftware/steam-for-linux/issues/3671