Though QA is Quality Assurance, when really it's QC (Quality Control) that should have caught the error. QA puts processes in place so that there is a QC step that can catch this.
Sorry, going off-topic but as a tester I dislike being told to 'QA this'
I would bet on the issue being in the init script itself rather than squid. (I'm assuming squid doesn't run as root by default in rhel) If that's true then it's another point for more sane process managers (upstart/supervisord/systemd/...)
set -o pipefail makes common idioms a pain. Consider head, which simply exits after it has read a few lines. When it does, the process feeding the pipe gets a SIGPIPE and exits with a non-zero exit code:
Consider /tmp/test.sh:
set -o pipefail
yes foo | head
$ bash /tmp/test.sh >/dev/null
$ echo $?
141
From the same page: "rking's personal recommendation is to go ahead and use set -e, but beware of possible gotchas. It has useful semantics, so to exclude it from the toolbox is to give in to FUD."
You can use set -e and turn it off (set +e) for code blocks that are problematic. You can also add '|| true', or use the ':' no-op builtin, to avoid specific problem points without turning everything off. These are edge cases, and you can easily work around them if you are an advanced user.
If you are not an advanced user then you should certainly use -e.
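A minimal sketch of those escape hatches (the file path and commands here are placeholders, not from the thread):

```shell
#!/bin/bash
set -e                    # abort on any command failure...

# ...except where we explicitly opt out:
grep -q needle /nonexistent-file 2>/dev/null || true   # '|| true' swallows the failure

# Or disable errexit around a problematic block:
set +e
false
status=$?                 # inspect the exit code ourselves
set -e

echo "still running; last status was $status"
```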
or check the variable before using it, like any other programming language:
[[ "$VAR" ]] && rm -rf "$VAR/*"
I think most of these issues stem from the fact that most developers that write shell scripts don't actually understand what they're doing, treating the script as a necessary annoyance rather than a component of the software.
If anyone understands shell scripts, it would be people writing init scripts at Red Hat :)
Anyway, that is not anything like other programming languages. Checking in that way is error-prone and not really an improvement (nor equivalent to set -u).
[[ "$DAEMON_PATH" ]] && rm -rf "$DEAMON_PATH/*"
See what I did there? It's an rm -rf /* bug because "checking variables" is not the answer.
In other programming languages, if an identifier is mis-typed things will blow up. E.g., in ruby if I write:
daemon_path=1; if daemon_path; puts deamon_path; end
I get "NameError: undefined local variable or method `deamon_path`"
These issues do not always stem from bad developers. Bash's defaults are not safe in many ways and saying "people should just check the variable" isn't helpful here.
Shameless plug for my language "bish" (compiles to bash) which aims to solve many of these annoyances with shell scripting: https://github.com/tdenniston/bish
Bash also has the ability to flag use of an undefined variable as an error; it is just not on by default.
set -u
Man page quote: "Treat unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion. If expansion is attempted on an unset variable or parameter, the shell prints an error message, and, if not interactive, exits with a non-zero status."
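A quick illustration, running a child bash so the failing expansion doesn't kill the current shell ($DEMO_UNSET_VAR is a made-up name assumed to be unset):

```shell
# Without -u, a typo'd variable silently expands to the empty string:
bash -c 'echo "path is: $DEMO_UNSET_VAR"'     # prints "path is: "

# With -u, the same expansion is a fatal error:
if ! bash -uc 'echo "path is: $DEMO_UNSET_VAR"'; then
    echo "with -u: unbound variable is fatal"
fi
```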
Yeah, everyone always loves to shit on BAT (which is fair, it is terrible) and VBS (which is slightly less fair), but in spite of how many problems Bash has (not least the massive security issue last year), it gets off almost scot-free.
These bugs are indicative of Bash's design problems. Why is it used for init scripts? And don't even get me started on how filenames end up interpreted as part of the argument list when using * (e.g. a file named "-rf").
Say what you will about Powershell, but having a typed language that can throw a null exception is useful for bugs like these. The filename isn't relevant, and a null name on a delete won't try to clear out the OS (it will just throw).
Not just scot-free - during the Great systemd War of 2014 it was a talking point for the antis that using anything other than the pure, reliable simplicity of shell for service management was MADNESS!
I don't think that was the argument, as much as it was that if a shell script fouled up it was easier to get in and do field repairs because it was interpreted rather than compiled.
It could be the default for non-interactive shells without causing this problem. Or we could have a more nuanced rule, where -e means "stop executing the current sequence of commands as soon as there is an error", where a "sequence of commands" is a single line in an interactive shell (so "false; whoami" would print nothing), or the entire file in a script.
The real answer is that this has not been the default in the time between shells being invented and this comment being posted, and so the squillions of lines of shell script out there in the wild keeping the world turning have not been written with this in mind. Making it the default now would break a lot of things.
With the benefit of hindsight, though, I would say that yes, this should have been the default in scripts. Oh well.
That's not completely true. At least with the GNU tools, 'rm' won't delete the root directory unless you specifically give it the '--no-preserve-root' flag. Since that flag has no use other than deleting root, it's unlikely the script passes it. With that in mind, the script must be doing some kind of manual deletion for some reason.
I believe that "--preserve-root" applies only to / itself. That means `rm -rf /*` will expand to `rm -rf /bin /dev /etc /lib ...` and delete everything anyway.
That's accurate. `rm -rf /*` will still delete everything. But that said, `rm -rf "$STREAMROOT/"` can't ever expand to that, and moreover, since the expansion is in double quotes, it won't be subject to path expansion by bash. So even "/*/", which would normally expand into "/bin/ /dev/ /etc/ ...", won't. You can see what I mean yourself, just use echo:
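Something like this (a quick sketch; the expanded directory names will vary by system):

```shell
STREAMROOT=""

echo "$STREAMROOT/*"     # glob inside the quotes: printed literally
# -> /*

echo $STREAMROOT/*       # unquoted: the shell expands it
# -> /bin /boot /dev /etc ...   (names will vary)
```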
This just happened to my coworker today. I'm sitting behind him telling him which commands to type (he's new to Linux...) when suddenly he jumps the gun and pushes enter just as I say "slash". My heart nearly stopped. I didn't even know preserve-root existed (plus I always reiterate: don't log in as root). It was a snapshotted VM, but we still would have lost the day's work.
I feel like it would be a frighteningly common bug. I remember one like this from 2011 [1]. Install/packaging/utility scripts usually do not get as much attention and testing as the application code itself.
I'd say the fact that these bugs only very occasionally happen - relative to the huge number of shell scripts out there being executed every day - means it's not really "frighteningly common". You only hear about the ones that fail.
By the same logic, memory safety issues only happen rarely, right? Most programs/scripts are going to be tested if they're part of a distribution, and such errors removed. But without polling people it'd be hard to know of the many times this kind of thing messed things up. I personally wiped out a production DB due to expanding an unset variable (fortunately immediately after taking a backup).
This is, as the bug notes, a regression, and I'm guessing you're right about it being in the initscript (I'm pretty sure). I used to be a very heavy Squid user and Squid developer and I remember a very similar bug many years ago. It was in the cache_dir initialization code. It would read the configuration file, parse out the cache_dir lines, and if the directories didn't exist it would create them as part of the startup.
There were some circumstances where, if there was no cache_dir line configured, or if the cache_dir was a link or something (the details are very sketchy in my mind after so much time), it would end up destroying /.
No, but if an analogous bug happened (systemd forgot to set an internal squidroot variable before clearing the squidroot, for instance), it would be much, much harder to figure out what was going on. Which is really what everybody's complaints boil down to.
"systemd" and "sane" only ever go in the same sentence as "sane people don't use systemd".
It looks like a bug in the init script; running it as squid's user wouldn't have triggered destroying the whole filesystem - likely just squid's config and anything under its /var.
I'll be the first to call out systemd for a lot of things, but not its core init idea. It's the same as daemontools, upstart, supervisord, and others do. Implementation is very different of course, but the idea is common - you run/kill services, not start/stop them. That's the reason we can leave the ugly and error-prone init scripts behind.
Which is what happens when you have every daemon writing their own PID handling code, running as root, in a language whose interpolation rules nobody really understands.
While this is a really nice hack, stuff like this is also the reason I feel really uneasy when writing shell scripts. What works now may suddenly break in the future due to inadequate escaping of filenames.
No, it doesn't work. "./*" expands to "./-@" as a single field, which rm has no problem with. (Note, however, that this is still the shell's globbing, as far as I understand.)
How would changing the filename fix it? It's a hack relying on the shell's globbing. If you're not using the globbing, the hack can never help you.
Very handy trick, more so if you have no root access. Myself, I prefer to rename rm and drop a shell wrapper in its place. It can be as simple as changing every "*" passed in into "Sorry_Dave_I_can_not_allow_you_to_do_that" (or words to that effect) and every "AsteRISKdeleteALL" into "*", then passing the modified input on to the real rm. You can adjust how it works and add rules to taste.
That way the pain of having to type AsteRISKdeleteALL instead of * for rm invocations more than offsets any anxiety.
You can also catch the rm and mv the files to a directory with quotas that you can call a recycle bin; some low-end attached storage can be fine for this as well, since there aren't many situations where your wildcard deleting is time-critical. You can accommodate this in your own skulker to clean up in a more organised way overall, in a timely manner, and in scripts you can path to the real rm command if need be. Last time I called it P45Generator, but that's not the finest for readability in any such scripts.
The * is expanded by the shell into a list of filenames, but nothing escapes or marks filenames that can be misinterpreted as options to 'rm'.
Yes. There was an article linked from HN ages ago (at least a year) that went into mitigation techniques for these issues. As you expect, it basically became fractal, and even then still had bugs. I wish I still had the URL.
I think it may be more scary for code that allows arbitrary execution via command-line arguments. Commands like find or xargs used without a defense against this would be a problem. For example, a site that does something precious with your uploaded pet pictures.
Defending against this being the use of -- to signal an end of command line arguments.
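To make that concrete, a small sketch of both the problem and the "--" defense, run in a throwaway directory (GNU rm; the filenames are made up):

```shell
cd "$(mktemp -d)"            # scratch directory
touch -- -rf important.txt   # a file literally named "-rf", plus a victim

# The glob expands to something like:  rm -rf important.txt
# GNU rm parses "-rf" as options, so important.txt is deleted forcibly:
rm *
ls                           # only "-rf" remains; rm never saw it as an operand

# The defense: "--" marks the end of options, so "-rf" becomes a filename:
rm -- *
ls                           # empty
```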
I've not tested it, but it should expand just like anything else. The effect would broadly be that running "rm *" in the directory would recurse into subfolders without warning.
You can spin up a droplet and use the online shell tool or ssh in (very easy when you've set up a cert as the droplet can have the cert setup automatically).
Then you can mess about with a droplet as much as you like, virtually speaking. Once you're done then use the control panel to destroy the droplet - it costs a few ¢ a day and if you don't have a droplet in use (which means active or paused; preserving images is cheaper but non-zero) then you don't pay anything.
Basically, sign up and have a year of uptime to mess with full installs of various OSes at no charge.
Make sure you don't write "rm -rf /*" in the wrong terminal!
Hm, that's a cool trick. IIRC some distributions (SuSE?) had a 'bash' clause where you couldn't do "rm -rf /" without confirming with 'y'.
My zsh has this too: when I 'rm -rf /some/dir/*' it always asks if I'm "sure". Truth be told, I'm not even expecting the text on stdout anymore; my finger goes to 'y' automatically, which means that if I do something stupid it won't be able to protect me :-P
The last couple of years I stopped doing 'stupid things' by not working on the shell when I'm very tired. That was the cause of my rm-related incidents in the past :-)
Neat idea! Just tested that in an OS X 10.8 virtual machine; while it works nicely against "rm -rf *", sadly it does not help stop an accidental sudo rm -rf / or ~/. Also, "touch ~-@" created a file in the home directory called "~-@"; in order to set the correct filename, I cd'd into ~ and then ran "touch ./-@".
Would zsh still protect me if the script explicitly uses Bash (i.e. #!/bin/bash)? Sorry if this is a dumb question, I'm unsure how shells work when calling other kinds of shell scripts.
If you uninstalled our software it deleted a major chunk of your Windows registry, crippling your computer. It was a one character error in our script. The first ticket read "Uninstalling [Product] destroys your computer".
I was responsible for customer support. Good times! Was a rough week. We managed to not get sued.
A sandboxed iTunes would also prevent syncing your iPod and importing existing music collections, because those both require access to files outside the sandbox, which is probably why Apple hasn't done that.
A sandbox isn't a totally isolated prison. It's a permissions system. Programs can read specific files and folders outside the sandbox and can even ask the user to add new files/folders to their whitelist.
About a year ago or so I tried to fix a computer of an OS X user where the Dropbox installer somehow deleted the home directory and replaced it with the dropbox application... That was an odd experience (and one where sandbox didn't help...)
Thanks OP, this actually made me laugh uproariously. Anyway, I'd be willing to bet 100 push-ups that (unless it was malicious and not a bug) this thing is caused by some cleanup code somewhere that originally intended to do "rm -rf /path/to/squid/socket", but the function that was supposed to generate the "/path/to/squid/socket" string instead generated a null, which was then parseString'd onto a "" via some + function that was trying to do "/" + null.
But I'm neither a Red Hat user nor an OS dev, so I might be completely wrong.
That's almost exactly how I blew up a test server once. (rsync --delete in place of rm) Taught me to be extremely careful when dealing with absolute directory paths.
Hmm, there are two possible candidates in the init.d script for the RHEL 6 package of an older version of squid (doesn't look like bug submitter is using a current version).
In stop():
rm -rf $SQUID_PIDFILE_DIR/*
and in restart():
rm -rf $SQUID_PIDFILE_DIR/*
SQUID_PIDFILE_DIR is hardcoded to "/var/run/squid" at the top of my copy of the init script. But, neither of those rm commands check first to make sure that SQUID_PIDFILE_DIR isn't empty (or, better yet, is in /var and doesn't contain ".."), and either the submitter's copy of the script is mangled or something else somewhere is stomping on SQUID_PIDFILE_DIR in the shell environment.
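A guard like the following would have been cheap insurance (a hypothetical sketch, not the actual Red Hat fix; clean_pidfile_dir is a made-up name):

```shell
# Refuse to clean anything that is empty, contains "..",
# or lies outside the expected /var/run prefix.
clean_pidfile_dir() {
    local dir="$1"
    case "$dir" in
        ""|*..*)
            echo "refusing: unsafe pidfile dir '$dir'" >&2
            return 1 ;;
        /var/run/*)
            rm -rf "$dir"/* ;;     # variable quoted; glob stays outside the quotes
        *)
            echo "refusing: unexpected pidfile dir '$dir'" >&2
            return 1 ;;
    esac
}

# An unset/empty variable is now an error instead of 'rm -rf /*':
clean_pidfile_dir "" || echo "blocked"
```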
squid-3.1.10-29.el6.src.rpm (from ftp.redhat.com, buried where they keep SRPMs) has squid.init within, and that file has no mention of SQUID_PIDFILE_DIR. A few other spot-checked versions are the same way.
I guess they applied that change which was obviously written against a very different init script where the variable is actually defined, got QA to test it and immediately backed it out.
That would be my guess too, although the package maintainer confirmed it on (presumably) a fresh install. I can't find a candidate rm command anywhere in the SRPM, so maybe an upstream file got merged into distrib somehow? I don't have access to an RHEL system to try it out, and can't find the distrib RPM yet to check.
"it's hard to trust anyone that writes a dumb bug".
Listen to yourself. We've all written dumb bugs. We've all had that one line of code that was an obvious mistake. I still trust people who write a bug here and there because if I didn't I would have to forgo trusting everyone for everyone makes dumb mistakes sometimes.
At this point I refuse to trust anyone who writes shell scripts. Bugs happen, but a shell script is practically guaranteed to have zero automated tests and variable-related bugs are all too common.
10 bucks says the lesson learned will be "remember kids, always set -u (and other good ideas, like set -e, set -o pipefail, and personally I like set -o posix but you'll need to give up process substitution for that)."
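i.e. the boilerplate header people will be taught to paste at the top of every script:

```shell
#!/bin/bash
set -euo pipefail
# -e          : exit as soon as any command fails
# -u          : expanding an unset variable is a fatal error
# -o pipefail : a pipeline fails if any stage fails, not just the last
```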
This bug is an internal QA bug. The reporter is a Red Hat tester. The buggy RPM was never released outside Red Hat, and so there's no requirement to release the source. When RHEL 6.7 comes out the source will go up on ftp.redhat.com.
Scary. Even more scary is the fact that the bug has been open for a week, one person has confirmed 100% reproducibility, and no one seems to care at Red Hat. Isn't deleting people's hard drives a big no-no?
It already has a "fixed in version" listed. Fixing it within a day doesn't sound like no one caring :)
Does it really matter if no one commented "Oops, we screwed up"? It's kinda self-evident that there was a mistake and there's not really much to say; it's clear from the description how bad it is and marking it "Fixed in version" already says it all pretty much.
When you're a QA engineer at RHEL working on an UNRELEASED PRODUCT then no, I don't think you need an apology. Maybe a thanks for finding the bug, but this is the whole point of QA and the whole point of QA-ing before it's released.
The context matters a lot. This title is linkbait. It omits mentioning that this was not publicly released and the reporter is a QA for Red Hat.
Given that context I doubt you'll still agree an apology is so necessary or that this was not handled well.
This being said, the bugtracker isn't very clear when it comes to ticket status. Currently the status is 'on_qa'. I guess if you use it a lot you're aware of it, but it strikes me as iffy UI.
It's exceptionally clear for anyone even remotely familiar with RHEL or Bugzilla. It's not meant for any average joe's consumption anyway; this is a work tool used by developers and QE...
'Exceptionally clear'? How do you detect that bug status from across the room? If that bug page was up on a big screen, how would you detect its status? How do you tell at a glance, from a meter away, what the status is? All of those things should be doable for an 'exceptionally clear' status. Developers don't have mysteriously different visual capabilities from 'joes'.
this is a work tool used by developers and QE
Because as we all know, developers don't benefit from good UIs. One wonders why they would use a webpage at all, rather than simply connect to an ncurses-based bugtracker that only supports an 80x24 terminal.
I don't understand why I'm getting pounded by downvoters. First someone tells me that a small-typeface 'status' is 'exceptionally clear' and UI doesn't really matter for developers. Next I'm told that in order to detect something from across the room, I have to click through a link.
I realise my comment was snarky, but both of these responses were in a patronising tone themselves. Does bugzilla have some sort of rabid fanbase akin to the vim/emacs wars?
Keep in mind that I did not say anything like 'terrible' or 'worst ever'. I said 'iffy'. I also obviously know where to find the state, because I mentioned it in my original comment.
In return I got that it was 'exceptionally clear' (which it clearly isn't, given there's a few people in this thread that missed it); that UI doesn't matter for developers; a possible insinuation that I don't know the kind of tools devs use; and a follow-up comment that tells me I need to know my tools but then proceeds to completely ignore the use-case it explicitly quoted when telling me what I should do. I don't know how that second one can be seen as anything but patronising.
Continuing on with the theme, your point still doesn't answer any of the issues I had in my downmodded comment regarding what is 'exceptionally clear'. Can you tell at a glance from a meter away? My point isn't about the existence of a field, but its presentation.
If you don't think the UI is iffy, that's fine, we disagree. But let's not make up nonsense about things like devs not benefiting from good UI or offering solutions that don't match the quoted use-case.
> (which it clearly isn't, given there's a few people in this thread that missed it)
I feel this is more due to many people wanting to look at the bug itself, instead of its metadata (due to the title of the link)
> Can you tell at a glance from a meter away?
I seriously can't tell anything that isn't colored; I have no real concept of distance vision (short-sighted, and glasses still make things fuzzy).
I wonder whether the status in any bugzilla installation would warrant being viewable at a meter. I personally find the presentation fine, though: it is the first thing I see other than the title, unless I'm specifically not looking for it.
You are being downvoted because you are complaining about something which isn't relevant. Complaining about Bugzilla's UI not being pretty and clear from across the room really isn't relevant to the issue, or to Bugzilla's purpose, whatsoever.
If anyone is using an unreleased product on anything other than a test environment with anything other than test data, then the nicest thing anyone could do for them at this point is to simply point out the bug has been fixed.
Red Hat is pleased to announce the general availability of our new Red Hat Enterprise Data Recovery service. Please contact your account manager for details.
It's already fixed, plus there are many private comments which you cannot see (but I think you see "missing" comment numbers). Also the product in question is not released yet. It's good this was found, but no customer would have been affected unless they were using an alpha.
RH doesn't have a great reputation here. Unlike Debian, which does proper triage and practices "zero release-critical bugs", RH threw RHEL 7 out the door with loads of critical issues still open.
All high-severity bugs against 7.1, which was released 16 days ago. Check the dates on half of them. They're from before the release date, and half of them haven't even been assigned or triaged.
When 7.0 came out, timedatectl and systemd didn't even work properly. Enabling NTP threw dbus errors galore. On some kit it didn't even boot. Total lemon.
RHEL doesn't generally work properly until the .2 releases. I've been using it for 10 years so I've got plenty of experience on the matter.
I would go into detail about the CIFS/smb kernel hangs I've had on 6.x but I've had enough of it by now.
The priority fields are set by developers so they know which bugs they should work on first. The two bugs of mine which appear on that list are both new features for RHEL 7.2. I set the priority of those so I know to work on them first. I really think you need a better query than that one.
Update: I think if you wanted to find out which critical bugs affected RHEL 7.0 on release, you'd probably want to look at the list of z-stream packages (RHEL 7.0.z) which subscribers have access to. These are bugs which didn't affect the installer or first boot, but were important enough to need fixing in RHEL 7.0 after it went out. (If a bug was critical enough to affect installation or first boot, it would have delayed the release).
And this is why every bug tracking system should have a triage SLA.
Or, even if you don't want to declare a threshold, publish the current stats on its front page: "Over the last 30 days, our 99-percentile triage wait time was: XX hours."
Similarly, open tickets with priority=urgent should never go 24 hours without a new comment from the owner.
"Or, even if you don't want to declare a threshold, publish the current stats on its front page: "Over the last 30 days, our 99-percentile triage wait time was: XX hours."
Now there's a good idea. Major open source projects should have software quality dashboards tracking things like that.
We're now seeing hospital emergency rooms displaying their current wait time in minutes on billboards.
I find the /etc/httpd/logs symlink more annoying. If you want to grep through your Apache configuration, you have to explicitly grep through conf and conf.d; otherwise, just going to /etc/httpd and doing a grep -r, you're searching through gigs of Apache logs.
grep -r shouldn't follow symlinks; -R does, however:

  -d, --directories=ACTION  how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
  -r, --recursive           like --directories=recurse
  -R, --dereference-recursive
                            likewise, but follow all symlinks
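A quick check with GNU grep, mimicking the /etc/httpd layout with temp directories (all paths and file contents here are made up for the demo):

```shell
# Fake "log" directory living outside the config tree:
logdir=$(mktemp -d)
echo "GET /index.html" > "$logdir/access_log"

# Fake config tree with a symlink to the logs, like /etc/httpd/logs:
confdir=$(mktemp -d)
echo "ServerName example" > "$confdir/httpd.conf"
ln -s "$logdir" "$confdir/logs"

cd "$confdir"
grep -r "GET" . || echo "-r: symlink not followed, no match"
grep -R "GET" .      # -R dereferences: ./logs/access_log:GET /index.html
```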
As opposed to Debian, the distro which chooses to break tomcat (a program which unzips into a single folder and is thereby completely self-contained) up into a million different pieces and scatter them randomly all over your hard drive?
To be fair to Debian, that's how it's supposed to work. You untar into /opt, self contained, while your apt install puts things in the Filesystem Hierarchy Standard, which means config goes into /etc.
You'll find most distros follow some FHS standard, although there's some differences in interpretation.
Yes, it's a distro where you can find EVERY setting in /etc, even if the software creator decided you should have to look somewhere else.
(That said, making the package self contained is the most sensible way for the developer to release it. It's just not a good option for a distro package.)
I have no direct information about this specific case, but in ye olden Unix™ days, there was no /sbin, so all those binaries instead lived in /etc. The Red Hat symlinks could be a backwards-compatibility thing.
[0] https://github.com/valvesoftware/steam-for-linux/issues/3671