GNU grep 2.12 changes behavior of recursion options, breaks existing scripts

js2 · on July 26, 2012

Here's the change and it's justification:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=c6e3ea6...

Change -r to follow only command-line symlinks, and by default to read only devices named on the command line. This is a simple way to get a more-useful behavior when searching random directories; the idea is to use 'find' if you want something fancy. -R acts as before and gets a new alias --dereference-recursive.

Personally I think breaking compatibility for this change was a poor decision.
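
The difference described in the commit message can be checked with a throwaway directory tree. This is only a sketch: it assumes GNU grep 2.12 or later, and the directory and file names are made up for the demonstration.

```shell
# Assumes GNU grep 2.12 or later; all paths here are made up for the demo.
demo=$(mktemp -d)
mkdir "$demo/tree" "$demo/outside"
echo 'needle' > "$demo/tree/a.txt"
echo 'needle' > "$demo/outside/b.txt"
ln -s "$demo/outside" "$demo/tree/link"   # symlink met during traversal

# -r skips symlinks found while recursing, so only a.txt is searched.
r_hits=$(grep -rl needle "$demo/tree" | wc -l)

# -R (--dereference-recursive) follows them, so link/b.txt is searched too.
R_hits=$(grep -Rl needle "$demo/tree" | wc -l)

echo "-r matched $r_hits file(s), -R matched $R_hits file(s)"
rm -rf "$demo"
```

On an older GNU grep both counts should come out the same, which is the compatibility break being discussed.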

akkartik · on July 26, 2012

That link doesn't work. I thought at first that it was incorrectly copied, but you can see it here:

http://git.savannah.gnu.org/cgit/grep.git/log/?qt=grep&q...

And clicking on it says 'No repositories found'. Anybody else have this problem?

js2 · on July 26, 2012

Alternate link:

http://repo.or.cz/w/grep.git/commit/c6e3ea61d9f08aa0128a0eb1...

(Yea for the "D" in DVCS.)

Also, the corresponding bug which has additional discussion:

http://savannah.gnu.org/bugs/?17623

Ralith · on July 26, 2012

Just FYI, "Yea" and "Yay" are not synonymous.

js2 · on July 26, 2012

Doh. I see I mistyped "its" up above as well. (Or maybe iOS did that for me.)

cschramm · on July 26, 2012

Yes, seems like Savannah can't completely handle HN ;)

akkartik · on July 26, 2012

The error message doesn't suggest this is a load issue. Also, wouldn't the entire app go down rather than a single link?

cschramm · on July 26, 2012

If it's neither a load issue nor some intended limitation to prevent load issues, somebody pulled it on purpose. ...which I actually included in not being able to handle HN. ;)

cschramm · on July 27, 2012

The commit details are back now. :)

morsch · on July 26, 2012

the idea is to use 'find' if you want something fancy

I don't get it, can someone explain? It also appears in the ML thread linked elsewhere, and I don't understand it there either. (I understand the reasons for the change in behaviour, just not this specific line.)

bluetech · on July 26, 2012

find is essentially the command line tool for traversing the file system hierarchy recursively. It's got a million options, e.g. whether to follow symlinks, depth, file/directory/device/.., user, group, change time and many more. When you use grep -r, it only does the most common thing which is to just go over all the files. If you want something more "fancy", you can use find, e.g.:

  $ find . -name '*.c' -type f -exec grep printf '{}' +
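
As a further sketch of the "fancy" case, find can make the symlink policy explicit regardless of which grep version is installed. The tree below is made up for the demonstration; -L is the standard find option for following symlinks.

```shell
# Made-up demo tree: one C file inside the tree, one reachable only via a symlink.
demo=$(mktemp -d)
mkdir "$demo/tree" "$demo/outside"
echo 'printf("hi");' > "$demo/tree/a.c"
echo 'printf("yo");' > "$demo/outside/b.c"
ln -s "$demo/outside" "$demo/tree/link"

# Default: find does not follow symlinks, so only a.c is handed to grep.
nofollow=$(find "$demo/tree" -name '*.c' -type f -exec grep -l printf {} + | wc -l)

# -L: follow every symlink, so link/b.c is handed to grep as well.
follow=$(find -L "$demo/tree" -name '*.c' -type f -exec grep -l printf {} + | wc -l)

echo "without -L: $nofollow file(s), with -L: $follow file(s)"
rm -rf "$demo"
```

Because the traversal is done by find, the result does not depend on how the installed grep interprets -r or -R.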

a3_nm · on July 26, 2012

Relevant thread on the ML:

https://lists.gnu.org/archive/html/bug-grep/2012-03/msg00028...

larrik · on July 26, 2012

The biggest problem I see is: who the heck would expect a .12 release to break compatibility in an ancient and solid piece of software?

sneak · on July 26, 2012

Why is grep being modified at all? It is clearly not broken.

Aglet · on July 26, 2012

To quote a recent Coding Horror post, its bugs are now Common Law Features, and must be supported.

buster · on July 26, 2012

Who thought it might be a good idea to break -r (most likely the most often used option) and use the old behaviour for -R?

Wouldn't it be better to at least let -r behave as it always has and change -R?

Anyone ever used -R here? :)

_delirium · on July 26, 2012

-R is the POSIX-standard flag for recursive grep, so that would be worse to change imo. It's also the flag used for recursive grep by the BSDs (some of which do support '-r', but only as a deprecated historical option... OpenBSD calls it "strongly discouraged").

eschulte · on July 26, 2012

Thanks for these important missing bits of information, in light of these I think the change sounds much more reasonable.

/me updates rgrep alias to use -R
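
Such an override might look like the following sketch. It is written as a shell function rather than an alias so that it also works in non-interactive shells; the demo directory is made up, and the function only affects the shell where it is defined.

```shell
# Hypothetical replacement for an rgrep alias: always pass -R, which keeps
# following symlinks under GNU grep 2.12 and later.
rgrep() {
    grep -R "$@"
}

# Quick check against a made-up directory tree.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
echo needle > "$demo/a/b/file.txt"
found=$(rgrep -l needle "$demo")
echo "$found"
rm -rf "$demo"
```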

ineedtosleep · on July 26, 2012

Currently -r and -R are essentially the same flags, correct? And this new change would make -R and -r two different operations?

0x0 · on July 26, 2012

Yeah, this sounds really misguided. If they change/add functionality, why not move that to a new option? Those who want the new way can use the new feature, and scripts won't act differently depending on which version is installed.

New features should get new options; they should not move old features to new flags and put the new features on the old flags. Nuts.

rmc · on July 26, 2012

Increasing the number of options and switches has downsides, it makes the software more complex to use.

0x0 · on July 26, 2012

Hehe, you know what makes the software more complex to use? Having to grep in the output of "grep --version" before deciding which switch to pass to grep.. :)
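
For scripts that really must cope with both behaviors, probing what grep does is probably more robust than parsing its version string. A minimal sketch, assuming mktemp and ln -s are available; the helper name is hypothetical.

```shell
# Hypothetical helper (not part of grep): reports how this system's
# `grep -r` treats a symlink found during traversal.
#   returns 0 if the symlink is followed (pre-2.12 GNU behavior),
#           1 if it is skipped (GNU grep 2.12 and later),
#           2 if grep failed, e.g. because -r is not supported.
grep_r_follows_symlinks() {
    probe=$(mktemp -d) || return 2
    mkdir "$probe/tree" "$probe/outside"
    echo probe-text > "$probe/outside/f"
    ln -s "$probe/outside" "$probe/tree/link"

    status=0
    grep -rq probe-text "$probe/tree" 2>/dev/null || status=$?

    rm -rf "$probe"
    return "$status"
}

rc=0
grep_r_follows_symlinks || rc=$?
case $rc in
    0) echo "grep -r follows symlinks here" ;;
    1) echo "grep -r skips symlinks here; use -R to follow them" ;;
    *) echo "could not probe grep -r" ;;
esac
```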

michaelhoffman · on July 26, 2012

You don't need to do that, just always use -R :)

nitrogen · on July 26, 2012

We'll get right on inventing that time machine so everyone can go back in time to the introduction of the -r flag and warn themselves ;).

rbanffy · on July 26, 2012

And note that it will make your code more compatible with other, POSIX-compliant, versions of grep.

freehunter · on July 26, 2012

When I read the first post in the chain, I thought "well, it has to be a bug". Then I read the second post... wow, it was intentional. I've used -r and -R both in the same script to do the same thing, just depending on how I was feeling that day. Now I'm afraid to update.

I don't think distros should be afraid to break compatibility with main if main is making a change that makes no sense. Breaking essential and classic *nix functions defeats the purpose of CLI utilities.

pif · on July 26, 2012

> I don't think distros should be afraid to break compatibility with main if main is making a change that makes no sense.

Hi, I don't agree. Different versions of a tool showing different behaviour for the same option is enough for me; the same version of the tool showing different behaviour for the same option _depending on the distribution_ ... I feel that's too much!

freehunter · on July 26, 2012

Well, like the link states this already breaks compatibility with BSD grep. Combine that with breaking compatibility with previous version, breaking previous scripts, and breaking based on distro depending on the speed at which the package maintainers upgrade (if ever)... it's best IMO to just leave it the way it always was.

It's going to be hard enough switching between a machine that only gets critical updates and a machine with the same distro but getting all updates. Grep has been around since 1973, is there any serious Unix scripter who feels it still needs more features?

leif · on July 26, 2012

Yeah, for habitual reasons I usually use -R. No idea why.

lubutu · on July 26, 2012

In POSIX -R is the recursion flag for ls, cp, rm, etc. So if you want to recurse, -R is probably a safer bet than -r. (ls -r just lists in reverse.)
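
The ls case is easy to see on a scratch directory (the names below are made up for the demonstration):

```shell
# Made-up demo tree: two files and one subdirectory containing a file.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a" "$demo/b" "$demo/sub/c"

# -r: same entries as plain ls, in reverse order; sub/c is not listed.
reversed=$(cd "$demo" && ls -r)

# -R: descend into sub/ and list its contents as well.
recursive=$(cd "$demo" && ls -R)

printf '%s\n---\n%s\n' "$reversed" "$recursive"
rm -rf "$demo"
```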

buster · on July 26, 2012

Yeah, that may be it.. I'm wondering why they didn't just add a new option (aka --no-symlinks) or something

a3_nm · on July 26, 2012

Annoying exception: scp only accepts -r as recursion flag, not -R.

wildmXranat · on July 26, 2012

Same here. I've been conditioned to use -R

mturmon · on July 26, 2012

Good point. Also, since you reminded me, chmod, chown, chgrp.

leif · on July 26, 2012

that's probably it

adavies42 · on July 26, 2012

ever work on a non-gnu box? (old solaris, *bsd, stock os x, etc.) there's a decent chance they don't have -r at all.

cschramm · on July 26, 2012

*BSDs have (=> OS X probably too).

Solaris, HP-UX and OpenServer have not.

AIX is weird:

> -r Searches directories recursively. By default, links to directories are followed.

> -R Searches directories recursively. By default, links to directories are not followed.

rbanffy · on July 26, 2012

Isn't the AIX behavior the one POSIX specifies?

cschramm · on July 27, 2012

Looking at MattJ100's link (http://pubs.opengroup.org/onlinepubs/009695399/utilities/gre...), it seems like POSIX grep actually does not have any recursion.

Hence, only lubutu's comment ("-R is the recursion flag for ls, cp, rm, etc") is true, while _delirium ("-R is the POSIX-standard flag for recursive grep") is wrong, which means AIX is free to do whatever it wants with both the -r and the -R flags.

lmm · on July 26, 2012

That would be kind of perverse given how cp behaves for -r and -R

cschramm · on July 26, 2012

I've always used -r before, since it saves a key press. ;)

But I started using -R today, in the middle of a WTF event while inspecting some Python files, digging up a Debian bug report, inspecting the upstream change and posting this to HN.

I still think it's not the best idea for Debian (or any other system) to break compatibility with upstream, since this will lead to different behaviors on different systems, not only depending on version and vendor (GNU or *BSD). But of course, it could also lead to the change being reverted, which would be welcome (guess nobody relies on the new behavior yet).

The POSIX argument is definitely valid, but nevertheless GNU's grep has always supported -r and BSD versions do as well (although OpenBSD's man page reads "This implementation supports those options; however, their use is strongly discouraged."), so it just unnecessarily breaks existing stuff.

jasomill · on July 26, 2012

I've always used -R, presumably because it's consistent with other commands like ls, rm, and cp (and consequently get annoyed by tools like scp that only support -r, and zip, which uses -R for something else entirely); I wasn't even aware that -r was supported until I saw this.

emmelaich · on July 27, 2012

I've only used -R, because of its posixness. Finger memory from adminning Solaris.

Not to say that this is a good or bad change.

-R matches the -R in chown/chgrp

They should really introduce -r for those two to NOT follow symlinks. Following symlinks for those is bad.

cschramm · on July 26, 2012

Another thought: Since rgrep is an alias for grep -r and grep -r has changed, the complete rgrep tool has changed and this can _not_ be fixed by using the "right" switch. Hence rgrep is useless for scripting now.

pooriaazimi · on July 26, 2012

This is a serious question: Why would you want to use plain old "grep" instead of "ack"[1]? Of course, other than the fact that it's on all machines. Why would you use it instead of "ack" on your own machines? That's an honest question, I'm not starting a flamewar... The highlighting and filename/line# by default is the killer feature for me.

Edit: Come on. Downvotes for this? Honestly... It's HN, not StackOverflow. You don't mark questions as "off-topic" unless they're trolling...

[1]: http://betterthangrep.com

MattJ100 · on July 26, 2012

The main reason to use grep is portability/availability. While ack is a modern improvement on grep (yes, I'm an ack user and fan too), you can't simply depend on it in portable scripts and such.

grep is defined for POSIX after all: http://pubs.opengroup.org/onlinepubs/009695399/utilities/gre...

I also suspect (without looking) that there are obscure (but occasionally useful) grep features that ack doesn't support.

greyboy · on July 26, 2012

That's a fine recommendation for machines that you solely work on, or have complete control over. However, it's good to be able to work with the standard toolset if you frequently work on a variety of remote machines (where it's quite common that you cannot install such things, due to permissions or policies, and want to get started working before attempting to download/install a bunch of custom binaries).

Edit: MattJ100 made a more clear point while I was responding.

pooriaazimi · on July 26, 2012

Thanks for the response. However, a nice thing about ack is that it's not a binary (necessarily) - it's a perl script and can be used without sysadmin permissions: http://betterthangrep.com/install/

The other points are quite understandable and correct though. It's always best to at least be familiar with standard tools, even if you want to use a slightly different version for yourself.

greyboy · on July 26, 2012

I think we're probably mostly in agreement, even if I wasn't clear about it. In my line of work, there have been times where I was not permitted to use external scripts or binaries (highly sensitive environments). However, that one small example doesn't mean ack should be disregarded!

a3_nm · on July 26, 2012

ack-grep is a good tool to grep through files. grep is more than that. When I need to grep in a pipeline, I probably don't want ack-grep's bells and whistles.

One example of a useful feature of grep: "grep --line-buffered" to grep with no buffering in a pipeline.
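
A sketch of that pipeline use, with a finite stand-in for a long-running producer such as tail -f; the log lines and the produce function are made up for the demonstration.

```shell
# A finite stand-in for a long-running producer such as `tail -f`.
produce() {
    printf 'INFO start\n'
    printf 'ERROR disk full\n'
    printf 'INFO done\n'
}

# With --line-buffered, grep flushes each matching line as soon as it is
# found instead of waiting for its output buffer to fill, so the next
# stage of the pipeline sees matches promptly on a slow stream.
errors=$(produce | grep --line-buffered '^ERROR' | cut -d' ' -f2-)
echo "$errors"
```

With a producer that never exits, the difference shows up as latency: the filtered lines reach the next stage one at a time rather than in large batches.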

benatkin · on July 26, 2012

It's called ack. Ubuntu gave its package and executable a different name because it conflicted with something far less notable (poor choice IMO). I suggest adding an alias or a symbolic link so you can call it by its proper name when typing in commands.

haakon · on July 27, 2012

Ubuntu didn't; debian did. Ubuntu just inherits that. Perhaps eventually they will fix it, which they did for git (which was once git-core because of a conflict with a less notable package).

audiodude · on July 26, 2012

I love ack, I've never looked back.

ghshephard · on July 26, 2012

I've been a systems administrator of one form or another for 19 years, spend 2-3 hours a day on the CLI, and I've never heard of ack. Depending on the time of day (and platform) I might use grep or egrep. I hopped onto one of the random ubuntu boxes I own, did a quick "apt-get install ack; man ack" - here is what it said:

"ACK is a highly versatile Kanji code converter. ACK can do reciprocal conversion among Japanese EUC"

This is probably why I don't use ack - never heard of it, not available on any system I use, and the dpkg repository has something that has nothing to do with grep.

Enough of an answer?

ksherlock · on July 26, 2012

debian installs it as "ack-grep".

eliasmacpherson · on July 26, 2012

ack has ignore built in for certain filetypes. Then combined with my shell expanding * to just the files and directories at the current layer, ack will not find certain things that grep -R string * will.

pbhjpbhj · on July 26, 2012

>Why would you want to use plain old "grep" instead of "ack"[1]? //

First time I've heard of it. Thanks. Only been grepping my way around for the last dozen years or so ...

jasomill · on July 26, 2012

Because it motivates me to avoid revision control systems that insist on crudding up every directory in my source tree?

csense · on July 27, 2012

> Why would you want to use plain old "grep" instead of "ack"?

1. Because all the cool kids are doing it.

2. Because I hate Perl.

koenigdavidmj · on July 27, 2012

Why is your Perl hate relevant? It can be written in COBOL for all I care; I'm just using a tool to find files. I don't spend my days browsing through the source code of random tools.

rwos · on July 26, 2012

It's funny how one of the relatively few design errors in Unix shells now indirectly comes back to haunt us. Recursing is built into pretty much every command that can handle multiple files - which very strongly suggests it should have been made a feature of glob (or the shell).

I think it's a bit sad that those "big-picture" features in unix are treated as if they were written in stone.

nvarsj · on July 27, 2012

This is one of the great reasons to use zshell. :-)

  grep -in "sometext" **/*txt

JoachimSchipper · on July 26, 2012

Remember: GNU is not UNIX.

m0skit0 · on July 26, 2012

Backwards compatibility is an evil illness that sometimes must be broken. It's for the good of evolution. I praise engineers that make such decisions, even if they are unpopular.

TillE · on July 26, 2012

When there's a clear benefit, that's great. Feel free to scrap backwards compatibility when there's significant progress to be made in doing so.

This feels a lot more like a "color of the bike shed" choice. There are use cases where the new behavior makes sense, sure, but there are also plenty of cases where the old way is better. This isn't an upgrade, it's a lateral move.

tinco · on July 26, 2012

So we have to continue with an ugly bikeshed for the rest of eternity? I think it's very important that whenever it is decided that objectively one way of doing it is better than another way that eventually that way finds itself to be the way it is.

You basically imply that this change is not big enough to warrant a break with backwards compatibility, but small discontinuities and hacks add up. If things like this aren't fixed every once in a while the system will be ridden with inconsistencies.

Besides, even though it hurts when you've inherited some crazy unreadable code that utilizes some obsoleted functionality I think it is always positive for the code quality when the chaos monkey comes around and breaks something.

edit: the downvote button is to indicate I am detrimental to the discussion, the reply button is for when you disagree with me :)

sdoering · on July 26, 2012

>> edit: the downvote button is to indicate I am detrimental to the discussion, the reply button is for when you disagree with me :)

seconded

What I do not understand is why (so it feels) recently, a lot of disagreement is done via the downvote, if someone actually just rationally states his/her opinion.

So back to topic...

warmfuzzykitten · on July 26, 2012

I disagree. This is not a "fix"; it's an incompatible change to a widely used option for no better reason than a programmer thought the different behavior would be a useful addition to grep.

And yes, we have to continue with an ugly bikeshed for the rest of eternity, because a) beauty is highly subjective, and b) the color of the bikeshed is less important than breaking existing software on billions of computers worldwide.

uxp · on July 26, 2012

If this actually fixes a legitimate problem (which is what?), I don't think anyone thinks that it shouldn't be fixed, just not at a x.12 release.

This change breaks rgrep, thus it should be held until a full version (3.0).

edit: my FreeBSD's `grep` manpage:

    -R, -r, --recursive
           Read all files under each directory, recursively; this is
           equivalent to the -d recurse option.

tinco · on July 26, 2012

I think many people think it should not be fixed, even though it has a use case which is mentioned elsewhere in the comments.

I absolutely agree that any interface changes like this should be in clearly defined milestone releases, possibly allowing patches to be backported to legacy versions.

koenigdavidmj · on July 26, 2012

FreeBSD grep is just GNU grep, it appears, up to but not including 9.0. Note the long --recursive option in the manual page snippet that you posted; no properly God-fearing BSD program would support long options.

jasomill · on July 26, 2012

The new non-GNU grep in FreeBSD 9 was designed to be compatible with GNU grep (pedantically, with GNU grep configured with --disable-perl-regexp as it has been in previous FreeBSD releases) because lots of ports depend on GNU grep behavior. The fear of God is not a sufficient reason to break ports.

cturner · on July 26, 2012

You'll feel differently when you inherit responsibility for some god-awful mess of shell and perl that nobody properly understands that used to Just Work but doesn't after an innocent upgrade.

droithomme · on July 26, 2012

Grep is used in hundreds of millions of scripts ranging from mundane rarely used things to install scripts to critical system functionality run every few minutes.

It's not acceptable to break the default way it works under any circumstances.

jasomill · on July 26, 2012

I agree in principle, though to be fair most UNIX utilities have so many incompatible variants that striving for maximum backwards compatibility often does more harm than good (like producing options that do entirely different things in the presence of other options, in the absence of a leading hyphen, etc.). Practically speaking, sysadmins generally use more consistent tools (e.g., Perl) for nontrivial cross-OS things once this becomes an issue, and, as a developer, it's not clear to me that build-time dependence on, say, Perl or Python is any worse than depending on GNU versions of basic UNIX tools, the existence of which tends to only be a safe assumption on Linux. On non-UNIX platforms, this is an even bigger issue: I'd certainly rather recommend ActivePerl or the latest binary Python 2.7.x release from python.org to the average Windows developer than any of SFU, Cygwin, or MSYS (and I say this as someone with a strong UNIX background who works with both Windows and Windows developers on a daily basis).

guard-of-terra · on July 26, 2012

grep doesn't need any more evolution. grep is the final result of the evolution.

snorkel · on July 26, 2012

I also decided that sed no longer supports regexes because I said so. And ls -l will now show a list of print jobs instead of files because maybe that's what you meant. ... oh and ping no longer supports IPv4 because I want everyone to adopt IPv6 immediately because I said so.