Why GNU grep is fast (2010)

jcbeard · on Aug 24, 2016

If you use grep in OS X, and you use it a lot (even called through other programs/libraries) it pays in time to install GNU grep vs. the default OS X provided one. It's several times faster. Ran into this when benchmarking grep + GNU Parallel on OS X vs. Linux in my thesis (raftlib modeling) work. Results on my small MacPro cluster didn't match those on Linux. The OS X runs were orders of magnitude slower despite almost identical hardware configurations. The culprit as it turned out was largely the implementation of grep. I compiled with the GNU version, re-ran and the rest of the mis-match was simply due to the lack of POSIX features like thread pinning, and NUMA control.

<stepping up to soap box/> So yeah...as usual, algorithm matters. Implementation matters, and well....knowledge of the architecture/data-movement matter. These days the architecture/data-movement knowledge matters almost as much or more than the algorithm. Picking an algorithmically faster implementation that moves data more (e.g., tree traversals usually have poor cache behavior, more bursts required, etc.) is often the case b/c programmers don't really understand how the computer works, but are great at coding O(n logn) algorithms. We shouldn't stop teaching algorithms, but data-movement/access likely needs to be emphasized along with the understanding of algorithms.

jasonpeacock · on Aug 24, 2016

If you use grep a lot, you should be using 'ag' (Silver Searcher): http://geoff.greer.fm/ag/

happyslobro · on Aug 24, 2016

Or ack (http://beyondgrep.com)

loeg · on Aug 24, 2016

ag was initially written as a faster ack. Maybe ack has caught up since, though.

takeda · on Aug 24, 2016

I doubt, ack is just a perl script while ag is written in C.

kazinator · on Aug 24, 2016

If you use grep a lot over a code base, you should index it with "mkid" and use "lid" (gnu id-utils). Quasi-instant results.

mastre_ · on Aug 25, 2016

Absolutely, more or less grep -> ack -> ag. Be warned tho, sometimes ag (and IIRC ack as well, but haven't used it in a while since switching to ag) won't find some results due to its default filtering rules even tho certain results should be included. For critical things, I end up double-checking results w/ grep -r which always finds everything, no questions asked.

amelius · on Aug 24, 2016

This makes me wonder: why would Apple implement its own version of grep, and not simply ship GNU's version?

jcbeard · on Aug 24, 2016

See article (http://meta.ath0.com/2012/02/05/apples-great-gpl-purge/)from previous HN post: https://news.ycombinator.com/item?id=3559990

Also, GNU's bulletin on Apple Boycott: https://www.gnu.org/bulletins/bull18.html#SEC13

And also this article which explains why apple is BSD mostly: https://scalibq.wordpress.com/2012/08/02/apples-os-x-is-not-...

ccallebs · on Aug 24, 2016

Your first link is redirecting to what I assume is a paid blog post. It's advertising and providing extensive backlinks for a computer desk.

TAForObvReasons · on Aug 24, 2016

http://meta.ath0.com/2012/02/05/apples-great-gpl-purge/ is the correct link, parent needs to put a space before the close paren

raw text: http://hastebin.com/raw/zohoxaxuyi

jcbeard · on Aug 24, 2016

Awesome, thanks for posting/fixing!

pdpi · on Aug 24, 2016

Because Apple avoids GPLv3 like the plague, basically. That's why the included bash is so old too, AFAIK

colejohnson66 · on Aug 24, 2016

Which in this case is ridiculous. There are valid reasons for businesses to avoid GPL, but if they use GNU grep, the only thing they have to open source is their copy of GNU grep. It's not like doing so will force them to make their kernel GPL

dragonwriter · on Aug 24, 2016

> There are valid reasons for businesses to avoid GPL, but if they use GNU grep, the only thing they have to open source is their copy of GNU grep.

If they use GNU grep as part of a distribution that might be included with hardware, it means the GPLv3 anti-tivoization clauses affect any hardware that that distribution is on. Apple being first and foremost a consumer hardware manufacturer whose software exists to add value to that hardware, and wanting to maintain its ability to do what it wants with that hardware including not being constrained by the GPLv3's anti-tivoization clause (even if they aren't doing anything now that would fall afoul of it, they might want to in the future, and don't want to tie their hands in advance), has a very good reason not to touch GPLv3 at all.

(There's other GPLv3 provisions that might be problematic, too, GPLv3 involves more than just a requirement to provide source also licensed under the GPLv3.)

belorn · on Aug 24, 2016

Companies that want to remain in ownership control of property after they have sold it can't use GPLv3 licensed software, that is true. They could lease out the hardware to achieve the same effect, but for some reason some companies prefer to disguise the lease as a sale.

dragonwriter · on Aug 24, 2016

> They could lease out the hardware to achieve the same effect

No, they could not. The GPLv3 requires modification information be provided to the "user" by whom a consumer product is "received", not only the "owner" by whom it is "purchased", so lease vs. sell has no effect on the obligations (at least no effect clear from the text of the license.)

belorn · on Aug 25, 2016

Reading the sources to that, it seems the issue is extremly unclear. The big question is if you need additional copyright permission to lease out hardware that include software.

For example. A car rental service buys a modern car that has software in it. Does they need additional copyright permission on top of the purchase in order to lease the car out?

A taxi driver buys a modern car with the intention to rent out his service with the car. Is additional copyright permission required?

A school provides students with access to the school computers. Is that conveying a copy of software? What if its "open door day", where the public is invited?

A public library want to have public available computers. Are they conveying copies of software through that?

If you looked at WIPO, it explicitly say no. If you looked at Information Society Directive 2001/29/EC, they explicitly say yes, under the condition that the offer is directed to the public. If you looked at the court cases that has interpreted those texts, then they are inconclusive.

dragonwriter · on Aug 25, 2016

> Reading the sources to that, it seems the issue is extremly unclear. The big question is if you need additional copyright permission to lease out hardware that include software.

If its intended as a "Consumer Product" as defined in the license, you need copyright permission to put the copy of the software in the product; at that point you either except the condition of providing the required information and functionality to the user to whom you will eventually provide the product -- whether by rental or sale -- or you violate the GPLv3 by putting the copy in the device.

Once you have accepted that condition, you are bound by it.

> For example. A car rental service buys a modern car that has software in it. Does they need additional copyright permission on top of the purchase in order to lease the car out?

They don't need "additional copyright permission", but if the initial copyright license accepted when acquiring the car has restrictions that apply to the use of the software in rentals, then they are bound by those restrictions.

detaro · on Aug 24, 2016

I'd assume their policy is because of the Tivoization clauses, which (as far as I understand them) would become an issue if they lock the OS down more, or want to share the environment with e.g. iOS devices.

CJefferson · on Aug 24, 2016

Apple is moving towards forbidding write access to system directories, for example:

    sudo mv /bin/echo /bin/e
    mv: rename echo to e: Operation not permitted

I'm not sure if this is getting close to falling foul of the GPL v3 -- if I can't replace these files, is that a problem?

spullara · on Aug 24, 2016

That is System Integrity Protection implemented in the latest operating system: https://support.apple.com/en-us/HT204899

It is possible to disable it and even if you couldn't I don't think it has any impact on licensing.

dragonwriter · on Aug 24, 2016

If you couldn't, it would conflict with the "anti-tivoization" rules applicable to "User Products" in the GPLv3, which requires that any GPLv3 code incorporated into a "User Product" is replaceable with modified versions, and that the manufacturer of the product provide all information necessary to the user to make that replacement.

spullara · on Aug 24, 2016

Hmm. Isn't it replaceable by changing the PATH and installing them in non-System Integrity areas of the operating system like /usr/local/bin?

colejohnson66 · on Aug 24, 2016

Is there anything stopping you from booting into Linux to rename it, is there? Like how I can't delete files owned by SYSTEM on Windows, but can on Linux?

koenigdavidmj · on Aug 24, 2016

You can even unset this within macOS; it just requires a reboot:

1. Hold Cmd-R as you reboot. That puts you in recovery mode.

2. Bring up a terminal.

3. Run 'csrutil disable'

4. Reboot back into regular mode

cmurf · on Aug 24, 2016

Yes. Linux has no support for Core Storage, which is now used in default macOS layouts. So any hfsplus volume on Core Storage is rendered effectively not readable on an OS other than macOS, even if it's not encrypted.

pohl · on Aug 24, 2016

Rather than replacing it, why not write it under /usr/local and edit your PATH?

CJefferson · on Aug 24, 2016

Oh, I certainly could (and do for bash), but I believe the GPL v3 requires that you can replace any copies of programs with your own, which possibly Apple might worry they fall foul of, which might be why they don't want GPL v3 software (I'm guessing).

jlarocco · on Aug 24, 2016

Actually, a strict "no GPLv3" policy makes a lot of sense because it side steps the whole issue and means they don't have to fork over a ton of money for legal reviews of something 95% of OSX users won't care about.

It's easy enough for people who want GNU grep to install it themselves.

mikekchar · on Aug 24, 2016

grep was already there from BSD, so no reason to add something new. However, speaking as a guy who started out on Sun BSD machines, the very first thing we did when we got a new box was to replace all the BSD utilities with GNU ones. Virtually all of them are better.

microtonal · on Aug 24, 2016

grep was already there from BSD

FYI: OS X's grep is FreeBSD's bsdgrep. However, FreeBSD's default grep is still an ancient version of GNU grep (bsdgrep is only available as bsdgrep.).

Also, older versions of OS X used to have GNU grep [1]. So the more likely explanation is Apple's purge of GPL code from OS X. Since later grep versions use GPLv3, GNU grep was a dead end for them.

Unfortunately, FreeBSD's bsdgrep has some very annoying bugs (that are also present in OS X). No one seems to care about them, even when a patch is provided [2].

[1] http://opensource.apple.com/release/mac-os-x-1075/ [2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201650

emaste · on Aug 24, 2016

The patch in [2] was added a bit over a month ago as a comment in a bug report about the GNU grep in the base system, so I don't think it's a good example of no one caring about bsdgrep bugs.

On occasion I search for bsdgrep bugs, and did not find the comment / patch in [2]. Thank you for pointing it out, I'll look at it when I can.

microtonal · on Aug 24, 2016

The patch in [2] was added a bit over a month ago as a comment in a bug report about the GNU grep in the base system, so I don't think it's a good example of no one caring about bsdgrep bugs.

I FreeBSD developer requested that I added a report to that bug [1]. After adding the patch to the report, I also reported this to the list [2]. I have also tried to contact the maintainer of bsdgrep with no luck. At some point you just give up ;).

[1] https://lists.freebsd.org/pipermail/freebsd-questions/2016-J... [2] https://lists.freebsd.org/pipermail/freebsd-questions/2016-J...

emaste · on Aug 24, 2016

I'm sorry this issue fell through the cracks. Feel free to add me (emaste) to the CC on any bsdgrep PRs you raise or encounter.

microtonal · on Aug 24, 2016

Thanks! I might triage some of the other bsdgrep issues and will take up on your offer if I do so. Sorry for the bluntness in my first comment!

tedunangst · on Aug 24, 2016

Great. Now I've got more work too. :)

jlarocco · on Aug 24, 2016

Apple uses BSD grep for basically the same reason FreeBSD uses BSD grep: GNU grep is GPL, and new versions are GPL v3.

loeg · on Aug 24, 2016

They ship an old version of BSD grep, presumably. Not their own implementation.

archagon · on Aug 24, 2016

My knowledge of architecture/data-movement is woefully inadequate. Where could someone go to brush up on this stuff?

beachstartup · on Aug 24, 2016

i've seen this getting worse with your typical developer in the age of "push the button to get another virtual machine and someone else will pay the bill."

rusanu · on Aug 24, 2016

Previous discussions:

- https://news.ycombinator.com/item?id=1626305 (115 comments, 2193 days ago)

- https://news.ycombinator.com/item?id=2393587 (68 comments, 1972 days ago)

- https://news.ycombinator.com/item?id=2860759 (43 comments, 1842 days ago)

I'm sure there are more.

throwawayReply · on Aug 24, 2016

1842 days is over 5 years. At some point you have to consider the set of users here now is different to (if not distinct from) the set of users here 5 years ago.

dalbin · on Aug 24, 2016

It's useful for the new users too because previous discussions has a lot of comments to read.

By the way, it would be great if there would be a "Related" widget with the number of comments somewhere in the comment page.

curiousgal · on Aug 24, 2016

Clicking on the domain name is close to that.

goldenkey · on Aug 24, 2016

As they say, close but no cigar. I come to the comments and find previous discussion because of a helpful user. Not because I knew to look for it.

TobbenTM · on Aug 24, 2016

I don't think he meant anything by the comment, other than simplifying access to more (older) comments. ;)

rusanu · on Aug 24, 2016

Correct. I've read the original post years ago, I do find it it fascinating, and I would like every new generation to read it. I don't have a quarrel with it making HN front page every now and then.

ffggvv · on Aug 24, 2016

Probably this is why he put here those links, to redirect the new set of users to discussions from the old set of users.

chukye · on Aug 24, 2016

I think this was already quoted in somewhere, but, its the best advice;

"The key to making programs fast is to make them do practically nothing."

blowski · on Aug 24, 2016

That's what separates good and bad teams (or individual developers): the wisdom of how to make a program that does the bare minimum while actually solving the problem.

nxzero · on Aug 24, 2016

"Make things as simple as possible, but not simpler."

setra · on Aug 24, 2016

He is saying make the program spend less time computing. Thats what he means by not looking at every byte. The way he does that is by using more complicated algorithms. It's not simple.

nxzero · on Aug 25, 2016

Quote is often credited to Einstein; being "simple" does not mean something is not complex, it means that there's no unnecessary complexity; being "too simple" means necessary complexity was remove.

Basically means only do exactly what's required, nothing more, nothing less.

mozumder · on Aug 24, 2016

Also useful: "If you want to make a fast program, use a slow computer."

CaptSpify · on Aug 24, 2016

I used to work at a place that had an intentionally slow dsl line, and a ~5 year old, beat up pc to test their code on. If it worked there, it would work for their customers.

I wish more places would try this. Sure, your app works in downtown SF, but if your customer is in Nowheresville, AL, you should probably spend more time optimizing.

peruvian · on Aug 24, 2016

I wish more web developers would test their sites on 3G connections with Australian latency :-)

sliverstorm · on Aug 24, 2016

The difference between 3G & 4G both in speed and latency is profound, and nobody seems to design with 3G in mind, so I find from a data standpoint 3G is practically worse than nothing these days.

All of your apps will say "I have data access!" and send a million requests for tens of megabytes, which will never finish, melting your battery as it waits on timeouts and completely choking the access you do have preventing you from doing anything useful.

drakonandor · on Aug 24, 2016

Why encourage technology lag and bad ISP behavior?

takeda · on Aug 24, 2016

Because programmers are lazy.

Why making program faster when it works fine on their top of the line desktop machines?

CaptSpify · on Aug 24, 2016

Because that's the lazy answer

laumars · on Aug 24, 2016

Not really necessary in most cases. There's enough profiling tools available that can give response times in fractions of a second and in many cases you can add repetition to inflate the few processes that are still too small to measure. These will give you far more meaningful figures about the performance of your code than anecdotal evidence using a slow PC as a benchmark.

dr_zoidberg · on Aug 24, 2016

I agree with both of you. Once I optimized a major Python project to avoid some bad disk access patterns. Nobody noticed them because everyone was using SSDs and those patterns didn't hurt (too much) performance on those, but I was running on an HDD and I had run times of hours (vs a couple of minutes for everyone else).

But my optimizations were guided by the profiler and a (more or less) correct analysis of the results.

adrianN · on Aug 24, 2016

Developers care more about performance if it's slow on their computers too.

kochthesecond · on Aug 24, 2016

Haha, this is true. Yet i still need to have as fast build and feedback as possible when I'm doing work.

StillBored · on Aug 24, 2016

Which is one of a hundred reasons why the machine you write the code on, the machine you build it on, and the machine you test it on, should be three different machines (actually probably more than that because your test environment should be more varied than a single OS/environment).

laumars · on Aug 25, 2016

If we really want to adhere to best working practices, then it should be a separate team performing the tests based on the same specifications that the developers used to write the code (ie the testing team wouldn't perform tests based on the developers specifications but rather the project owners specs to mitigate the risk of the developers misunderstanding the specs).

However that requires a level of process that is either too costly, or too rigid for many IT companies to follow.

reacweb · on Aug 24, 2016

Yes and we underestimate how often this kind of rule can apply.

To have fast network, reduces network utilisation (for example using cache). To have fast database, reduce number of queries (for example using more complex queries). To have fast memory allocation, avoid dynamic memory or preallocate...

zamalek · on Aug 24, 2016

This also works if you are aiming for stable software - I can't find the quote or who to attribute it to:

> The only bug-free line of code is the line of code that is never written.

kleim · on Aug 24, 2016

Somehow this reminds me of this quote from Antoine de Saint-Exupéry: "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."

Waterluvian · on Aug 24, 2016

This is why I find myself writing fewer bugs when I'm writing Python. I write less and have less to mess up.

galdosdi · on Aug 24, 2016

This is actually why I find myself writing fewer bugs when I'm writing Java than Python. Thanks to static typing, I don't need to endlessly repeat myself writing and updating boilerplate unit tests and boilerplate documentation (and inevitably messing up)

I've heard about PEP484 and hope it catches on, but unfortunately it's for python3 and a lot of people are still stuck on python2 for many reasons.

tombert · on Aug 24, 2016

cough That's kind of why I feel like Haskell is a superior language to most others.

zamalek · on Aug 24, 2016

Too true. I read an account of some product being ported to F# (from C#) a few years back. The line count was substantially decreased (at least one order of magnitude if I'm remembering correctly) with a correlated decrease in defects. I can't find the original post, but there are plenty of supporting testimonials[1].

[1]: http://fsharp.org/testimonials/

tombert · on Aug 24, 2016

Yeah, as someone who gets to use F# fairly regularly, actually, it's a substantial improvement in the LOC department compared to Java or C#. I don't find it quite as reliable as something like Haskell (do to the fact that you don't have to be explicit about side-effects), but it's definitely better than most languages.

ComputerGuru · on Aug 24, 2016

I thought for a second this was one of my all-time favorites, which can also be summarized as "how is gnu grep so fast?" [0].

0: http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...

tghw · on Aug 24, 2016

Came here to post exactly that article.

goldenkey · on Aug 24, 2016

That was great, thanks!

I am not so sure the 70/8 tradeoff is worth it but it should probably be tested with code samples as well as english corpuses. Is grep primarily made for ascii?

lazyjones · on Aug 24, 2016

8 previous posts/discussions: https://hn.algolia.com/?query=Why%20GNU%20grep%20is%20fast&s...

lubomir · on Aug 24, 2016

For those interested in the Boyer-Moore algorithm, this is an commented demonstration: http://www.cs.utexas.edu/users/moore/best-ideas/string-searc...

Compare to Knuth-Pratt-Morris: http://www.cs.utexas.edu/users/moore/best-ideas/string-searc...

hossbeast · on Aug 24, 2016

For fast grep, consider ag

https://github.com/ggreer/the_silver_searcher

kermatt · on Aug 24, 2016

Faster than Ack, not always faster than grep.

shandor · on Aug 24, 2016

Really? My completely anecdotal experience is that ag kills grep. Stuff like searching usages of variables in Linux kernel seems to take ages with grep, whereas ag search on the whole repository doesn't take annoyingly long.

I'll need to test this, just to see how wrong my "feeling" on the matter may be.

ggreer · on Aug 24, 2016

I'm the author of ag. On a single file, ag is slower than GNU grep. The main reason is that ag prints line numbers when showing matches. Printing line numbers means going through the whole file and searching for \n's. For large files, this can be very resource-intensive. Here are the numbers (median of 5 runs, caches hot) for an 8GB file:

    % time ag hello4294967296 big_file.txt
    134217728:hello4294967296
    ag hello4294967296 big_file.txt  7.83s user 0.44s system 99% cpu 8.275 total

    % time grep hello4294967296 big_file.txt
    hello4294967296
    grep hello4294967296 big_file.txt  2.17s user 0.91s system 99% cpu 3.085 total

If you enable line numbers with grep -n, grep still wins, probably because ag's printing code is ridiculously inefficient:

    % time grep -n hello4294967296 big_file.txt
    134217728:hello4294967296
    grep -n hello4294967296 big_file.txt  3.37s user 0.97s system 99% cpu 4.338 total

Unfortunately, I didn't write ag's printing code very sanely. It grew organically from users asking for small features here and there, and now it's a giant hairball. I've added some tests to reduce the chance of breaking changes, but it'll probably be a while before I fix this performance issue.

Despite these limitations, ag tends to be much faster than grep -r in real-world use. There are several reasons why:

1. Ag has a multi-threaded architecture that can search multiple files at once. Grep is single-threaded.

2. Ag ignores binary and hidden files by default. Grep doesn't.

3. Ag obeys .gitignore/agignore/hgignore. Grep doesn't.

These things really add up. For example, when searching my 20GB ~/code directory:

    ggreer@lithium:~/code% time ag instantiationname
    ...
    ag instantiationname  3.82s user 2.16s system 157% cpu 3.797 total

    ggreer@lithium:~/code% time grep -n -i -r instantiationname
    ...
    grep -n -i -r instantiationname  42.77s user 2.62s system 99% cpu 45.403 total

Both find the same matches (some tests in the Node.js source), but ag does it 12x faster.

shandor · on Aug 24, 2016

Thanks for the really detailed reply.

I guess that explains my experience really well, since I'm always searching from a big amount of small files like huge codebases and config files, and almost never from files bigger than several megabytes.

ag has tremendously improved my quality of life in that regard, so please accept my sincere Thank You for making ag :)

xorcist · on Aug 24, 2016

> Ag has a multi-threaded architecture that can search multiple files

Interesting. I would have guessed you were i/o bound there, and file systems usually give higher throughput on sequential reads. Tried it on my bog standard Linux box but couldn't get it close to grep performance wise. Can you share how you did it?

> Ag ignores binary and hidden files by default

Like grep -I? That could easily cause unexpected results. Many files are treated as binary just because they contain parts of a non native character set (such as latin-1 files on a utf-8 system).

ggreer · on Aug 24, 2016

I've documented most of ag's tricks in various blog posts.[1] You're right that sequential reads are faster than random reads, but searching a bunch of source code is going to be a bunch of short reads no matter what. Each file is on the order of 50kB, or about 0.5ms if read at 100MB/sec. That's far less than the seek time of a hard drive. I've experimented a lot, and found it best to read ASAP, then let the OS & device figure out the best order to service the requests.

Also, modern SSDs (and even HDDs) have tricks like NCQ. With caches cleared (cold reads), multi-threaded ag is 2x faster on SSDs. Disabling NCQ makes multi-threaded perform the same as single-threaded.[2] (Again, with caches cleared.)

> … That could easily cause unexpected results. …

I've received very few complaints about ag's binary detection. It does a nice job of balancing false positives, false negatives, and performance. See is_binary() in src/utils.c for the exact code[3].

1. http://geoff.greer.fm/ag#related-posts

2. http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-...

3. https://github.com/ggreer/the_silver_searcher/blob/7d52078cb...

burntsushi · on Aug 24, 2016

> The main reason is that ag prints line numbers when showing matches.

Wouldn't it be more accurate to say that ag is slower than grep because it does line-by-line searching where as grep does not? What would happen if ag adopted grep's method of searching where it doesn't do line-by-line? Is that possible with PCRE?

I think it would be very interesting to combine the tricks used by grep with the tricks used by ag. I don't think it has been done yet!

ggreer · on Aug 24, 2016

Ag doesn't search line-by-line. It's only if a match is found that it goes back and looks for newlines. This optimizes the typical case (most files have no matches) at the expense of the rare case (file contains a match).

Also, ag only uses PCRE if the query looks like a regex. Otherwise, it uses Boyer-Moore strstr.

burntsushi · on Aug 25, 2016

> Ag doesn't search line-by-line.

Really? Am I misunderstanding this code that very much appears to search line by line? https://github.com/ggreer/the_silver_searcher/blob/master/sr...

GNU grep never does "throw line into buffer and then search it." GNU grep just searches the entire buffer at once.

ggreer · on Aug 25, 2016

You didn't check to see what value opts.multiline has. It's initialized to TRUE in src/options.c. It's only set to FALSE if a user passes the undocumented --nomultiline option to ag. Default behavior with a regex query is https://github.com/ggreer/the_silver_searcher/blob/master/sr...

burntsushi · on Aug 25, 2016

Ah excellent, thank you for fixing my understanding. :-)

kermatt · on Aug 25, 2016

IIRC, Ag excels at code search, where grep can outpace it when searching things like data files.

"The Silver Searcher is a tool for searching code." (http://geoff.greer.fm/ag/)

userbinator · on Aug 24, 2016

I wonder if the simpler Boyer-Moore-Horspool might be even faster, since in practice BMH beats regular BM except in the pathological cases:

http://old.blog.phusion.nl/2010/12/06/efficient-substring-se...

Also, unrolling loops may actually slow things down in modern x86 CPUs since they will automatically decode and cache very small loops. Some interesting discussion on that here:

https://groups.google.com/d/topic/mechanical-sympathy/UFscif...

ComputerGuru · on Aug 24, 2016

Nope, or at least, not necessarily. Have fun reading this: http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...

jcranmer · on Aug 24, 2016

2006 is a bit old when talking about newer Intel CPUs.

goldenkey · on Aug 24, 2016

Processor competition in the desktop market has been lackluster since that time. Intel has been sitting on their laurels mostly. The vectored primitives and bigger cache and core counts are nice but honestly you have been fooled by the branding and continual product lines.

Loop unrolling still helps even with brand spankin new CPUs. Some intent cannot be inferred.

jcranmer · on Aug 24, 2016

> Processor competition in the desktop market has been lackluster since that time. Intel has been sitting on their laurels mostly. The vectored primitives and bigger cache and core counts are nice but honestly you have been fooled by the branding and continual product lines.

I might be inclined to agree with you... if I didn't know that Haswell:

* implemented the computed goto transformation (for interpreter loops) in hardware

* made unaligned cache hits crossing cache line boundaries have no latency penalty

Both of those things are things that I would have told you were impossible(/infeasible) to do in hardware before finding out that Intel did them.

There's actually quite a lot of amazing microarchitectural changes going on in Intel processors, you just don't hear a lot about them.

primis · on Aug 24, 2016

The key to making programs fast is to make them do practically nothing.

This is the Ultimate Truth of optimizing computer programs, and it seems so few people understand it.

"Why can't you make Python faster!?" Because Python does a lot of stuff without you asking.

0xmohit · on Aug 24, 2016

> "Why can't you make Python faster!?"

You can. Use a Python wrapper to invoke a program written in another language. Python becomes faster.

danso · on Aug 24, 2016

OT, but along the same lines of how GNU has brought considerable performance improvements to what (in my naive layperson's mentality) seem like straightforward tools:

Does anyone have an insight as to why GNU tail is, in at least one situation, considerably faster than FreeBSD's tail (at least the one that comes with OS X)?

I'm referring to the use of: `tail -n +2` to skip the first line, versus using `sed '1d'`. To me, it makes intuitive sense how the latter could be faster, and on FreeBSD, it is, by at least one order of magnitude. However GNU tail is one order faster than sed. Is GNU tail using some heuristic to handle "skip the first line in a huge file" differently?

glandium · on Aug 24, 2016

I haven't read the code of neither sed nor tail, but I'd imagine sed is still reading line by line, while tail skips the number of lines it needs to skip, and then just goes on full throttle reading the remainder of the file. It might even be using sendfile(2) or splice(2).

0xmohit · on Aug 24, 2016

GNU grep is fast, but there are certain alternatives that are pretty fast (probably faster):

  - ack [0]
  - ag (also known as the silver searcher) [1]

You could make `grep` faster by telling it to interpret the pattern as a fixed string (-F or --fixed-strings) is you're not using a regex. Setting to LANG to C would also probably make it faster.

[0] http://beyondgrep.com/

[1] https://github.com/ggreer/the_silver_searcher

burntsushi · on Aug 24, 2016

I think it would be interesting to compare those utilities with GNU grep on a single file, because I understand them to be faster on many files because they (1) search files in parallel and (2) skip many uninteresting files automatically.

I also don't think either of those tools are DFA based like GNU grep. One thing I'd like to see is combine many of the performance tricks found in GNU grep with the aforementioned (1)+(2) techniques.

> You could make `grep` faster by telling it to interpret the pattern as a fixed string (-F or --fixed-strings) is you're not using a regex.

That doesn't really make much sense. GNU grep can tell whether a pattern is just a literal or not. If it's a literal, it should be able to stay out of the main DFA engine entirely.

nilved · on Aug 24, 2016

It's pretty easy to use grep and search in parallel while skipping uninteresting files. `git ls-files | xargs -P4 -n8 grep whatever` is a good way to do this; or `git grep`, but that's using PCRE and not parallelized.

> That doesn't really make much sense. GNU grep can tell whether a pattern is just a literal or not. If it's a literal, it should be able to stay out of the main DFA engine entirely.

That would be clever but it doesn't do that.

burntsushi · on Aug 25, 2016

> That would be clever but it doesn't do that.

Really? Pretty sure it does. Have you read the source code? There is a specific part that looks to see if it has a literal and will use that. If it has an exact match, then it will skip the DFA completely.

goldenkey · on Aug 24, 2016

The regex is compiled anyhow to a C-call subroutine using a lib. So the overhead is only a function call. I highly doubt a single inlined proc is going to give much speed gain.

0xmohit · on Aug 24, 2016

> That doesn't really make much sense. GNU grep can tell whether a pattern is just a literal or not.

It does help when the search string comprises of regex metacharacters and you actually wanted a literal search.

gpderetta · on Aug 24, 2016

I use grep for single files (mostly because it is burned in my muscle memory). The silver searcher (from emacs) for bulk searches in a large directory tree (like a codebase).

michaelmior · on Aug 24, 2016

While ag is fast, I use it because of added features. Specifically the ability to automatically ignore files which are ignored in your project's VCS.

agentgt · on Aug 24, 2016

It is also because it is multithreaded (IIRC). There is another one written in Go that is for sure. GNU grep is faster one file (at least the last time I tested).

asamarin · on Aug 24, 2016

You're probably talking about sift

https://sift-tool.org/

coldtea · on Aug 24, 2016

I've found pt (the platinum searcher) to be better (and even faster) than ag for codebase searches (and it can automatically pick up .gitignore etc).

https://github.com/monochromegane/the_platinum_searcher

nickcw · on Aug 24, 2016

If I'm searching a git repo I use "git grep" which seems to be very fast and knows exactly what is your source and what isn't.

foobar__ · on Aug 24, 2016

How is it better?

coldtea · on Aug 24, 2016

From their page:

It searches code about 3–5× faster than ack.

It searches code as fast as the_silver_searcher(ag).

It ignores file patterns from your .gitignore.

It searches UTF-8, EUC-JP and Shift_JIS files.

It provides binaries for multi platform (Mac OS X, Windows, Linux).

But even more so, sensible defaults that work nicely for code search use cases. And written in Go, without accumulated cruft for 200 platforms and POSIX subtleties like grep, so even someone like me can hack it to do something special if I want to.

epmatsw · on Aug 24, 2016

Ag handles .gitignore as well. And has defaults appropriate for code searching. So the difference is official Windows support?

coldtea · on Aug 24, 2016

It's also faster (30% to 50% for my codebase), written in Go, and easier to extend.

  $ time ag zmq | wc -l
      63

  real   	0m0.017s
  user   	0m0.022s
  sys    	0m0.016s

  $ time pt zmq | wc -l
      296

  real   	0m0.013s
  user   	0m0.014s
  sys    	0m0.017s

(Timing differences consistent over multiple runs of both on the same codebase -- no disc cache effect) -- the wc difference is because of different "surrounding context" setting.

ggreer · on Aug 24, 2016

I'm the author of ag. Pt has drawbacks that you may not be aware of. For example: pt is case-sensitive by default. (Ag defaults to smart-casing. If your query is all lowercase, it's case-insensitive.) If you tell pt to do a case-insensitive search, it kills performance. Here's an example:

    ggreer@lithium:~/code% time ag instantiationname
    ...
    ag instantiationname  3.82s user 2.16s system 157% cpu 3.797 total

    ggreer@lithium:~/code% time pt -i instantiationname
    ...
    pt -i instantiationname  136.65s user 0.77s system 773% cpu 17.761 total

That's to search a 20GB code directory. Both find the same matches. (Instances of "InstantiationName" in node/deps/gtest/include/gtest/gtest-param-test.h)

Another issue is that pt tends to bail on errors. If you tell it to follow symlinks (-f) and it encounters a single broken symlink, it will exit without finishing the search. Every other search tool I know of (grep, ag, ack) will keep on truckin'.

That said, pt is faster than ag for case-sensitive matches. It looks like much of that comes from a parallelized directory traversal. The architecture of ag is different. It has many worker threads, but only one thread going through dirs. Looks like I'll have to step-up my game. :)

ggreer · on Aug 24, 2016

I found another pt performance issue. Searching an 8GB file:

    % time pt -i hello4294967296 big_file.txt
    big_file.txt
    134217728:hello4294967296
    
    pt -i hello4294967296 big_file.txt  535.86s user 1.35s system 100% cpu 8:56.97 total

That's right: pt takes 9 minutes to do a case-insensitive match on an 8GB file. On the same machine doing the same search, ag takes 7 seconds and grep takes 4s.

Also, it looks like pt was beating ag because it defaults to case-sensitive search. If I make ag do the same case-sensitive search, it's slightly faster in my benchmarks (0.63s to search my ~/code directory vs pt's 0.65s).

coldtea · on Aug 26, 2016

Thanks, original parent, never occurred to me to do case-insensitive searches, but I can see them being useful in some cases.

My main beef with ag was probably due to some older distribution or worse packager (tried it a few years ago), because I tried it again now from "brew" and it's pretty much just as convenient as pt -- and with some more features.

amluto · on Aug 24, 2016

I wonder if all of these optimizations are still as helpful. For example, skipping bytes only saves memory bandwidth if you skip whole cachelines.

Also, for --mmap, there are a couple of conflicting considerations. On newer CPUs, large memory copies are very fast. With mmap, though, unless you use MAP_POPULATE, you can get a page fault every few pages, and on CPUs with exception mechanisms as bad as x86's, page faults are very slow (probably 20 times as bad as a syscall).

ksk · on Aug 24, 2016

Memory bandwidth is not the main concern here since they explicitly mention that executing more CPU instructions kill performance far quicker - in this case. As for mmap, perhaps you would like to present your own benchmark that contradicts the article.

But anyway, I highly doubt that either of the basic things you mentioned are news to the implementors. The implementation is supposed to run on all kinds of processors with varying architectures so making the implementation portable is bound to leave performance on the table. Not to mention other constraints like maintainability and meeting the performance criteria for the common use case/workload.

adrianratnapala · on Aug 24, 2016

The bottleneck is not reading the byte, it is the CPU clocks needed to figure out that it is safe to skip.

Assuming cache prefetching works well thanks to the sequential search, you can try to max out the memory bandwidth. But even DDR3 has best-case transfer rates from 6-17 GB/second. So to max it out on a 3GHz CPU, you need to eliminate 2-6 bytes on every tick of the clock.

goldenkey · on Aug 24, 2016

Does it use SIMD instructions? Some newer cpus can do up to 16 comparisons at once. That also might mean skipping will be branchy and not exactly divisible on word boundaries for best vectorization.

burntsushi · on Aug 24, 2016

Some other details on GNU grep's performance:

* The Boyer-Moore algorithm it uses has a skip loop, which is used to quickly search for the last byte in the needle before trying to do more checking. Specifically, the skip loop is implemented with memchr, which can be found in your libc and probably compiles down to SIMD instructions. (The trick here isn't actually skipping bytes, but processing many bytes in a single loop iteration.) This optimization works well in the common case, but can slow things down a bit if your skip byte is really common in the haystack.

* When a pattern consists of an alternation of literals, like, `abc|mno|xyz`, then it uses an algorithm based on Commentz-Walter, which you can think of as a hybrid between Aho-Corasick and Boyer-Moore. That is, you get the byte skipping of Boyer-Moore even though you're searching for multiple patterns. The performance of Commentz-Walter degrades as the number of patterns increases because there's fewer opportunities for meaningful skips. Nevertheless, small/short alternations are probably the common case with GNU grep. (The state of the art has actually improved for this specific case and GNU grep could probably benefit. Specifically, the Hyperscan folks over at Intel have some really cool SIMD algorithms for matching multiple patterns. I implemented one here (the comments include a description with examples): https://github.com/rust-lang-nursery/regex/blob/master/src/s...)

* The actual regex engine is a lazy DFA (similar to RE2) in that its states are computed on-demand as needed. That is, some parts of the DFA that aren't used are never computed at all. If the DFA gets too big in memory, the existing states are dropped and re-computed as needed. In this way, GNU grep gets the performance of a DFA while preserving linear time matching. (At most one new DFA state is computed for each byte of input.) The inner DFA loop is unrolled.

* To re-emphazie a point from the post: avoiding line-by-line matching is critical. This makes it much easier to extract literals from a pattern. For example, if your pattern is `\w+foobar\d+`, then GNU grep can still search for `foobar` because it can make an assumption that its haystack is generally a very small slice of the entire search space (i.e., a single line). This type of optimization is much harder to pull off in a general purpose regex engine. A general purpose regex engine might limit itself to looking for prefix or suffix literals only.

* GNU grep really doesn't do all that well for multi-byte encodings. e.g., The time difference between `LC_ALL=C egrep -c '\w{5}'` and `LC_ALL=en_US.UTF-8 egrep -c '\w{5}'` on a large file is quite dramatic. (~3 seconds vs ~52 seconds for a ~2GB mostly-ASCII file that is warm in my page cache.) There's really no need for this slow down if you can bake UTF-8 decoding into your DFA (RE2 does this). However, you then lose the ability to `grep` over other character encodings without transcoding them to UTF-8 first. (In fact, GNU grep may be doing this transcoding step today anyway, so maybe baking UTF-8 into your DFA would be a strictly better solution since it would be much faster on a much larger number of inputs, but probably not slower than any inputs. However, I'm not terribly familiar with this part of GNU grep, so I could have this wrong.)

philh · on Aug 24, 2016

> GNU grep really doesn't do all that well for multi-byte encodings.

I might be wrong, but I have a recollection that this was massively improved somewhat recently, maybe a year or so back. Are you using a recent-ish version?

This might also be to do with the needle - I think I was testing with a fixed ASCII string, which should work the same in either the UTF-8 or C locales (assuming a UTF-8 file). But \w I guess should behave differently depending on locale, e.g. I'd expect it to match ß in UTF-8 but not C.

burntsushi · on Aug 24, 2016

Yes, I'm on 2.25, which is the latest I believe.

> I might be wrong, but I have a recollection that this was massively improved somewhat recently, maybe a year or so back.

I seem to recall hearing that too. A brief skim of the commit log shows multiple possible improvements, but I don't have time to drill down further into that.

It could just be that it used to be worse. :-)

(If you look in the core DFA loop, which is in the `dfaexec_main` function inside `dfa.c`, then you can see that there is a completely different path through the DFA if it needs to use multibyte encodings.)

> But \w I guess should behave differently depending on locale, e.g. I'd expect it to match ß in UTF-8 but not C.

Exactly. But this doesn't have to have a measurable performance impact, assuming the number of matches doesn't radically change. (Well, this is a bit of a lie, because a Unicode aware \w is much bigger than an ASCII-only \w, which implies the DFA needs more memory, which could lead to a more general slowdown if the state cache needs to get emptied more frequently.)

delta1 · on Aug 24, 2016

This is a greatly insightful post, thank you for sharing!

kuharich · on Aug 24, 2016

Previous discussions: http://news.ycombinator.com/item?id=1626305 http://news.ycombinator.com/item?id=2860759 https://news.ycombinator.com/item?id=6813937 https://news.ycombinator.com/item?id=9153203 http://news.ycombinator.com/item?id=2393587 https://news.ycombinator.com/item?id=9446382

asadjb · on Aug 24, 2016

Here's a link to the paper the article talks about ("Fast String Searching", by Andrew Hume and Daniel Sunday):

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13....

nachtigall · on Aug 24, 2016

What about concurrency? I have 4 CPU cores.

esac · on Aug 24, 2016

I'm pretty sure the bottleneck is disk I/O and not the CPU

ggreer · on Aug 24, 2016

When searching for a literal string, the bottleneck tends to be memory bandwidth. When doing regex searches, the bottleneck is usually CPU. If caches are cold, then disk I/O is the limiting factor. Even in that case, technologies like NCQ allow some degree of concurrency.

If you have ag[1], you can play around with the --workers option to see how various numbers of threads change performance. (The default is for ag to use #CPUs-1 workers.)

1. https://github.com/ggreer/the_silver_searcher

exDM69 · on Aug 24, 2016

Even when disk caches are involved (ie. data in RAM, not cache), a typical grep application (short "needle" to search for) should saturate the memory bandwidth before CPU cores.

Running a few threads/processes in parallel could improve throughput with latency hiding, but adding more shouldn't give any benefit.

josuah · on Aug 24, 2016

I believed busybox's grep to be faster as it was very small, with GNU version slower due to the many features added over the time, but it is quite not true! On a slow device as some evoked said, the difference is highly noticeable.

mcguire · on Aug 24, 2016

That's actually an important point. It's easy to misinterpret

"The key to making programs fast is to make them do practically nothing."

Busybox's grep is small and simple. GNU grep is big and complex, which is actually why it's faster. It does a lot of work to avoid doing a lot of work.

josuah · on Aug 28, 2016

Oh, so 'do practically nothing' rather than 'be practically nothing'. I understand now.

skywhopper · on Aug 24, 2016

Always worth a re-read because it's short, interesting, and teaches a good lesson

underyx · on Aug 24, 2016

Wait, I always tried to make my pattern as short as possible and I thought it would speed up the searches. So I guess this means I'm actually better off searching for the longest possible match then?

burntsushi · on Aug 24, 2016

Pretty much, yes. Longer strings should also match fewer times than short strings, which should also speed things up (because reporting a match has its own overhead associated with it, like printing text to the terminal).

SixSigma · on Aug 24, 2016

Try it in UTF8 mode and then you'll see a difference

dubmax123 · on Aug 24, 2016

Wisdom: "The key to making programs fast is to make them do practically nothing. ;-)"

SunboX · on Aug 24, 2016

Another rule to make programs fast: break early, break often.

gumby · on Aug 24, 2016

Great bit of wisdom buried at the very end:

> The key to making programs fast is to make them do practically nothing. ;-)