Ripgrep has always been interesting to me, as I don't ever find myself bothered by the speed of GNU grep, even when working with large files. Additionally, grep is a standard utility included on most Unix-like OSes, so it is not super risky to write a script that relies on grep -- in contrast to writing a script that relies on a not-usually-installed-by-default tool like ripgrep. For me, I just don't have issues with grep!
I'd love to hear people's experiences on how grep wasn't adequate and why they use ripgrep instead.
(This is not a criticism of Ripgrep: I'm glad it exists and that other people find it useful.)
This, and also it ignores irrelevant files. It has sane defaults, but you can tweak them with a .rgignore file, which is like .gitignore but for rg. By the way, it will use .gitignore files in a git directory.
That means that by default, it will take a lot less time and won't ruin your terminal when lines of some generated files (especially minified ones that are all on one line) match your search.
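Concretely, the filtering can be relaxed step by step if you want the grep behavior back (flag meanings from the rg man page):

$ rg PATTERN        # default: respects ignore files, skips hidden and binary files
$ rg -u PATTERN     # -u = --no-ignore: also search ignored files
$ rg -uu PATTERN    # adds --hidden: also search hidden files and directories
$ rg -uuu PATTERN   # adds --binary: roughly equivalent to grep -r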
> By the way, it will use .gitignore files in a git directory.
If I'm in a repo, I'm using `git grep`.
That makes `rg` a mostly redundant tool for me, since it's optimized for searching source code. I can't really use it as a general-purpose replacement for grep: if it doesn't find anything, I'm left wondering whether what I'm searching for really isn't there or whether `rg` just didn't bother to check. Even with `--no-ignore --hidden` (i.e. `-uu`), I'm still not sure whether it searches everything. It's one of those tools that I find too clever for its own good.
So when `git grep` doesn't cover my use case, my fallback is `find | grep`, which contains no magic, so I know exactly what it's searching.
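In practice that's something like this (pattern and glob are placeholders):

$ find . -type f -exec grep -n 'PATTERN' {} +                 # every regular file
$ find . -type f -name '*.log' -exec grep -n 'PATTERN' {} +   # or narrowed by name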
Thank you. This inspired me to read the ripgrep man page. I think I must have been confusing ripgrep's behavior with the earlier ack (or maybe ag) tool, which only searched known extensions by default. I see now that ripgrep only does that if explicitly given a --type option.
This. Also, you can do stuff like `rg -t py myvariable` to search Python files only. Neat for multi-language directory trees. (Works with many other languages.)
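A few more type tricks, all from rg --help:

$ rg -t js -t ts myvariable   # several types at once
$ rg -T py myvariable         # everything *except* Python files
$ rg --type-list              # list all known types and their globs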
> It has sane defaults but you can tweak this with a .rgignore file, which is like .gitignore but for rg. By the way, it will use .gitignore files in a git directory.
Fwiw there's also a `.ignore` semi-standard which works with several tools, not just grep-likes; e.g. fd also respects it by default.
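E.g. a hypothetical .ignore at a project root, in plain gitignore syntax:

# .ignore -- respected by rg, fd, and some other tools
node_modules/
vendor/
*.min.js
build/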
I use ripgrep all the time, but I sometimes don't trust the results: I have been bitten too many times by the heuristics it automatically uses to detect binary files and skip searching them. These days I run ripgrep with the alias `rg --color always --no-mmap -a`, but I still think its binary file detection is wonky. I might be missing some relevant option, but out of the box, IMO grep is slower but always does what it is told (and hence more reliable).
Can you give a specific example of where the binary file detection is wonky? It should be basically the same as what GNU grep does. It just looks for a NUL byte. If it exists, it's classified as binary data. Otherwise, text.
GNU grep also does binary detection by default. You have to opt into -a/--text there too. So maybe GNU grep doesn't always do what it's told either. :-)
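You can poke at the heuristic directly if you're curious (file name is arbitrary):

$ printf 'one\0two\n' > sample.bin
$ grep two sample.bin     # reports a binary match instead of printing the line
$ grep -a two sample.bin  # -a/--text: searches it as text anyway
$ rg two sample.bin       # rg applies the same NUL-byte check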
Ah, that makes sense. I'm glad you brought that up because I didn't even notice that I'm just used to adding -R to my grep commands when I need recursive searching.
I can totally see how that would be a small but impactful difference.
Whenever you normally use some programs with options other than their defaults, it is simpler to define aliases for those programs.
There are many common programs that I never use with their standard default options (which are very bad, IMO), e.g. cp, mv, ln, rm, rsync, date and many others, so I always define aliases for them, which include those options that I want to use by default.
So for grep, the recursive search should be included in the grep alias. There is no need for a new program in order to have this feature.
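For instance, a sketch of that kind of setup (the options are just examples; pick your own):

# ~/.bashrc
alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'
alias grep='grep -rn --color=auto'
# Caveat: GNU grep with -r and no file operand searches ".",
# so this alias changes what happens when you pipe into grep.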
> Whenever you normally use some programs with other options than their defaults, it is simpler to define aliases for those programs.
I don't buy into this. These aliases come at the cost, or at least the risk, that your workflow breaks when you are at another computer or working in a shell on some server that doesn't have the alias. That's why I like additional aliases, like l for ls with your favorite options. But I dislike aliases that change default behavior -- and often in an opaque way.
Unicode support. It might have been the Windows ports of grep that were the problem, but ripgrep shines with Unicode files. And it handles a mix of Unicode and ASCII files without issue.
And I totally agree that grep is installed everywhere and is usually fast enough. But I had a few ripgrep searches that were genuinely eyeblink fast. Like my finger hadn't fully lifted off the enter key and it was done. On 10K+ files, about 1 GB, with 1.5M+ LOC. And the default folder recursion and .gitignore handling is a plus.
On the file sizes I was working with (log files), both the Silver Searcher and GNU grep would lock up and crash. Ripgrep handled the same thing in _seconds_. The difference in speed is staggering: you may not think the speed bothers you, but I can't go back after installing ripgrep. It's the difference between my mind wandering while waiting for a search to complete, versus instantly seeing the results and not losing a train of thought.
Have you worked in a monorepo? There can easily be 100 kLOC or more, as well as tens or hundreds of gigabytes of build/compilation artifacts, etc., that you'll want to skip over.
For scripts I'll still use grep sometimes for the portability reason, naturally.
It doesn't even have to be a monorepo to see the speed difference. In Emacs I frequently invoke a thing where it searches my codebase as I type. With ripgrep, the results update almost instantaneously. ag, the silver searcher, is the second fastest thing I've used, but there would be a noticeable lag in updating the results as I typed, even for smaller repos.
ripgrep absolutely tears through some large monorepos we have at work, far faster than GNU grep.
I imagine the performance difference is even more startling for anyone on a Mac who hasn't replaced BSD grep with GNU grep (install the gnu tools from homebrew and alias "grep" to "ggrep", the performance difference is huge).
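For anyone who wants to do that (Homebrew installs the GNU tools with a "g" prefix):

$ brew install grep
$ echo "alias grep='ggrep'" >> ~/.zshrc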
I have not felt the need to use ripgrep either. If there was some set of directories and files that I had to search recursively and grep was not fast enough, then I would find a way to reduce the size/number of directories and files that I am searching. That is not always an easy problem to solve, but I believe it is the problem most worthy of solving. To me, size matters. Small has its advantages.
I use computers with resource constraints.^1 I have grep in multicall/crunched binaries. To use ripgrep I would have to include it as a separate binary. Is there a solution similar to crunchgen for making crunched Rust binaries?
I am not sure if the "ripgrep" name is a joke or the author is serious. Assuming the latter, I am content to wait for the BSD and Linux projects I use to switch from C to Rust and from BSD/GNU grep to ripgrep, at which point I imagine it will simply be called "grep", for portability.
1. This may be why I have less need for ripgrep. I try to keep things small. Keeping things small routinely has the desirable side effect of making things relatively fast.
If you only search small corpora, then ripgrep's speed benefits obviously don't matter. There's unlikely to be material differentiation among grep tools in those cases. So it's a priori not a concern for you.
ripgrep has other benefits, but they aren't quite as universally compelling as its speed benefits. For example, when searching large corpora, pretty much everyone is going to appreciate a search taking 1 second vs 10 seconds. But many fewer people are going to appreciate, say, automatic transcoding from UTF-16 in order to search data.
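For example (file names are placeholders; -E is rg's encoding flag):

$ rg needle utf16-file.txt         # transcoded automatically when a BOM is found
$ rg -E utf-16le needle file.txt   # or force a specific encoding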
Other than that, ripgrep's "smart" filtering by default would be its main benefit. My first link above addresses that.
The rip in ripgrep is meant to mean fast (as in ripping through files), not "rest in peace", FWIW. It's unlikely ripgrep will ever replace grep, as it has no intention of implementing 100% compatibility.
Haha, I actually like the interfaces of most POSIX-compliant tools. But maybe I've not looked too critically at the interfaces. Do you have any examples that are particularly bad in your opinion?
I find I'm installing third-party CLI tools anyway, for xsv and jq as the genuinely irreplaceable but not-yet-standard Unix tools. Once I'm having to do that, stuff like ripgrep, fzf, or fdfind gets added for having a nice UI, even if they wouldn't cross the essential threshold otherwise.
There are two ways to talk about "speed." On the one hand, we have the "speed" that is associated with the user experience. That is, given some problem the user cares about, how fast can you solve it? On the other hand, we have the "speed" that is associated with doing precisely the same task and measuring which tool does it faster.
ripgrep is generally faster at both of those things, but it's the former where it really shines:
$ git remote -v
origin git@github.com:nwjs/chromium.src (fetch)
origin git@github.com:nwjs/chromium.src (push)
$ git rev-parse HEAD
5d32cab40f738932eddc017980e2e409c5abef2c
$ time rg 'Xvfb and Openbox' | wc -l
1
real 0.289
user 1.526
sys 1.731
maxmem 87 MB
faults 0
$ time grep -r 'Xvfb and Openbox' ./ | wc -l
1
real 5.405
user 3.489
sys 1.890
maxmem 11 MB
faults 0
(I ran these commands multiple times each until the times stabilized. i.e., The directory tree is in cache.)
We're talking about an order of magnitude improvement here to get the same results. And not just in a "ripgrep took 10ms and grep took 100ms, but both are fast enough" sense. This is the difference between "near instant results" and "this is taking annoyingly long."
Now of course, from the perspective of the second kind of speed, this isn't an apples-to-apples comparison. GNU grep is actually searching a lot more data here. We can make ripgrep search the same amount of data as GNU grep quite easily:
$ time rg -uuu 'Xvfb and Openbox' | wc -l
1
real 2.538
user 2.570
sys 3.017
maxmem 72 MB
faults 0
So, still a big improvement over GNU grep, but it's not quite as jaw dropping.
The important bit here is that a lot of people care about the improvement at the UX level. You can, for example, get a fair bit of improvement with GNU grep with some extra flags:
$ time grep -r --exclude-dir='.git' 'Xvfb and Openbox' ./ | wc -l
1
real 1.630
user 0.781
sys 0.826
maxmem 11 MB
faults 0
And now you've got to shove that stuff into an alias or a wrapper script. Which... is fine. I did it for a very long time before I wrote ripgrep. I had a whole bunch of aliases and wrapper scripts, many of which were specific to certain types of projects. But once I built ripgrep, all of those aliases and wrapper scripts went away. Because ripgrep's heuristics for smart filtering by default subsumed all of them.
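For reference, a minimal sketch of that kind of wrapper (the names and excludes here are illustrative, not my literal scripts):

#!/bin/sh
# srcgrep: recursive grep with per-project excludes baked in
exec grep -rn \
  --exclude-dir=.git \
  --exclude-dir=node_modules \
  --exclude='*.min.js' \
  "$@" .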
Finally, it's worth pointing out that ripgrep isn't intended to replace grep. It literally can't. It's not POSIX compatible. So if you're writing shell scripts and care more about portability, then 'grep' is a fine choice. Indeed, I still use 'grep' for precisely that purpose. See: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...
Yeah, it's actually an interesting case study in performance optimization - GNU grep puts all this effort into optimizing the performance characteristics of the system calls it uses based on deep kernel knowledge, but ripgrep is orders of magnitude faster for many users via the simple trick of "completely ignore a lot of files by default"
That's not really what's happening in the article. If you read through the single-file benchmark, you'll see several clever algorithmic improvements (like rarest-byte guessing, building a set of variants for Unicode-aware multiple pattern matching, etc.).
The author literally concedes that the .gitignore feature was not done for performance, and actually carries a significant overhead in large directory trees. For the sake of comparability, the study was controlled for the .gitignore overhead.
> simple trick of "completely ignore a lot of files by default"
The author of rg wrote a blog post about this. As I recall, he did performance comparisons with the same limitations and scope, so it's not as if the difference in that benchmark comes down to something as obvious as this.
This is very very very wrong. GNU grep is not doing any optimizations based on "deep kernel knowledge" that ripgrep doesn't do. I'm honestly not even sure what you're referring to. GNU grep uses standard 'read' syscalls. ripgrep does that too (but also uses memory maps in some cases). There is some buffer size tuning, but otherwise, nothing particularly interesting there.
ripgrep's speed might come from ignoring files in any given use case, and it might even be the biggest reason why a search completes faster. But in my linked blog post, I control for all of that. Yes, while ripgrep might be faster in some cases because of its "smart" filtering, it's also faster in cases where "smart" filtering isn't enabled.
grep still runs a regex engine though, which I think is by default a deterministic finite-state machine. I had the impression that even when only fixed strings are used, one would still need -F to get full speed. For fixed strings, Boyer-Moore is the obvious choice.
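That's easy enough to test on any particular system (big.log is a placeholder; timings will vary):

$ time grep    'some fixed string' big.log > /dev/null
$ time grep -F 'some fixed string' big.log > /dev/null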