Thank you for ripgrep! I will mention it in that thread too but I'm AFK so before I forget...
I'm imagining a "drill down" TUI with rg and fzf. fzf can be good for both filenames and other filter-downs. Thinking of breadcrumbs and easily stepping forward or backward, ability to easily bookmark/"pin" parts of search paths as presets for easy reuse later, etc.
EDIT: I recognize this would be outside of the scope of rg itself, I'm voicing it in case it sparks ideas about the functionality you're thinking of adding. I'll think more about it and see if I can explain better
Code search was “too good to be true, gotta pinch myself” awesome. I miss it to this day.
Here is how I used it - I’d type in some code I was working on and the search result would show similar code and how it was used. Great for debugging and thinking by looking at similar solutions. Sigh.
Google engineers have superfast high-end desktops and laptops, and 10 Gbit internet connections. They don't tend to optimise their internal tools for low-spec machines or slow internet connections.
Weird, I was just looking into Google Code Search this weekend so I could use something like it on my work computer. It's a little surprising that the big Git hosting companies don't have a proper code search tool as part of their package. I use Bitbucket right now, but its search is built on Elasticsearch and special characters aren't handled, so regular expressions won't work.
A couple of open source projects that I've seen are Hound and Zoekt. Hound actually uses this code search backend with a nice frontend in React. Zoekt is what I was going to use since it scales really well, is faster, and has good search operators for filtering by repo name, language, etc. Google was using Zoekt until recently for code search across all their open source repos.[0]
Interesting, because the search function that Cisco's intranet provides for documents and such is perhaps the single most useless piece of technology I've ever encountered. You could search something like "401k plan" and you'd get marketing materials written in Japanese. Utter trash.
Piper was actually a big source of frustration for me. Yeah, it's dead simple, but once you have a CL chain, you're entering a world of pain. I switched to Fig a while ago and haven't looked back. A tiny fix I'll still edit from CS or a throwaway citc client, but for anything beyond that it's just simpler to use Fig. I've been able to juggle 4-5 CL chains easily and it makes my workflow much easier. Splitting CLs before review is also much simpler with Fig.
In my opinion, what makes them really great is the tight integration that they have. For example, since the whole company uses one build system and one single repository, you can build a truly awesome IDE that knows about every library in the company and can autocomplete for it. Same for code search, where cross references are accurate and work across languages (for example a class generated from Protobuf).
Outside Google, the percentage of my coding time I spend hunting through some dependency's source tree for the relevant header files or documentation or "Where on earth is this constant defined" is huge.
With codesearch, answering those kinds of questions is near instant.
So CodeSearch, Critique, Borg, Sherlog, Cider come to mind as top notch tools that are not available outside. As far as libraries go, the C++ Fibers thing is incredible and I don't think it's open.
Blaze is amazing (albeit a bit slow), and Bazel should be more or less the same, though I haven't used it. Dremel, Spanner, TensorFlow, Proto, gRPC are all available outside. Abseil (https://abseil.io/) is a great library available to everyone.
I’ve always wondered why regular expressions and full text indexes are the best thing we expect out of a code search engine.
I mean, we’re talking about code here. Text meant to be interpreted and understood by a compiler. Why can’t we do better?
Why can’t I say “show me everyone that’s calling this function”, like an IDE lets me do? Or “show me functions that accept <type> as one of their arguments and return <type>”, in a way that integrates with the real grammar/AST of the language(s) in question, without resorting to clunky regular expressions?
I should be able to write structured queries against a codebase, with regexes being just one part of that query language.
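To make the "show me everyone that's calling this function" idea concrete, here is a minimal sketch for a single language, using Python's standard `ast` module. The sample source and the `find_callers` helper are made up for illustration, and it only matches direct calls by bare name (no methods, aliases, or imports), so it is nowhere near what an IDE or a real cross-reference index does.

```python
import ast

# Hypothetical sample code to search; stands in for a file from a real codebase.
SOURCE = '''
def parse(text):
    return text.split()

def load(path):
    return parse(open(path).read())

def main():
    words = load("notes.txt")
    return parse("a b c") + words
'''

def find_callers(source, target):
    """Return (enclosing function name, line number) for each direct call to `target`."""
    tree = ast.parse(source)
    hits = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Walk the function body looking for calls of the form target(...).
        for node in ast.walk(func):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == target):
                hits.append((func.name, node.lineno))
    return hits
```

Calling `find_callers(SOURCE, "parse")` reports the calls inside `load` and `main`. A regex for `parse(` would also match comments, strings, and the definition itself, which is the gap a syntax-aware query closes.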
Only some languages can readily support such features.
For example C has a preprocessor and linking step driven by a build system. And C has a bunch of different build systems available, some of which are procedural rather than declarative.
Maybe you'll need to support package management - if a function signature calls for a CopyOnWriteArrayList do you need to know what the subclasses and superclasses of that type are? Do you need to resolve all the dependencies to be able to do that?
If you're thinking "No problem, everyone compiles their programs in CI anyway" - are you happy to skip indexing unused code and uncompilable code?
And of course you'll be chasing after language and build tool changes - not only to one language, but every language.
On the other hand, a nice simple grep? Sounds much simpler to me.
He's not that old, but he's not that young, either. He worked on Plan 9 for like a decade before this happened. A few years before he put out this blog post, he finished his PhD thesis. While a bit silly to hire someone with that much experience as an intern, it is how most PhDs are treated.
> To minimize I/O and take advantage of operating system caching, csearch uses mmap to map the index into memory and in doing so read directly from the operating system's file cache. This makes csearch run quickly on repeated runs without using a server process.
Does anyone know of some resources where I can read more about this technique? (how-to, pros/cons, caveats, etc.) I'm interested in figuring out the best way to have a command-line tool persist state that it can quickly access across multiple runs, but so far a background server process is the only technique I'm familiar with.
But this excerpt says that this technique obviates the need for a server process? Are they saving the contents of memory into files using mmap, and then using this state on every run?
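As far as I understand the quoted passage, the state lives in an ordinary index file on disk, and each run maps that file read-only instead of reading it all in. The OS keeps recently used file pages in its cache, so a second run usually finds them already in memory. Here is a rough sketch of the pattern in Python; the fixed-width record layout and the `build_index` / `lookup` names are my own invention for illustration, not csearch's actual index format.

```python
import mmap
import struct

# Assumed toy format: one little-endian uint32 per entry, back to back.
RECORD = struct.Struct("<I")

def build_index(path, values):
    """Write the index file once; this is the slow, one-off step."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))

def lookup(path, i):
    """Map the file read-only and decode entry i without reading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only the page(s) holding this record need to be touched.
            (value,) = RECORD.unpack_from(m, i * RECORD.size)
            return value
```

Run `build_index` once, then call `lookup` from as many separate processes as you like; nothing has to stay resident between runs. The usual caveats, as far as I know: the on-disk layout has to be usable in place (fixed offsets rather than pickled objects), and the first run after a reboot still pays for the disk reads.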