Hacker News

Google Code Search does do regexes against an impressively large set of documents nearly instantly, though it's clearly much smaller than the set of all webpages. It'd be interesting to know how much Google could scale it; could they handle 100x the number of documents in the current code search? 10,000x?


One thing to note is that Google Code Search, as far as I know, only supports regular expressions that are actually regular. This means you can't use an expression with a backreference, like /(ab..)\1/, since backreferences make the language non-regular.

All in all, re2, the regular expression engine that Google Code Search uses, is a very interesting project; you can read about it on its Google Code page: http://code.google.com/p/re2/.


The issue is not so much how much CPU time the regex evaluation takes; it's the I/O time of loading every byte of every page we've crawled.

That being said, re2 does look pretty cool... having a guarantee that nothing in a regex can blow up is pretty nice, on top of the overall speed improvement.



