Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Why the spectacular loss to V8 in the Regex tests? Is Go's regex engine a naive implementation as of now?


Russ Cox, in response to a regex benchmark vs. Python/Ruby.

"You assume the benchmark is worth something.

First of all, Ruby and Python are using C implementations of the regexp search, so Go is being beat by C, not by Ruby.

Second, Go is using a different algorithm for regexp matching than the C implementations in those other languages. The algorithm Go uses guarantees to complete in time that is linear in the length of the input. The algorithm that Ruby/Python/etc are using can take time exponential in the length of the input, although on trivial cases it typically runs quite fast."[1]

[1] https://groups.google.com/d/msg/golang-nuts/6d1maef4jQ8/yg4N...


As a member of the masses of programmers that typically deal with truckloads of trivial problems, his reply is worth as much as the benchmark he criticizes. I've never used anything but a trivial regexp (for some values of trivial) in production code (one-shot problems don't count) and I don't give a damn what is being benchmarked here - I only care if my Python script runs fast.


Actually, I might agree with Go here. I agree that most of my regexes are the trivial kind that don't exhibit exponential growth, but that's only because I also control the input to all those regexes. But Go's intended niche appears to be for user-facing server-side programs, and in that environment if you ever run a naive exponential-time regex against user-supplied input you're setting yourself up for a potential DOS attack.


But Go's intended niche appears to be for user-facing server-side programs, and in that environment if you ever run a naive exponential-time regex against user-supplied input you're setting yourself up for a potential DOS attack.

Yes, but for larger recognizers, one would probably use something like Ragel anyway. A DSL that provides intersection, union, difference, concatenation, composition, etc. makes it far easier to construct elaborate automata from smaller expression than the 'compile a big regex string' approach.


Why not use the potentially exponential algorithm initially and timeout and fall back to the linear case if the sometimes-quick one has taken longer than the linear one would have? At most a 2x slowdown for complex cases, and substantial speedup for the simple ones.


It depends, if you have the opportunity to build a DFA once, why not do it? It's only a one-time cost.

I think the more serious problem is that what most people consider to be a 'regular expression' does not express a regular language, and hence cannot be expressed as a finite state automaton.



I think his answer is useful. It does explain what's going on. If the performance characteristics of PCRE suit your problem better it's trivial to write a cgo wrapper for it. There's even one on github, though I don't know if it works.


It works (and will be updated across the website once Go1.1 is released).

http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...


>>a different algorithm for regexp matching<<

Here's a C++ program that uses RE2

http://benchmarksgame.alioth.debian.org/u64/program.php?test...


V8 JIT-compiles regular expressions.

EDIT: Since apparently someone doesn't believe me: http://blog.chromium.org/2009/02/irregexp-google-chromes-new... ("After optimization we generate native machine code which uses backtracking to try different alternatives.") See also the yarr JIT, which jsc and SpiderMonkey use for regular expressions: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/y....


Looks like there's a reimplementation of RE2 in Go that might we worth benchmarking against. https://code.google.com/p/sre2/


So what approach is implemented in the Go distribution?


The same as re2 (and, in fact, written by the same authors as re2)


So why is the C++ regex-dna program that uses RE2 so much faster?

http://benchmarksgame.alioth.debian.org/u64/program.php?test...

And why did someone do "a reimplementation of RE2 in Go" ?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: