Hacker News new | past | comments | ask | show | jobs | submit login

Whoa. How does knowing the match ahead of time help the second pass?



Finding the match can use the lazy DFA, which is roughly one or two orders of magnitude faster than the Pike VM, and similarly faster than bounded backtracking. Typically, the span of a match is much much smaller than the entire input you're searching, so it makes sense to find that span as quickly as possible. When it comes time to run the Pike VM (or the bounded backtracker), you can run the regex on just the span of the match, which will be much better than running it on the entire input to find the match if the input is large.

This strategy doesn't always make sense, and it can be better to just run the NFA right away if the input is small.

If you find this stuff interesting, you'll definitely want to check out Russ Cox's article series on regexes. In particular: https://swtch.com/~rsc/regexp/regexp3.html




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: