Yes, it does. Capture groups aren't special here. The issue is a semantic quibble. Namely, when folks say, "a regex engine that guarantees matching in linear time," what they actually mean is, "a regex engine that guarantees matching in linear time with respect to the input searched where the regex itself is treated as a constant." If you don't treat the regex as a constant, then the time complexity can vary quite a bit depending on the implementation strategy.
For example, if you do a Thompson NFA simulation (or, more practically, a Pike VM), then the time complexity is going to be O(mn), where m ~ len(regex) and n ~ len(input), regardless of capturing groups.
As another example, if you compile the regex to a DFA before matching, then the time complexity is going to be O(n) since every byte of input results in executing a small constant number of instructions, regardless of the size of the regex. However, DFAs typically don't handle capturing groups (although they certainly can handle grouping), with the notable exception of Laurikari's Tagged DFAs, but I don't know off-hand if the time complexity of O(n) usually associated with a DFA carries over to Tagged DFAs. Of course, the principal downside of building a DFA is that it can use exponential (in the size of the regex) memory. This is why GNU grep, rust/regex and RE2 use a hybrid approach ("lazy DFA"), which avoids O(2^n) space, but falls back to O(mn) matching when the DFA would otherwise exceed some memory budget during matching.
> when folks say, "a regex engine that guarantees matching in linear time," what they actually mean is, "a regex engine that guarantees matching in linear time with respect to the input searched where the regex itself is treated as a constant."
Well the Rust docs say "all searches execute in linear time with respect to the size of the regular expression and search text." Their engine compiles to a DFA, not an NFA or PikeVM; I suppose this is the basis for their claim.
> As another example, if you compile the regex to a DFA before matching, then the time complexity is going to be O(n) since every byte of input results in executing a small constant number of instructions, regardless of the size of the regex. However, DFAs typically don't handle capturing groups
Now you have arrived at my question! Rust compiles to a DFA that supports capture groups. My question is whether capture groups ruin the linearity of the DFA matching.
Thanks for the Laurikari's Tagged DFAs reference, I hadn't heard of that. I'll check it out!
> Well the Rust docs say "all searches execute in linear time with respect to the size of the regular expression and search text."
Yeah, that's ambiguous phrasing on my part. I meant that it was linear time with respect to both the size of the regex and the search text.
> Their engine compiles to a DFA, not an NFA or PikeVM; I suppose this is the basis for their claim.
No, it doesn't. rust/regex uses some combination of the Pike VM, (bounded) backtracking and a lazy DFA. It will compile a DFA ahead of time in some cases where Aho-Corasick can be used.
> Rust compiles to a DFA that supports capture groups.
No, it uses a lazy DFA to answer "where does it match," but it still must use either the Pike VM or the bounded backtracker to resolve capture groups.
> My question is whether capture groups ruin the linearity of the DFA matching.
Yeah I think I would probably look at Tagged DFAs to answer this. You'll want to check out recent papers that cite Laurikari's work, since I think there have been some developments!
For example, if you do a Thompson NFA simulation (or, more practically, a Pike VM), then the time complexity is going to be O(mn), where m ~ len(regex) and n ~ len(input), regardless of capturing groups.
As another example, if you compile the regex to a DFA before matching, then the time complexity is going to be O(n) since every byte of input results in executing a small constant number of instructions, regardless of the size of the regex. However, DFAs typically don't handle capturing groups (although they certainly can handle grouping), with the notable exception of Laurikari's Tagged DFAs, but I don't know off-hand if the time complexity of O(n) usually associated with a DFA carries over to Tagged DFAs. Of course, the principal downside of building a DFA is that it can use exponential (in the size of the regex) memory. This is why GNU grep, rust/regex and RE2 use a hybrid approach ("lazy DFA"), which avoids O(2^n) space, but falls back to O(mn) matching when the DFA would otherwise exceed some memory budget during matching.