I'm not familiar with the AoC problem. You might be able to. But RegexSet doesn't give you match offsets.
You can drop down to regex-automata, which does let you do multi-regex search and it will tell you which patterns match[1]. The docs have an example of a simple lexer[2]. But... that will only give you non-overlapping matches.
You can drop down to an even lower level of abstraction and get multi-pattern overlapping matches[3], but it's awkward. The comment there explains that I had initially tried to provide a higher level API for it, but was unsure of what the semantics should be. Getting the starting position in particular is a bit of a wrinkle.
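For anyone following along, the overlapping vs. non-overlapping distinction can be sketched with the standard library alone. This is only a rough illustration: it is a naive scan, not regex-automata's API, both helper names are made up for the example, and it assumes non-empty literal patterns rather than real regexes.

```rust
/// Hypothetical helper: leftmost, non-overlapping matches.
/// After a match, the search resumes at the end of that match.
/// Assumes every pattern is non-empty.
fn non_overlapping<'p>(hay: &str, pats: &[&'p str]) -> Vec<(usize, &'p str)> {
    let bytes = hay.as_bytes();
    let mut out = Vec::new();
    let mut at = 0;
    while at < bytes.len() {
        match pats.iter().find(|p| bytes[at..].starts_with(p.as_bytes())) {
            Some(p) => {
                out.push((at, *p));
                at += p.len(); // skip past the match, so overlaps are lost
            }
            None => at += 1,
        }
    }
    out
}

/// Hypothetical helper: every match at every start offset, overlaps included.
fn overlapping<'p>(hay: &str, pats: &[&'p str]) -> Vec<(usize, &'p str)> {
    let bytes = hay.as_bytes();
    let mut out = Vec::new();
    for at in 0..bytes.len() {
        for p in pats {
            if bytes[at..].starts_with(p.as_bytes()) {
                out.push((at, *p));
            }
        }
    }
    out
}

fn main() {
    let pats = ["one", "two", "eight"];
    // "eight" and "two" share the 't' in "eightwo".
    println!("{:?}", non_overlapping("eightwo", &pats)); // [(0, "eight")]
    println!("{:?}", overlapping("eightwo", &pats)); // [(0, "eight"), (4, "two")]
}
```

The non-overlapping scan never sees "two" because the shared 't' was already consumed by "eight", which is exactly the case that matters for this puzzle.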
It's kind of awkward with that one, because you still have to check the individual patterns; it doesn't give you a multi-match on each pattern.
With the Aho-Corasick implementation you can just map the string -> {ordered list of matches} -> numbers associated with the match, and then you've got a little vec of digits you can grab the first and last entries of. Ended up being just a few lines of code, together with a hard-coded list of ["0", "one", "1", "two", ...] and the numbers they mapped to [0, 1, 1, 2, 2, ...].
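Roughly, the string -> matches -> digits pipeline looks like the sketch below. It swaps the Aho-Corasick automaton for a naive std-only scan so it runs without dependencies, and the `TABLE` constant and function names are illustrative assumptions, not the exact list or code from the solution described above.

```rust
/// Illustrative (pattern, value) table; an assumption for this sketch.
const TABLE: [(&str, u32); 19] = [
    ("0", 0),
    ("1", 1), ("one", 1),
    ("2", 2), ("two", 2),
    ("3", 3), ("three", 3),
    ("4", 4), ("four", 4),
    ("5", 5), ("five", 5),
    ("6", 6), ("six", 6),
    ("7", 7), ("seven", 7),
    ("8", 8), ("eight", 8),
    ("9", 9), ("nine", 9),
];

/// All digits found in `line`, ordered by start offset, overlaps included.
/// A naive stand-in for an overlapping multi-pattern search.
fn digits(line: &str) -> Vec<u32> {
    let bytes = line.as_bytes();
    let mut out = Vec::new();
    for at in 0..bytes.len() {
        for (pat, value) in TABLE.iter() {
            if bytes[at..].starts_with(pat.as_bytes()) {
                out.push(*value);
            }
        }
    }
    out
}

/// First digit * 10 + last digit, or None if the line has no digits.
fn calibration(line: &str) -> Option<u32> {
    let ds = digits(line);
    let first = *ds.first()?;
    let last = *ds.last()?;
    Some(first * 10 + last)
}

fn main() {
    println!("{:?}", calibration("eightwothree")); // Some(83)
    println!("{:?}", calibration("no digits here")); // None
}
```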
> With the Aho-Corasick implementation you can just map the string -> {ordered list of matches} -> numbers associated with the match, and then you've got a little vec of digits you can grab the first and last entries of.
My solution was to shove it through `Itertools::minmax_by_key(|m| m.start()).into_option()`, which returns the lowest and highest matches (or a duplicate of a single match). Then to map to digits I actually ordered the patterns differently: I went 1, 2, 3, ..., one, two, three, ...
That way:
- for part 1 I could slice out the first 9 elements and it works uniformly
- mapping a "digit" to an actual digit is taking the pattern index (`match.pattern().as_usize()`) modulo 9, which folds the textual versions onto the numerical ones, then adding one.
0/zero is not a valid digit, so you can just ignore it. You could also include it, use mod 10, and not increment the result; same difference.
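A minimal sketch of that ordering trick, for the curious. It uses std's `match_indices` per pattern plus `min_by_key`/`max_by_key` in place of Aho-Corasick and `Itertools::minmax_by_key`, so it is a stand-in and not the actual solution; `PATTERNS`, `digit_for` and `first_last` are names invented for the example.

```rust
/// Numerals first, then words, so index % 9 + 1 is the digit either way.
const PATTERNS: [&str; 18] = [
    "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
];

/// Pattern index -> digit: fold the words onto the numerals, then add one.
fn digit_for(index: usize) -> u32 {
    (index % 9) as u32 + 1
}

/// Digits of the lowest-starting and highest-starting matches in `line`.
/// A single match is returned twice; no match gives None.
fn first_last(line: &str, pats: &[&str]) -> Option<(u32, u32)> {
    // Searching each pattern independently keeps matches that overlap
    // across patterns (e.g. "eight" and "two" in "eightwo").
    let matches: Vec<(usize, usize)> = pats
        .iter()
        .enumerate()
        .flat_map(|(index, pat)| {
            line.match_indices(*pat).map(move |(start, _)| (start, index))
        })
        .collect();
    let first = matches.iter().min_by_key(|m| m.0)?;
    let last = matches.iter().max_by_key(|m| m.0)?;
    Some((digit_for(first.1), digit_for(last.1)))
}

fn main() {
    // Part 1: only the first 9 patterns (the numerals).
    println!("{:?}", first_last("two1nine", &PATTERNS[..9])); // Some((1, 1))
    // Part 2: the full list.
    println!("{:?}", first_last("two1nine", &PATTERNS)); // Some((2, 9))
}
```

Part 1 and part 2 then differ only in how much of the pattern list gets passed in.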
Ah, that's a nice observation about zero, that would have saved me some typing. :-)
I ordered the patterns the same way in my actual solution (though I started from zero: 0, 1, 2, 3, ...), mostly so I could truncate the pattern array to solve part 1. Notably, that was a retcon of my original part 1 solution, which just mapped the characters through `.is_ascii_digit()`; I wanted to consolidate the code a little. I ended up with:
> match(es) multiple, possibly overlapping, regexes in a single search.