I would assume you can also use RegexSet from the regex crate, as it > match(es)...

burntsushi · on Dec 1, 2023

I'm not familiar with the AoC problem. You might be able to. But RegexSet doesn't give you match offsets.

You can drop down to regex-automata, which does let you do multi-regex search and it will tell you which patterns match[1]. The docs have an example of a simple lexer[2]. But... that will only give you non-overlapping matches.

You can drop down to an even lower level of abstraction and get multi-pattern overlapping matches[3], but it's awkward. The comment there explains that I had initially tried to provide a higher level API for it, but was unsure of what the semantics should be. Getting the starting position in particular is a bit of a wrinkle.
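
To make the overlapping vs. non-overlapping distinction concrete, here is a rough std-only sketch. It is not regex-automata's API or implementation; both function names are hypothetical, and a naive scan stands in for the automaton. The second function resumes searching at the end of each match, which is why it can never report two matches that share text.

```rust
/// Hypothetical helper: report (start, pattern_index) for every position
/// where a pattern begins, even if that match overlaps an earlier one.
fn overlapping(haystack: &str, patterns: &[&str]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (start, _) in haystack.char_indices() {
        for (i, p) in patterns.iter().enumerate() {
            if haystack[start..].starts_with(*p) {
                out.push((start, i));
            }
        }
    }
    out
}

/// Hypothetical helper: after a match, resume the search at the match end,
/// so reported matches never overlap.
fn non_overlapping(haystack: &str, patterns: &[&str]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start = 0;
    while start < haystack.len() {
        // Only slice on char boundaries.
        if !haystack.is_char_boundary(start) {
            start += 1;
            continue;
        }
        let hit = patterns
            .iter()
            .enumerate()
            .find(|(_, p)| haystack[start..].starts_with(**p));
        match hit {
            Some((i, p)) => {
                out.push((start, i));
                // Guard against empty patterns so the loop always advances.
                start += p.len().max(1);
            }
            None => start += 1,
        }
    }
    out
}
```

With the patterns `["one", "eight"]` and the haystack `"oneight"`, the overlapping scan reports both patterns, while the non-overlapping scan reports only `"one"`, because the search resumes after the shared `e`.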

[1]: https://docs.rs/regex-automata/latest/regex_automata/meta/in...

[2]: https://docs.rs/regex-automata/latest/regex_automata/meta/st...

[3]: https://github.com/rust-lang/regex/blob/837fd85e79fac2a4ea64...

dgacmu · on Dec 1, 2023

It's kind of awkward with that one, because you still have to check the individual patterns; it doesn't give you a multi-match on each pattern.

With the Aho-Corasick implementation you can just map the string -> {ordered list of matches} -> numbers associated with the match, and then you've got a little vec of digits you can grab the first and last entries of. Ended up being just a few lines of code, together with a hard-coded list of ["0", "one", "1", "two", ...] and the numbers they mapped to [0, 1, 1, 2, 2, ...].
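
A minimal sketch of that pipeline, assuming the interleaved pattern list described above. To keep it self-contained it uses a naive scan from `std` rather than the aho-corasick crate, and the constant and function names are hypothetical.

```rust
// Interleaved patterns and the number each one maps to (names hypothetical).
const WORDS_AND_DIGITS: [&str; 19] = [
    "0", "one", "1", "two", "2", "three", "3", "four", "4", "five", "5",
    "six", "6", "seven", "7", "eight", "8", "nine", "9",
];
const VALUES: [u32; 19] = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9];

/// string -> ordered list of matches -> numbers associated with each match.
/// Scanning left to right keeps the digits in order of match start.
fn digits_in_order(line: &str) -> Vec<u32> {
    let mut digits = Vec::new();
    for (start, _) in line.char_indices() {
        for (pattern, value) in WORDS_AND_DIGITS.iter().zip(VALUES.iter()) {
            if line[start..].starts_with(*pattern) {
                digits.push(*value);
            }
        }
    }
    digits
}

/// Grab the first and last entries; None if the line has no digits at all.
fn first_and_last(line: &str) -> Option<(u32, u32)> {
    let digits = digits_in_order(line);
    let first = *digits.first()?;
    let last = *digits.last()?;
    Some((first, last))
}
```

A line with a single match yields that digit twice, since the first and last entries of a one-element vec are the same.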

masklinn · on Dec 1, 2023

> With the Aho-Corasick implementation you can just map the string -> {ordered list of matches} -> numbers associated with the match, and then you've got a little vec of digits you can grab the first and last entries of.

My solution was to shove it through `Itertools::minmax_by_key(|m| m.start()).into_option()`, which returns the lowest and highest matches (or a duplicate of a single match). Then to map to digits I actually ordered the patterns differently: I went 1, 2, 3, ..., one, two, three, ...

That way:

- for part 1 I could slice out the first 9 elements and it works uniformly

- mapping a "digit" to an actual digit means taking the pattern index (`m.pattern().as_usize()`) modulo 9, which folds the textual versions onto the numerical ones, then adding one.

0/zero is not a valid digit so you can just ignore it, although you could always include it, use mod 10, and not increment the result, so same diff.
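
Roughly, the ordering trick looks like the sketch below. It is std-only: a naive scan replaces the Aho-Corasick searcher, and `min_by_key`/`max_by_key` stand in for `Itertools::minmax_by_key`. The constant and function names are hypothetical.

```rust
// Numeric patterns first, then the textual ones in the same order.
const DIGITS_THEN_WORDS: [&str; 18] = [
    "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
];

/// Index 0..=8 is "1".."9" and index 9..=17 is "one".."nine", so taking the
/// index modulo 9 folds both halves together, and adding one gives the digit.
fn digit_for(pattern_index: usize) -> usize {
    pattern_index % 9 + 1
}

/// Find the lowest and highest match by start offset and map both to digits.
/// A single match is returned twice.
fn first_last_by_start(line: &str, patterns: &[&str]) -> Option<(usize, usize)> {
    let mut matches: Vec<(usize, usize)> = Vec::new(); // (start, pattern index)
    for (start, _) in line.char_indices() {
        for (index, pattern) in patterns.iter().enumerate() {
            if line[start..].starts_with(*pattern) {
                matches.push((start, index));
            }
        }
    }
    let first = matches.iter().min_by_key(|m| m.0)?;
    let last = matches.iter().max_by_key(|m| m.0)?;
    Some((digit_for(first.1), digit_for(last.1)))
}
```

Part 1 would then pass `&DIGITS_THEN_WORDS[..9]` and part 2 the whole array, with no other changes. Including zero would mean 20 patterns, `% 10`, and no `+ 1`.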

dgacmu · on Dec 1, 2023

Ah, that's a nice observation about zero, that would have saved me some typing. :-)

I did the same thing with my actual solution in terms of order (but I went 0, 1, 2, 3), but mostly so I could truncate the matching array to solve part 1. Notably, that was a ret-con of my actual solution to part 1, which I originally did by just mapping the characters through .is_ascii_digit(), but I wanted to consolidate the code a little. I ended up with:

https://github.com/dave-andersen/advent2023/blob/main/src/ma...
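
For comparison, the `.is_ascii_digit()` approach to part 1 mentioned above could look something like this. It is only a sketch of the idea, not the linked code, and the function name is hypothetical.

```rust
/// Keep only the ASCII digit characters and return the first and last one.
/// A line with a single digit returns it twice; no digits gives None.
fn first_last_ascii_digit(line: &str) -> Option<(u32, u32)> {
    let mut digits = line
        .chars()
        .filter(|c| c.is_ascii_digit())
        .filter_map(|c| c.to_digit(10));
    let first = digits.next()?;
    // `last()` consumes the rest of the iterator; fall back to `first`
    // when there was only one digit.
    let last = digits.last().unwrap_or(first);
    Some((first, last))
}
```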