I wrote the trie implementation referenced in the benchmarks. I didn't do it for speed; I did it because I couldn't add the router functionality I needed using the existing underlying implementation.
Cool, nice work! I didn't actually look at the benchmark and was just making (perhaps poor) assumptions. I'd imagine it does come with a nice little performance boost though?