Regarding your first point: One possible implementation of tries uses an array w...

Someone · on Jan 11, 2019

Even assuming just a-z, that array will be mostly empty at lower levels, and most nodes are at lower levels. That’s not good for avoiding cache misses.

There are tricks to counteract that (you don’t store pointers to lower levels, for example, but offsets, because at typical sizes of a trie you can store them in way fewer bits), and I haven’t timed any implementation for decades, but it would surprise me if such an approach would improve performance on lower levels.

You could use it at top level (basically replacing the trie with separate tries for each starting letter)

Also, the ‘search for a character in an array’ code can be sped up by using vector instructions nowadays (can’t find a link for the exact code, but http://0x80.pl/articles/simd-strfind.html may give ideas). Whether want to do that because of power usage concerns is a completely different can of worms.