Regarding your first point: One possible implementation of tries uses an array with length "size of alphabet" to store the children of a node. So at each node it's just an offset into an array, not multiple character comparisons.
Of course your other points are valid. Thank you for your answer. So I gather the differences are rather subtle. I was wondering if I was missing something bigger.
Even assuming just a-z, that array will be mostly empty at lower levels, and most nodes are at lower levels. That’s not good for avoiding cache misses.
There are tricks to counteract that (you don’t store pointers to lower levels, for example, but offsets, because at typical sizes of a trie you can store them in way fewer bits), and I haven’t timed any implementation for decades, but it would surprise me if such an approach would improve performance on lower levels.
You could use it at top level (basically replacing the trie with separate tries for each starting letter)
Also, the ‘search for a character in an array’ code can be sped up by using vector instructions nowadays (can’t find a link for the exact code, but http://0x80.pl/articles/simd-strfind.html may give ideas). Whether want to do that because of power usage concerns is a completely different can of worms.
Of course your other points are valid. Thank you for your answer. So I gather the differences are rather subtle. I was wondering if I was missing something bigger.