thanks for that link, there's some interesting stuff in there.
I also tried using a trie, but as the linked post states, they are only good for starts-with or ends-with searches. They can't be used for contains searches, i.e. "java
This is a cool explanation of the technique, but maybe the author can share why they built their own tag database vs. using elasticsearch or postgres, both of which natively support trigram indexes (and in ES's case arbitrary ngrams).
It was "just for fun". I read about the Stack Overflow tag engine and I was intrigued what it would take to build it. Plus they make a dataset available, so I had ~8,000,000 questions to try it out on.
It's been a good chance to learn some optimisation techniques and new data structures, such as trigrams and bit map indexes (coming up in the next post)
Also AFAIK SO still use their own custom solution for tag searches. They've moved to Elasticsearch for regular searching, but not tag searches. So their must be some benefit to their custom solution.