Hacker News new | past | comments | ask | show | jobs | submit login
Fast wildcard searches with trigrams (mattwarren.org)
20 points by matthewwarren on Aug 20, 2015 | hide | past | favorite | 8 comments



I'm fairly sure this is the same technique employed by pg_tgrm. There's a new version (1.2) floating around the mailing list that's BLAZZZING fast!



thanks for that link, there's some interesting stuff in there.

I also tried using a trie, but as the linked post states, they are only good for starts-with or ends-with searches. They can't be used for contains searches, i.e. "java


I think this is the same technique as Russ Cox used to search Google Code: https://swtch.com/~rsc/regexp/regexp4.html


This is mentioned in the article. (and links to rsc)


This is a cool explanation of the technique, but maybe the author can share why they built their own tag database vs. using elasticsearch or postgres, both of which natively support trigram indexes (and in ES's case arbitrary ngrams).


It was "just for fun". I read about the Stack Overflow tag engine and I was intrigued what it would take to build it. Plus they make a dataset available, so I had ~8,000,000 questions to try it out on.

It's been a good chance to learn some optimisation techniques and new data structures, such as trigrams and bit map indexes (coming up in the next post)

Also AFAIK SO still use their own custom solution for tag searches. They've moved to Elasticsearch for regular searching, but not tag searches. So their must be some benefit to their custom solution.


he's explaining how tag database indexing and searching works - "use postgres" is not very illustrative.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: