Thanks for the nice and clear summary! At what point does it make sense to switc...

gk1 · on Dec 15, 2021

When your users start complaining that your search sucks. :)

Either because your catalog has grown to the point where a traditional keyword search makes it harder to find relevant items, or because users are increasingly expecting their apps to just "know what they mean" (like Google, Spotify, Amazon, and Netflix do).

forgingahead · on Dec 15, 2021

I run a super niche academic database. There is often an expectation that "search should know what I mean", yet in my opinion there isn't enough data to make semantic search meaningful. There are about 23k data objects that can interlink; 50% of them belong to one specific data type, and the rest are split among the other types.

So we've stuck with simple keyword search with filters to drill into specific categories within results, all pretty vanilla on a relational database structure.

Just wondering whether there is a "volume" heuristic for this. I'd like to explore it more, but realistically, the academic user base sometimes has big dreams and severe practical limitations.

jamesbriggs · on Dec 15, 2021

You can train a good sentence transformer on ~10K sentences using TSDAE (an unsupervised training approach), covered in Chapter 7 here: https://www.pinecone.io/learn/unsupervised-training-sentence...