
Thanks for the nice and clear summary!

At what point does it make sense to switch to vector search solutions (as opposed to keyword search)? Obviously Google et al. need it, but for regular apps, academic repositories and so on, is there a threshold at which we should start discussing a switch?

Or phrased another way, when do we know we should be looking at vector search solutions to enhance the search in our application?




When your users start complaining that your search sucks. :)

Either because your catalog has grown to the point where a traditional keyword search makes it harder to find relevant items, or because users are increasingly expecting their apps to just "know what they mean" (like Google, Spotify, Amazon, and Netflix do).


I run a super niche academic database. There is often that expectation that "search should know what I mean", yet there isn't enough data (in my opinion) to make any semantic search meaningful. There are about 23k data objects that can interlink; 50% of those belong to one specific data type, and the rest are split among the other types.

So we've stuck with simple keyword search with filters to drill into specific categories within results, all pretty vanilla on a relational database structure.

Just wondering if there is a "volume" heuristic to this - I'd like to explore this more but realistically sometimes the academic user-base has big dreams with severe practical limitations.


You can train a good sentence transformer on ~10K sentences using TSDAE (an unsupervised training approach), covered in Chapter 7 here: https://www.pinecone.io/learn/unsupervised-training-sentence...
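
For a rough idea of what TSDAE trains on: it corrupts each sentence by deleting tokens and has the model reconstruct the original from the sentence embedding alone. Here is a minimal stdlib-only sketch of just that noising step, assuming whitespace tokenization and a deletion ratio of 0.6 (the default as far as I recall, so treat it as an assumption). The names `delete_tokens` and `make_training_pairs` are made up for illustration and are not the sentence-transformers API.

```python
import random


def delete_tokens(sentence, del_ratio=0.6, rng=None):
    """Return a noisy copy of `sentence` with roughly `del_ratio` of its tokens removed."""
    rng = rng or random.Random()
    tokens = sentence.split()  # assumption: plain whitespace tokenization
    if not tokens:
        return sentence
    # Keep each token independently with probability (1 - del_ratio).
    kept = [tok for tok in tokens if rng.random() > del_ratio]
    if not kept:
        # Keep at least one token so the encoder never sees an empty input.
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def make_training_pairs(sentences, del_ratio=0.6, seed=0):
    """Build (noisy_input, original_target) pairs for a denoising objective."""
    rng = random.Random(seed)
    return [(delete_tokens(s, del_ratio, rng), s) for s in sentences]


if __name__ == "__main__":
    corpus = [
        "keyword search with filters on a relational database",
        "users expect search to know what they mean",
    ]
    for noisy, original in make_training_pairs(corpus):
        print(repr(noisy), "->", repr(original))
```

The point is that only raw, unlabeled sentences are needed, which is why ~10K sentences from your own records can be enough to adapt a model to a niche domain.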



