Often the use case is search. Ex. You have a basic text search engine to find musicians on your site which does some string matching and basic tokenization and so on. But you want to be able to surface similar types of musicians on search too.
In that case you might store vectors representing a user based on some features youve selected, or a word embedding of their common genres/tags.
To actually search this thing, you need something to compare against. You could directly use the word embeddings of the search query. You could also do a search against your existing method, and then use the top results from that as a seed to search your vectors.
Since everything's a vector, you can also ask questions like "what musician is similar to Tom AND Sally" by looking for vectors near T+S. T-S could represent like Tom but not like Sally, etc.
So the answer to what do you store is, what will be your seed to search against?
Wow, that's interesting. So we can do vector arithmetic and the results make sense as a form of embedding/concept logic. Addition seems to work like "and" / set intersection. Subtraction works like set difference ("T\S", also literally written "T-S", i.e. "without"), which logically says "T, but not S", or in terms of predicate calculus, "T(x) & not S(x)".
Perhaps there is also some unitary vector operation which directly corresponds to negation (not Sally)? Perhaps multiplying the vector by -1? Or would (-1)S rather pick out "the opposite of Sally in conceptual space" instead of "not / anyone but, Sally"? And what about logical disjunction (union)? One could go further here, and ask whether there is an analog to logical quantifiers. Then there is of course the question whether there is anything in vector logic which would correspond to relations, binary predicates, like R(x, y), not just unitary ones, etc.
In that case you might store vectors representing a user based on some features youve selected, or a word embedding of their common genres/tags.
To actually search this thing, you need something to compare against. You could directly use the word embeddings of the search query. You could also do a search against your existing method, and then use the top results from that as a seed to search your vectors.
Since everything's a vector, you can also ask questions like "what musician is similar to Tom AND Sally" by looking for vectors near T+S. T-S could represent like Tom but not like Sally, etc.
So the answer to what do you store is, what will be your seed to search against?