Minor note: you only need a vector database if you have so many possible inputs that linear retrieval is too slow.

Arguably, for many use cases (e.g. searching through a document with ~200 passages), loading embeddings in memory and running a simple linear search would be fast enough.
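
As a minimal sketch of that linear search, assuming the passage embeddings are precomputed, unit-normalized, and held in a NumPy array (the names here are illustrative, not from any particular library):

    import numpy as np

    # doc_embeddings: (n_passages, dim) float32 array of precomputed,
    # L2-normalized passage embeddings; query: (dim,) normalized vector.
    def top_k(query, doc_embeddings, k=5):
        # With unit-normalized vectors, dot product equals cosine similarity.
        scores = doc_embeddings @ query
        best = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
        return best, scores[best]

For a couple hundred passages this is a single matrix-vector product and finishes in well under a millisecond.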



Yeah, what you mentioned might be true. Our current understanding of how LLMs really work behind the scenes is limited. For example, recent research[1] found that an LLM's accuracy is better when the context is placed at the beginning of the prompt rather than at the end. So it's mostly trial & error to figure out what works best for your use case. You can use FAISS or similar to keep the embeddings in memory instead of running a full-fledged vector DB (see the sketch below). But pgvector is a convenient extension if you already have a Postgres instance running.

[1]- https://towardsdatascience.com/in-context-learning-approache...
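
A minimal sketch of that FAISS in-memory approach, assuming the embeddings are already computed as a float32 NumPy array (the dimension and the random stand-in data are illustrative):

    import numpy as np
    import faiss  # pip install faiss-cpu

    dim = 768  # embedding dimension -- an assumption, match your model
    embeddings = np.random.rand(200, dim).astype("float32")  # stand-in data

    index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 search
    index.add(embeddings)

    query = np.random.rand(1, dim).astype("float32")
    distances, ids = index.search(query, 5)  # top-5 nearest passages

IndexFlatL2 does exhaustive search, so at this scale it behaves exactly like the in-memory linear scan described above, just with the bookkeeping handled for you.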


Zilliz just published an article comparing QPS (queries per second) for pgvector vs. Milvus. The results are clear: Milvus, a database designed from the ground up for handling vector indexes, outperformed pgvector in both throughput and latency. Dive into the details here: https://zilliz.com/blog/getting-started-pgvector-guide-devel...

Full disclosure, I just joined Zilliz this week as a Dev Advocate.


What I mentioned doesn't depend on how LLMs work; the end result is the same (retrieving useful inputs to pass to your LLM). I just meant that a lot of people can do this in memory or in ad-hoc ways if they're not too latency-constrained.


I think unless you actually need a vector DB, you definitely shouldn't use one.

A vector store can help reduce the time it takes to retrieve the most similar hits. I used FAISS as a local vector store quite a bit to retrieve vectors fast, though I had 1.5 million vectors to work through.
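
At that scale an exact flat index still works, but FAISS's approximate indexes cut query latency considerably. A rough sketch using an IVF index (nlist and nprobe here are illustrative starting points, not tuned values):

    import numpy as np
    import faiss

    dim = 768  # embedding dimension -- an assumption, match your model
    embeddings = np.random.rand(1_500_000, dim).astype("float32")  # ~4.6 GB

    nlist = 4096  # number of coarse clusters to partition the vectors into
    quantizer = faiss.IndexFlatL2(dim)  # used to assign vectors to clusters
    index = faiss.IndexIVFFlat(quantizer, dim, nlist)

    index.train(embeddings)  # k-means clustering over the data
    index.add(embeddings)

    index.nprobe = 16  # clusters scanned per query: higher = slower but more accurate
    query = np.random.rand(1, dim).astype("float32")
    distances, ids = index.search(query, 10)  # approximate top-10

The nprobe knob is the accuracy/latency trade-off: each query scans only a handful of clusters instead of all 1.5M vectors.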


Interesting. I thought anything >1 million vectors would need a vector DB to scale in production. What was your machine config for running FAISS? Also, did you plan for redundancy, or was it just FAISS-as-a-service on a single VM?


People seem to underestimate the scale you can get to on a single machine, and overestimate how easy it will be to go up from there.

An in-memory index is about as good as it gets for single-node performance, and fitting that many vectors into memory on a single machine is easy.
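
A quick back-of-envelope check (the 768-dim float32 embeddings are an assumption; scale to your own model's dimension):

    n_vectors = 1_500_000
    dim = 768            # assumption; a common sentence-transformer dimension
    bytes_per_value = 4  # float32

    total_gb = n_vectors * dim * bytes_per_value / 1e9
    print(f"{total_gb:.1f} GB")  # ~4.6 GB -- fits in RAM on a modest server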



