Minor note: you only need a vector database if you have so many possible inputs that linear retrieval is too slow.

Arguably, for many use cases (e.g. searching through a document with ~200 passages), loading embeddings in memory and running a simple linear search would be fast enough.
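
As a minimal sketch of that linear search, assuming the passage embeddings are precomputed, unit-normalized, and held in a NumPy array (the names here are illustrative, not from any particular library):

    import numpy as np

    # doc_embeddings: (n_passages, dim) float32 array of precomputed,
    # L2-normalized passage embeddings; query: (dim,) normalized vector.
    def top_k(query, doc_embeddings, k=5):
        # With unit-normalized vectors, dot product equals cosine similarity.
        scores = doc_embeddings @ query
        best = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
        return best, scores[best]

For a couple hundred passages this is a single matrix-vector product and finishes in well under a millisecond.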



Yeah, what you mentioned might be true. Our current understanding of how LLMs really work behind the scenes is limited. For example, recent research[1] found that an LLM's accuracy is better when the context is placed at the beginning of the prompt rather than at the end. So it's mostly trial & error to figure out what works best for your use case. You can use FAISS or similar to keep the embeddings in memory instead of running a full-fledged vector DB (see the sketch below). But pgvector is a convenient extension if you already have a Postgres instance running.

[1]- https://towardsdatascience.com/in-context-learning-approache...
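
A minimal sketch of that FAISS in-memory approach, assuming the embeddings are already computed as a float32 NumPy array (the dimension and the random stand-in data are illustrative):

    import numpy as np
    import faiss  # pip install faiss-cpu

    dim = 768  # embedding dimension -- an assumption, match your model
    embeddings = np.random.rand(200, dim).astype("float32")  # stand-in data

    index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 search
    index.add(embeddings)

    query = np.random.rand(1, dim).astype("float32")
    distances, ids = index.search(query, 5)  # top-5 nearest passages

IndexFlatL2 does exhaustive search, so at this scale it behaves exactly like the in-memory linear scan described above, just with the bookkeeping handled for you.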


Zilliz just published an article comparing QPS (queries per second) for pgvector vs. Milvus. The results are clear: Milvus, a database designed from the ground up for handling vector indexes, outperformed pgvector in both throughput and latency. Dive into the details here: https://zilliz.com/blog/getting-started-pgvector-guide-devel...

Full disclosure, I just joined Zilliz this week as a Dev Advocate.


What I mentioned doesn't depend on how LLMs work; the end result is the same (retrieving useful inputs to pass to your LLM). I just meant that a lot of people can do this in memory or in ad-hoc ways if they're not too latency-constrained.


I think unless you actually need a vector DB, you definitely shouldn't use one.

A vector store can help reduce the time it takes to retrieve the most similar hits. I used FAISS as a local vector store quite a bit to retrieve vectors fast, though I had 1.5 million vectors to work through.
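
At that scale an exact flat index still works, but FAISS's approximate indexes cut query latency considerably. A rough sketch using an IVF index (nlist and nprobe here are illustrative starting points, not tuned values):

    import numpy as np
    import faiss

    dim = 768  # embedding dimension -- an assumption, match your model
    embeddings = np.random.rand(1_500_000, dim).astype("float32")  # ~4.6 GB

    nlist = 4096  # number of coarse clusters to partition the vectors into
    quantizer = faiss.IndexFlatL2(dim)  # used to assign vectors to clusters
    index = faiss.IndexIVFFlat(quantizer, dim, nlist)

    index.train(embeddings)  # k-means clustering over the data
    index.add(embeddings)

    index.nprobe = 16  # clusters scanned per query: higher = slower but more accurate
    query = np.random.rand(1, dim).astype("float32")
    distances, ids = index.search(query, 10)  # approximate top-10

The nprobe knob is the accuracy/latency trade-off: each query scans only a handful of clusters instead of all 1.5M vectors.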


Interesting. I thought anything >1 million vectors would need a vector DB to scale in production. What was your machine config for running FAISS? Also, did you plan for redundancy, or was it just FAISS-as-a-service on a single VM?


People seem to underestimate the scale you can get to on a single machine, and overestimate how easy it will be to go up from there.

An in-memory index is about as good as it gets for single-node performance, and fitting that many vectors into memory on a single machine is easy.
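
A quick back-of-envelope check (the 768-dim float32 embeddings are an assumption; scale to your own model's dimension):

    n_vectors = 1_500_000
    dim = 768            # assumption; a common sentence-transformer dimension
    bytes_per_value = 4  # float32

    total_gb = n_vectors * dim * bytes_per_value / 1e9
    print(f"{total_gb:.1f} GB")  # ~4.6 GB -- fits in RAM on a modest server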



