Hacker News | ChocoluvH's comments

Always wondering pros/cons of Chroma and Qdrant. Can someone tell me?


Chroma doesn't seem to be a real DB; it's rather a wrapper around tools like hnswlib, DuckDB or Clickhouse. Qdrant is way more mature: it has its own HNSW implementation with some tweaks to incorporate filtering directly during the vector search phase, supports horizontal and vertical scaling, and provides its own managed cloud offering.

In general, Qdrant is a real DB, not a library and that's a huge difference.
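
For what it's worth, here is a rough sketch of why filtering during the search phase matters. It is a brute-force toy, not Qdrant's HNSW implementation; the point layout (`id`, `vector`, `lang`) and both function names are made up for illustration. Filtering after the search can return fewer than k results, while filtering during the scan returns k matches whenever they exist.

```python
import heapq
import math


def post_filter_search(query, points, k, predicate):
    """Take the k nearest points first, then drop those that fail the filter."""
    nearest = heapq.nsmallest(k, points, key=lambda p: math.dist(query, p["vector"]))
    return [p["id"] for p in nearest if predicate(p)]


def filtered_search(query, points, k, predicate):
    """Apply the filter while scanning, then keep the k nearest matching points."""
    candidates = (p for p in points if predicate(p))
    nearest = heapq.nsmallest(k, candidates, key=lambda p: math.dist(query, p["vector"]))
    return [p["id"] for p in nearest]


# Hypothetical data: four points with a "lang" payload field.
points = [
    {"id": 1, "vector": [0.0, 0.1], "lang": "en"},
    {"id": 2, "vector": [0.0, 0.2], "lang": "de"},
    {"id": 3, "vector": [0.0, 0.3], "lang": "de"},
    {"id": 4, "vector": [0.0, 0.9], "lang": "en"},
]


def is_english(point):
    return point["lang"] == "en"


print(post_filter_search([0.0, 0.0], points, 2, is_english))  # [1]
print(filtered_search([0.0, 0.0], points, 2, is_english))     # [1, 4]
```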


What does Chroma lack? Their APIs seem pretty much the same to me.


I've tried both Chroma and Qdrant. I don't think Chroma lacks that much. It's definitely newer, but it's also a great product. I think cloud support is coming in Q3 2023.


Haha. In that case you might actually want to consider FAISS/Milvus instead of Redis.


We’ve looked into FAISS and Milvus. Milvus is possibly an excellent option for us in the future. What’s your experience with these so far?


Great to hear that you're considering Milvus. Feel free to reach out if you ever have any questions/comments/concerns.

Just took a look at your docs and product page as well. Keep up the great work!


I tried out Milvus. The developer experience is crap. The documentation leaves out some major core concepts. I experimented with it for hours. Eventually I turned my back and said: why not use pgvector and scale the fuck out of the cluster? That should bring roughly equal performance, as the pgvector implementation is written in C and the comparison algorithms wouldn't differ too much from Milvus's.


hnswlib? Best of the bunch imho


Cool web UI. Why is it not on the main Motif site? https://motif.land/


There's no such thing as an open internet.


There might have been 20 years ago, but in the meantime it's been killed by politicians and big corps.


It's been killed by big tech; for once, traditional big corps are kind of innocent.


Yes, there is: I2P.


Damn. Time to HODL?


Don't start with the clustered version of Milvus unless you have something like 100 million vectors.

Try Milvus standalone instead; it's much simpler. I also just found their Python version (https://github.com/milvus-io/embd-milvus), which is quite neat.


Open source software nowadays is very easy to use.

If your guy couldn't get a single piece of open source software straight, you had the wrong guy :(

I can only see a managed service being useful when I have 100x the traffic and when a strong SLA is required.


IMO vector databases should not mess with ElasticSearch.

The real focus should be on improving the recall of vector search. It's a pity that nobody is doing real AI research here. Money is being wasted on marketing and branding.
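
To make "recall" concrete: it is usually measured as recall@k, the share of the true k nearest neighbours that the approximate index actually returned. A minimal sketch, assuming Euclidean distance and a brute-force scan as ground truth; the function names are hypothetical and `approx_ids` stands in for whatever your vector index returns.

```python
import math


def exact_top_k(query, vectors, k):
    """Brute-force nearest neighbours by Euclidean distance (the ground truth)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k ids that the approximate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy example: the "index" missed one of the three true neighbours.
vectors = [[1.0, 1.0], [0.0, 0.1], [5.0, 5.0], [0.2, 0.2]]
truth = exact_top_k([0.0, 0.0], vectors, 3)
approx_ids = [1, 3, 2]  # hypothetical output of an approximate index
print(truth, recall_at_k(approx_ids, truth))
```

In practice you would average this over a few hundred held-out queries rather than a single one.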


Totally agree. The thing is that ElasticSearch does not meet our requirements for vector search.

I am currently running Milvus + ElasticSearch, and it works perfectly. The latest Milvus version is super fast and scalable (>50M vectors). I haven't tried Zilliz Cloud; I'd have to find out what the cost is.

I am old school. IMO ElasticSearch is only good for keyword search, and these so-called "vector database" products are only good for vector search.


If ES doesn't work for you, I recommend Vespa. https://github.com/vespa-engine/vespa

Others have made other suggestions, but Vespa has two unique features. First, it is battle-tested at large scale; second, it supports combining the keyword and vector scores in several ways. The latter is something that other hybrid systems don't do very well in my experience.
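
As one illustration of combining keyword and vector results, here is a sketch of reciprocal rank fusion (RRF), a common engine-independent technique that uses only the rank positions and ignores the raw scores. This is not a claim about how Vespa does it internally; the function name is made up, and the constant `k=60` is just the value commonly used for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc ids; each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["a", "b", "c"]  # hypothetical keyword (BM25) ranking
vector_hits = ["b", "c", "d"]   # hypothetical vector-search ranking
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists rise to the top, which is why "b" and "c" beat "a" even though "a" was the best keyword hit.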


Didn't even realise Milvus was so lacking. https://github.com/marqo-ai/marqo also has a hybrid approach. It's just a more complete/end-to-end platform than Pinecone, so it really just depends on what you're building.


I personally like Milvus very much.

My point is that I only trust products that focus on their own core business, especially when they come from small startups.


Could you please elaborate on how you utilize both of them together, and for which specific use case? I'm attempting to gain a better understanding of the hybrid approach.


Certainly!

The thing is to make ElasticSearch scores "comparable" to Milvus scores. There are lots of ways to do this, but there's no single good solution. For example, you could calculate BM25 scores offline, or use TF-IDF scores to do some kind of filtering. Again, there's no single perfect answer. You'd have to run a lot of experiments on your own use case and your own data to get the best results.
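
One simple way to make the two score scales comparable is sketched below: min-max normalize each result list to [0, 1], then take a weighted sum. This is only one of many options, not a recommendation. The function names and the weight `alpha` are hypothetical, and the sketch assumes both inputs are similarity scores where higher is better (if your vector search returns distances, negate them first).

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are identical, so no document should be favoured.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(keyword_scores, vector_scores, alpha=0.5):
    """Weighted sum of normalized scores; a doc missing from one list gets 0 there."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        doc_id: alpha * kw.get(doc_id, 0.0) + (1 - alpha) * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: BM25 is unbounded, the vector similarity is not.
bm25 = {"a": 12.0, "b": 6.0, "c": 3.0}
similarity = {"b": 0.9, "c": 0.8, "d": 0.5}
ranked_ids = [doc_id for doc_id, _ in fuse(bm25, similarity)]
print(ranked_ids)  # ['b', 'a', 'c', 'd']
```

The weight `alpha` is exactly the kind of knob that needs tuning on your own data; min-max scaling is also sensitive to outliers in either list.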

Also, a lot of tuning needs to be done in every phase: 1) query pre-processing, 2) query tokenization, 3) retrieval, 4) ranking and reranking.

I personally would not trust any universal "hybrid search" solution. They are all toy demos.

It usually takes 5-10 good engineers to build a decent search engine/system for any real use case. It also requires a lot of tuning, tricks, and hand-written rules to make things work.

