ChocoluvH's comments

ChocoluvH · on May 22, 2023

I've always wondered about the pros and cons of Chroma vs. Qdrant. Can someone tell me?

kacperlukawski · on May 22, 2023

Chroma doesn't seem to be a real DB; it's more of a wrapper around tools like hnswlib, DuckDB, or ClickHouse. Qdrant is much more mature: it has its own HNSW implementation with some tweaks to apply filtering directly during the vector search phase, it supports horizontal and vertical scaling, and it provides its own managed cloud offering.
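To illustrate why filtering during the search phase matters, here is a minimal brute-force sketch. It is not HNSW and not Qdrant's actual implementation; the points, payloads, and the `lang` filter are hypothetical. If you take the top k first and filter afterwards, you can end up with fewer than k hits (even zero), whereas filtering while searching lets only matching points compete for the k slots.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def post_filter_search(points, query, k, predicate):
    # Search first, then drop results that fail the filter: may return fewer than k hits.
    top = sorted(points, key=lambda p: cosine(p["vector"], query), reverse=True)[:k]
    return [p["id"] for p in top if predicate(p["payload"])]

def filtered_search(points, query, k, predicate):
    # Apply the filter while searching, so only matching points are ranked.
    candidates = [p for p in points if predicate(p["payload"])]
    top = sorted(candidates, key=lambda p: cosine(p["vector"], query), reverse=True)[:k]
    return [p["id"] for p in top]

# Hypothetical data: two English and two German documents.
POINTS = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"lang": "en"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"lang": "en"}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"lang": "de"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"lang": "de"}},
]
QUERY = [1.0, 0.0]

def is_german(payload: dict) -> bool:
    return payload["lang"] == "de"

print(post_filter_search(POINTS, QUERY, 2, is_german))  # -> []
print(filtered_search(POINTS, QUERY, 2, is_german))     # -> [4, 3]
```

A real engine can't afford a full scan like `filtered_search`, which is exactly why integrating the filter into the index traversal itself is the hard part.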

In general, Qdrant is a real DB, not a library, and that's a huge difference.

ChocoluvH · on May 22, 2023

What does Chroma lack? Their APIs seem pretty much the same to me.

jeadie · on May 22, 2023

I've tried both Chroma and Qdrant. I don't think Chroma lacks that much. It's definitely newer, but it's also a great product. I think cloud support is coming in Q3 2023.

ChocoluvH · on March 28, 2023

Haha. In that case you might actually want to consider FAISS/Milvus instead of Redis.

jxodwyer1 · on March 28, 2023

We’ve looked into FAISS and Milvus. Milvus is possibly an excellent option for us in the future. What’s your experience with these so far?

fzliu · on March 28, 2023

Great to hear that you're considering Milvus. Feel free to reach out if you ever have any questions/comments/concerns.

Just took a look at your docs and product page as well. Keep up the great work!

AmazingTurtle · on March 29, 2023

I tried out Milvus. The developer experience is crap, and the documentation is missing some major core concepts. I experimented with it for hours. Eventually I turned my back on it and said: why not use pgvector and scale the fuck out of the cluster? That should bring equal performance, since pgvector is written in C and its comparison algorithms wouldn't differ too much from Milvus's.

leobg · on March 28, 2023

hnswlib? Best of the bunch imho

ChocoluvH · on March 28, 2023

Cool web UI. Why isn't it on the main Motif site? https://motif.land/

ChocoluvH · on March 28, 2023

There's no such thing as the open internet.

cabalamat · on March 28, 2023

There might have been 20 years ago, but in the meantime it's been killed by politicians and big corps.

hef19898 · on March 28, 2023

It's been killed by big tech; for once, traditional big corps are kind of innocent.

fsflover · on March 28, 2023

Yes, there is: I2P.

ChocoluvH · on March 28, 2023

Damn. Time to HODL?

ChocoluvH · on March 28, 2023

Don't start with the clustered version of Milvus unless you have something like 100 million vectors.

Try Milvus standalone instead; it's much simpler. I also just found their Python version (https://github.com/milvus-io/embd-milvus), which is quite neat.

ChocoluvH · on March 27, 2023

Open source software is very easy to use nowadays.

If your guy couldn't get a single piece of open source software working, you had the wrong guy :(

I can only see a managed service being useful when you have 100x the traffic and need a strong SLA.

ChocoluvH · on March 27, 2023

IMO vector databases should not mess with Elasticsearch.

The real focus should be on improving the recall of vector search. It's a pity that nobody is doing real AI research here; the money is being wasted on marketing and branding.
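For anyone who wants to actually measure this: recall@k is usually computed by comparing an approximate index's top-k against exact brute-force neighbours. A minimal sketch follows; the vectors, queries, and the "approximate" results are hypothetical stand-ins for whatever ANN index you are evaluating.

```python
import math

def exact_top_k(vectors: list[list[float]], query: list[float], k: int) -> list[int]:
    # Brute-force ground truth: indices of the k nearest vectors by L2 distance.
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]

def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    # Fraction of the true top-k neighbours that the approximate search returned.
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k

def mean_recall(approx_results, vectors, queries, k: int) -> float:
    # Average recall@k over a batch of queries.
    scores = [recall_at_k(approx, exact_top_k(vectors, q, k), k)
              for approx, q in zip(approx_results, queries)]
    return sum(scores) / len(scores)

VECTORS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
QUERIES = [[0.1, 0.0], [0.1, 0.0]]
APPROX = [[0, 1], [0, 2]]  # hypothetical output of an ANN index

print(mean_recall(APPROX, VECTORS, QUERIES, 2))  # -> 0.75
```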

ChocoluvH · on March 26, 2023

Totally agree. The thing is that Elasticsearch does not meet our requirements for vector search.

I am currently running Milvus + Elasticsearch, and it works perfectly. The latest Milvus version is super fast and scalable (>50M vectors). I haven't tried Zilliz Cloud yet; I have to find out what it costs.

I am old school. IMO Elasticsearch is only good for keyword search, and these so-called "vector database" products are only good for vector search.

billythemaniam · on March 26, 2023

If ES doesn't work for you, I recommend Vespa. https://github.com/vespa-engine/vespa

Others have made other suggestions, but Vespa has two unique features. First, it is battle-tested at large scale; second, it supports combining keyword and vector scores in several ways. In my experience, the latter is something other hybrid systems don't do very well.
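As one example of combining keyword and vector results, here is a sketch of reciprocal rank fusion (RRF), a generic technique that merges ranked lists by rank position instead of raw score. This is not Vespa's API or a claim about how Vespa does it; the document ids are hypothetical, and `k=60` is the constant commonly used for RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each list contributes 1 / (k + rank) for every document it contains (rank starts at 1).
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["a", "b", "c"]  # e.g. from a BM25 query
vector_hits = ["c", "a", "d"]   # e.g. from an ANN query
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # -> ['a', 'c', 'b', 'd']
```

The appeal of RRF is that it sidesteps the question of whether a BM25 score and a cosine similarity are on comparable scales; the trade-off is that it throws away how confident each system was.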

jeadie · on March 26, 2023

I didn't even realise Milvus was so lacking. Marqo (https://github.com/marqo-ai/marqo) also has a hybrid approach. It's just a more complete, end-to-end platform than Pinecone, so it really depends on what you're building.

ChocoluvH · on March 27, 2023

I personally like Milvus very much.

My point is that I only trust products that focus on their own core business, especially from small startups.

hcentelles · on March 26, 2023

Could you please elaborate on how you utilize both of them together, and for which specific use case? I'm attempting to gain a better understanding of the hybrid approach.

ChocoluvH · on March 27, 2023

Certainly!

The key is to make Elasticsearch scores "comparable" to Milvus scores. There are lots of ways to do this, but no single good solution. For example, you could calculate BM25 scores offline, or use TF-IDF scores to do some kind of filtering. Again, there's no single perfect answer. You'd have to do a lot of experimentation with your own use case and your own data to get the best results.
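To make "comparable" concrete, here is one simple approach among many: compute BM25 scores (Lucene-style IDF, with the common defaults `k1=1.2`, `b=0.75`), min-max normalise both the keyword and the vector scores to [0, 1], then take a weighted sum. The toy corpus, the vector scores, and the `alpha` weight are all hypothetical; in practice the BM25 side would come from Elasticsearch and the vector side from Milvus, and `alpha` is exactly the kind of thing you'd tune on your own data.

```python
import math
from collections import Counter

def bm25_scores(docs: list[list[str]], query: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    # Okapi BM25 with the Lucene-style IDF: ln((N - n + 0.5) / (n + 0.5) + 1).
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def min_max(scores: list[float]) -> list[float]:
    # Rescale to [0, 1]; if all scores are equal there is no signal, so return zeros.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(keyword: list[float], vector: list[float], alpha: float = 0.5) -> list[float]:
    # alpha weights the vector side; both inputs are normalised first.
    kw, vec = min_max(keyword), min_max(vector)
    return [alpha * v + (1 - alpha) * k for k, v in zip(kw, vec)]

DOCS = [["vector", "search", "engine"], ["keyword", "search"], ["cooking", "recipes"]]
KEYWORD = bm25_scores(DOCS, ["vector", "search"])
VECTOR = [0.2, 0.9, 0.1]  # hypothetical cosine similarities from a vector index
print(hybrid_scores(KEYWORD, VECTOR))
```

Min-max normalisation is sensitive to outliers and to how many candidates you retrieve, which is one reason there's no single answer here.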

A lot of tuning also needs to be done in every phase: 1) query pre-processing, 2) query tokenization, 3) retrieval, and 4) ranking and reranking.
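One way to keep those four phases independently tunable is to make each one a swappable function. The sketch below uses deliberately toy stages (lowercasing, whitespace tokenization, token-overlap retrieval, overlap-count reranking) over a hypothetical three-document corpus; in a real system `retrieve` would call Elasticsearch and/or Milvus and `rerank` might be a learned model.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class SearchPipeline:
    preprocess: Callable[[str], str]
    tokenize: Callable[[str], list[str]]
    retrieve: Callable[[list[str]], list[str]]
    rerank: Callable[[list[str], list[str]], list[str]]

    def search(self, query: str) -> list[str]:
        tokens = self.tokenize(self.preprocess(query))
        candidates = self.retrieve(tokens)
        return self.rerank(tokens, candidates)

# Hypothetical corpus for illustration only.
DOCS = {
    "d1": "milvus standalone setup",
    "d2": "elasticsearch bm25 tuning",
    "d3": "milvus cluster tuning guide",
}

def preprocess(query: str) -> str:
    # Lowercase and replace punctuation with spaces.
    return re.sub(r"[^\w\s]", " ", query.lower())

def retrieve(tokens: list[str]) -> list[str]:
    # Candidate generation: any document sharing at least one token with the query.
    wanted = set(tokens)
    return [doc_id for doc_id, text in DOCS.items() if wanted & set(text.split())]

def rerank(tokens: list[str], candidates: list[str]) -> list[str]:
    # Rank by number of overlapping tokens, ties broken by doc id.
    wanted = set(tokens)
    return sorted(candidates, key=lambda d: (-len(wanted & set(DOCS[d].split())), d))

PIPELINE = SearchPipeline(preprocess, str.split, retrieve, rerank)
print(PIPELINE.search("Milvus tuning!"))  # -> ['d3', 'd1', 'd2']
```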

I personally would not trust any universal "hybrid search" solution. They're all toy demos.

It usually takes 5-10 good engineers to build a decent search engine/system for any real use case. It also requires a lot of tuning, tricks, and hand-written rules to make things work.