Qdrant, the Vector Search Database, raised $28M in a Series A round

francoismassot · 2024-01-23T13:19:20 1706015960

Congrats to Qdrant's team, $28M for a Series is really nice.

There are a lot of OSS vector search databases out there, we could probably list the main ones:

- Qdrant: https://github.com/qdrant/qdrant

- Weaviate: https://github.com/weaviate/weaviate

- Milvus: https://github.com/milvus-io/milvus

What else?

lmeyerov · 2024-01-23T14:31:04 1706020264

It's funny taking a numbers view. The most popularly used might not even be these, but vector indexes in existing popular OSS DBs and storage systems people are already using. Afaict earliest would be faiss on disk and vectors in opensearch & elasticsearch, and I'd be curious how say databricks, pgvector and other big ones are getting picked up now that they are out. Most of these supported fast & large-scale indexes even early on (ivfpq, ...) by wrapping faiss and friends. ~All OSS DBs we use now, esp managed, have or are getting vector indexes.

Another one most similar to qdrant we track internally is lancedb. They are clever by supporting an embedded architecture, so an architectural reason to prefer over most existing OSS DBs. In our survey 2 years ago, we predicted specialized vector DBs having regular OSS DBs be the elephant in the room, and missed embedded as a fundamentally different category: https://gradientflow.com/the-vector-database-index/ .

(Good luck to qdrant! I'm happy they waited before raising, hopefully this means they can operate more healthily than otherwise and easier to maintain the discipline to do that!)

jillesvangurp · 2024-01-23T15:31:52 1706023912

There are a few more. Pinecone comes to mind.

And then there are traditional databases and search products that are integrating vector search capabilities as well: Postgres, Elasticsearch, Opensearch, Solr.

They each have their limitations of course but the 28M round suggests a moat that I'm not seeing that clearly in terms of tech. What's so special about qdrant relative to their competition?

At least they are Apache licensed for now. So, that's nice. But that also means e.g. Apache Lucene could borrow some code from them to beef up their vector search capabilities. Which would benefit Elasticsearch, Opensearch, and Solr which all depend on Lucene.

Which raises the question what the point is of QDrant long term and why investors are betting on this as opposed to other things.

It seems to me that the main challenge with vector search is inference cost (at index and query time), not storing the vectors. A secondary concern is the vector comparisons at query time. A good way to cut down on that is to reduce the overall result set using traditional search or query mechanisms. In other words, you need

manishsharan · 2024-01-23T16:39:17 1706027957

I think there will be enough of market to justify a few more dedicated VectorDB vendors.

From the enterprise perspective, which of these vendors proved the best combination of security, availability, performance and pricing will matter. when we run benchmarks on our (self hosted) LLMs, we do not a clear idea of where we have bottlenecks and we end up assuming its the GPU/memory. And our pilot implementation will never go into production as the security model is nearly non existent in our implementations; the execs AND qa are getting the same RAG outputs. It is all very new to us and our teams. If a vendor can outperform its competition in our tests and show credible security model with segmentation of knowledge, that would be the choice.

jillesvangurp · 2024-01-24T09:29:53 1706088593

Depends, short term there is a lot of experimentation and innovation in this space obviously. But long term what matters (for investors) is defensible moats. Most or these products look a bit like one trick ponies to me with some features that are extremely likely to be borrowed by competitors, if they work.

Vector databases are not about hosting LLMs or AI models, they are about storing and comparing embeddings vectors. You generate those with an AI model. OpenAI provides a few of those but GPT 4 is not typically what you'd use for this.

Model training and inference is typically not what a vector database does. You need it but to populate a vector database with content. Qdrant is not an exception to this, it uses third party models and inference technology for this (all the usual suspects basically). I just looked at their documentation to confirm this (but do correct me if I'm wrong).

Additionally, it lists all the classic use cases for vector search as its use cases (image search, semantic search, recommendations, similarity search, etc.). I'm sure it's awesome. But in the end it stores and compares vectors using an open source (and possibly patented?) algorithm. Which means if their approach is particularly good and novel, it will get copied in no time by other open source vector search capable products (i.e. most serious databases and search engines at this point). If it hasn't been already. I don't see dedicated vector database products having any inherent advantage here. Rather the opposite since they lack a lot of features that you might also need.

timvisee · 2024-01-24T09:43:28 1706089408

With regards to competitors picking similar things up, we've this article explaining why a dedicated vector search service is a good idea: https://qdrant.tech/articles/dedicated-service/

internet101010 · 2024-01-23T17:24:53 1706030693

I couldn't agree more. I would add reproducibility to the list of important things, above everything else you mentioned. I looked into Vanna after seeing it on here because generating SQL code by only embedding the schema and business logic seems like a nice, quick middle ground that doesn't require embedding an entire database. However, 88% accuracy in generating the correct query isn't good enough for deployment at the organization-level. "Give me sales for the last quarter as of end of prior month" should return the same result for everyone, without exception.

lsaferite · 2024-01-23T15:56:03 1706025363

Is Pinecone OSS?

I ask because this was the statement from the PP

   There are a lot of OSS vector search databases out there, we could probably list the main ones
   ...
   What else?

utopcell · 2024-01-23T18:41:01 1706035261

Pinecone is not OSS.

alfalfasprout · 2024-01-23T19:08:50 1706036930

Sure, but frankly it's historically very hard to build a business around a specialized database, especially if you have competitors that are even 80% as good but free.

The cases where I've seen this work are when the DB offers something way ahead of what their competitors offer. For example, KDB+ was historically unrivaled when it came to ultra high performance time series storage and Aerospike is very hard to beat for extremely high performance multi-node K/V.

Otherwise there's little to stop a larger company from offering the OSS competitor to your DB as a service for a lower cost and invest eng resources to close the gap.

mritchie712 · 2024-01-23T13:56:48 1706018208

We use pgvector which if you're already using postgres should be in the running for your use case.

I also like https://github.com/lancedb/lancedb

epistasis · 2024-01-23T18:16:05 1706033765

It's fascinating to see the diversity of vector databases! I've chosen to prototype with two, ChromaDB, and LanceDB, based on the ease of using embeddings with them, and had not even heard of these others here. I'm also very excited to go through VectorHub's table of databases:

https://vdbs.superlinked.com

(discovered from sibling comment here: https://news.ycombinator.com/item?id=39103322)

rgbrgb · 2024-01-23T18:40:42 1706035242

I've been looking at this one to embed in desktop apps https://github.com/unum-cloud/usearch

bumberg · 2024-01-24T00:23:02 1706055782

Marqo.ai (https://github.com/marqo-ai/marqo) is doing some interesting stuff and is oss. We handle embedding generation as well as retrieval (full disclosure, I work for Marqo.ai)

shenli3514 · 2024-01-23T20:08:13 1706040493

Chroma looks good. https://github.com/chroma-core/chroma 10k+ stars, very easy to use, and can be used as an embedding database

tajd · 2024-01-23T13:57:59 1706018279

This website for comparing vector database solutions might be handy https://vdbs.superlinked.com/

daveed · 2024-01-23T18:18:37 1706033917

I think Activeloop(YC) is too: https://github.com/activeloopai/deeplake/

morgango · 2024-01-23T21:06:30 1706043990

Elasticsearch: https://www.elastic.co/platform

andre-z · 2024-01-23T13:28:28 1706016508

These are the major ones, correct.

lettergram · 2024-01-23T13:03:37 1706015017

Not to knock Qdrsnt, but generally the whole “vector search database” rush is insane.

I’ve been working with vectors for over a decade; particularly with embeddings used in AI. We’re talking projects from 100k to 100B+ records, used for AI applications

Postgres, particularly with pgvector and derivatives, can handle to millions of records very rapidly no problem. It’s very cheap, scales great, and is accurate.

I’m sure some of these open source solutions are improvements. That said, weigh vendor lock in, cost, risk and in the end it usually makes very little sense.

therealdrag0 · 2024-01-24T02:53:30 1706064810

What’s you use for 100B records?

lettergram · 2024-01-24T03:58:53 1706068733

Idk if I can talk about my exact project. But I worked alongside these folks:

https://www.capitalone.com/tech/machine-learning/learning-em...

Running an R&D group in the department.

tw1984 · 2024-01-23T13:22:34 1706016154

I don't think such business model is going to last. There is no reason for AI giants like OpenAI to stick with such external "vector databases". There is not much technical stuff there. Unless you want to argue that "vector searching" is just some labor work when compared to AI, in that case, sure.

manishsharan · 2024-01-23T13:26:36 1706016396

There are huge segments e.g. banking, insurance,legal, which are wary of using OpenAI and they would much rather host their own LLMs. I think these vector databases will find a ready market in this segment

tw1984 · 2024-01-23T14:13:12 1706019192

Tell me what makes you believe that those big techs are not going suit those "banking, insurance,legal" orgs by providing them their own LLMs? For example, ever heard about github enterprise? you pay a stupid amount, github setup everything almost identical to the public github, just on your servers for your employees. Why big techs won't do the same here?

Those high profit margin part of the LLM business is for big players only, they don't burn hundreds of billions to offer you opportunties to cut their profit by capitalizing on their core business.

Communism doesn't exist in high tech. People don't work their xxx off to pave ways for your free lunch for life.

manishsharan · 2024-01-23T14:54:04 1706021644

I wanted to respond to you but your hostile tone implies you are not looking for a conversation.

nemothekid · 2024-01-23T17:22:37 1706030557

Is the idea that big players will get tired of paying a license to a company like Qdrant and write their own database? I just don't really see why they would do that - if Qdrant is similar in complexity to any standard DB, it's like asking why doesn't Apple just rip out MySQL and write their own Apple DB.

I can see them replacing it if Qdrant isn't able to scale to their needs - thats why we ended up with Dynamo, Spanner, MyRocks. However its likely that its probably easier to just acquire the team - like Apple did with Foundation - and become project stewards than trying to invent a new datastore to save pennies.

spullara · 2024-01-23T18:18:40 1706033920

qdrant is trivial compared to a normal database.

mritchie712 · 2024-01-23T13:58:21 1706018301

OpenAI uses Qdrant for chatgpt and a few other products.

https://news.ycombinator.com/context?id=38611608

guappa · 2024-01-23T14:09:06 1706018946

And has enough money to fork it at any time.

tw1984 · 2024-01-23T14:16:41 1706019401

or just build from scratch with assistant from AI - then market it as built by AI for AI.

esafak · 2024-01-23T18:33:22 1706034802

They could just ask it to do the same to their own product and generate GPT-5

CamperBob2 · 2024-01-23T21:02:59 1706043779

There is not much technical stuff there.

I know that, and you know that, but tell it to a jury of East Texas hayseeds.

clbrmbr · 2024-01-23T11:30:48 1706009448

What’s the mote here? Seems to be a risky investment when it’s such a crowded space and likely to be decent open source alternatives for those with small budgets and homegrown solutions for companies with bigger budgets and requirements.

jjackson5324 · 2024-01-23T11:48:28 1706010508

It's a series A. Thinking about the moat in a hot market like vector DBs is a great way to miss out on unicorns.

tw1984 · 2024-01-23T13:26:00 1706016360

the OP argued that it is not a hot market - companies like openai is going to eventually use its own while small players are going to just use openai's assistant APIs, they don't have to operate their own "vector database".

it is also worth to mention that even if there is going to be a market called "vector databases", which is highly unlikely, you can't just written off all existing regular databases and pretend that they are not going to just walk in and take over.

all in all, there is no reason to believe it is a hot market. it is much better to ask is there going to be a market at all.

nostrebored · 2024-01-23T18:16:41 1706033801

For a comparison of existing assistants API vs. vector search, you can check out my blog at https://nostrebored.com

At a high level there are a few differences

- Control over embeddings. What gets embedded? What are the output vectors? What models do you use? How do you handle multimodal input?

- Performance. When you make a call to Assistants, you have to wait for the Assistant to understand that it needs to do RAG. This performance hit is actually quite large (look at the two videos on the blog for reference)

- Cost. OpenAI has an incentive to load the context window to consume more tokens. A few dozen calls to Assistants was costing me around $10.

tw1984 · 2024-01-24T12:35:38 1706099738

> For a comparison of existing assistants API vs. vector search

Sorry, but I am not going to read it as it is not an apple to apple comparison atm. OpenAI just released its assistants APIs literally just weeks ago, when so called vector databases have been burning money for ages. You can write a thesis on how those vendors are doing slightly better for now, that won't be the big picture showing the reality on the ground. All those minor issues & unreasonable restrictions can be solved & removed, I don't see any real challenge for openai to implement them. Give openai a few months, they will convince most vector database vendors & gamblers to pack up and leave the field.

mritchie712 · 2024-01-23T13:59:05 1706018345

OpenAI uses Qdrant

https://news.ycombinator.com/context?id=38611608

tw1984 · 2024-01-23T14:06:54 1706018814

for now.

let me repeat what I have already explained - when compared to today's leading AI tech, a "vector database" is just ancient tech. major players are going to build their inhouse solution or they'd conclude it to be some kind of labor intensive & low profit margin baggage and outsource it.

you can build a business around it, just like all major tech companies have cleaning guys work for them one way or another, people have to realize that it doesn't make carpet cleaning a high tech or strategically important business.

nostrebored · 2024-01-23T18:19:29 1706033969

Work into performant vector search is an active area of research. If it were such a commodity, there wouldn’t be such a wide variance in performance among existing solutions.

There are a ton of open questions. If you think about Elasticsearch as a similar domain, you have complexity at the ingest, storage, and horizontal scalability layer. If you think places are going to invest in their own distributed system that handles these components, I think you’d be as wrong as saying that people will invest in their own managed Lucene implementations.

lyjackal · 2024-01-23T14:05:18 1706018718

Qdrant is open source (with hosted offerings).

weinzierl · 2024-01-23T11:33:05 1706009585

A while ago I read in a thread here that they are used in OpenAI's products and at another popular company. I am not sure but vaguely remember X/Grok.

They are also a Rust shop.

Who says Germany has no cool startups.

EDIT: Yes, it was Grok.

sroecker · 2024-01-23T11:36:28 1706009788

Yes, used by Grok: https://twitter.com/qdrant_engine/status/1721097971830260030 Oh wow, completely missed that they're German. Should have noticed their "Impressum"..

treprinum · 2024-01-23T11:42:48 1706010168

Is Open AI using it in their assistants API for retrieval? Answer performance of those is really bad and retrieval is slow compared to Pinecone.

simonw · 2024-01-23T12:16:32 1706012192

Yes, it's used for the RAG implementation - though we only know that due to information leaked in an error message I believe: https://twitter.com/altryne/status/1721989500291989585

infecto · 2024-01-23T12:51:44 1706014304

Simon, I am always amazed how you are able to keep up with so much so quickly!!

saliagato · 2024-01-23T12:18:25 1706012305

Hard to believe OpenAI uses Quadrant when they are backed by Microsoft, thus having Azure Cognitive Search (now "AI" Search)

chintler · 2024-01-23T13:57:17 1706018237

Cognitive Search is nowhere as good as a 'pure' vector DB. Behind the scenes, it's a managed elasticsearch/opensearch with some vector search capabilities. The 'AI' implementations I've done with Cognitive Search always boil down to hybrid(vector+fts) text search.

lazydon · 2024-01-23T16:12:00 1706026320

In context of RAG, the goal is not to have a pure vector DB but to have all the relevant data that we can gather for a user's prompt. This is where Cognitive Search and other existing DBs shine because they offer a combination of search strategies. Hybrid search on Cognitive Search performs both full text and vector queries in parallel and merges results which I find a better approach. Further, MS is rebranding Cognitive Search as Azure AI Search to bring it more in line with the overall Azure AI stack including Azure OpenAI.

treprinum · 2024-01-23T14:10:50 1706019050

Cognitive Search already contains hybrid search (vector + BM25 + custom ML reranking) and they use chunks of 2048 tokens with a custom tokenizer. So it should be now better than most vector DBs. One could probably make something better by using some version of SPLADE instead of BM25 but their secret sauce lies in their custom ML model for reranking that gives them the largest search performance boost.

wodenokoto · 2024-01-23T12:50:51 1706014251

Do you have any experience in AI search to compare it to other products?

I’m genuinely curious to know if it’s any good.

mindvirus · 2024-01-23T11:32:51 1706009571

Congrats to them!

What have your experiences with vector databases been? I've been using https://weaviate.io/ which works great, but just for little tech demos, so I'm not really sure how to compare one versus another or even what to look for really.

danielbln · 2024-01-23T11:39:31 1706009971

We're using Postgres with the pg_vector extension for basically all of our projects. We know and love Postgres, it has a big track record, the extension is supported on all major managed cloud offerings, no new tooling needed, pg_vector supports HNSW indices for performance as well.

Once in a whole supabase slips into a project, but that's basically just Postgres with some bells and whistles on top.

I got nothing bad to say about Pinecone, We aviate, Chroma etc. but when it comes to dbs, I like to go with the devil I know.

andre-z · 2024-01-23T11:46:32 1706010392

You should use whatever works best for you unless you face some limitations. The issue is that vector databases are not databases but search engines. It is ACID vs BASE. A few thoughts on this https://qdrant.tech/articles/dedicated-service/

Fendyfd · 2024-01-23T11:44:41 1706010281

There are multiple vectordb available in the market, open source ones include Milvus, Qdrant, Weaviate etc. Cloud services include Zilliz Cloud (managed milvus), Qdrant Cloud, Weaviate Cloud etc. Try using a benchmark tool to evaluate them. Here is an open-source option for your reference: VectorDBBench (https://github.com/zilliztech/VectorDBBench)

tw1984 · 2024-01-23T13:36:15 1706016975

why would anyone use a "benchmark" tool from a vendor (zilliz here) to test the performance of its competitors?

Fendyfd · 2024-01-24T02:58:12 1706065092

Good question. 1. VectorDBBench is an open-source benchmark that is pretty vendor-neutral. If you've used it or checked its leaderboard, you'll find that almost every database is ranked first or in the top place on specific metrics.

2. The key to using a benchmark is not to find which vector database is the "best" but which one is most suitable for your use cases and applications. Actually, there is no "universally best" vector database for any situation. So I don't think anyone should be able to cheat on this.

tw1984 · 2024-01-25T08:24:51 1706171091

> VectorDBBench is an open-source benchmark that is pretty vendor-neutral

No this is not vendor neutral unless you can prove that zillz included all tests that would expose its weakness.

> If you've used it or checked its leaderboard

see, you called it "leaderbord", it is a good proof that it is not vendor neutral. By using such highly biased tool, its hosting url has both the name of your company and the term "leader", it implies that the company is a leader when compared to other "vector" databases. tell me how such dirty trick is called vendor neutral?

please, HN is a place full of professional developers, many of those including myself have been in the business long enough to tell what is cheap propaganda and what is real cool tech.

> Actually, there is no "universally best" vector database for any situation.

because whether there should be a product category called "vector database" is in doubt in the first place. as explain, it is a low tech stuff significantly easier to design & implement than today's regular databases (e.g. cockroachdb). its role will eventually be filled by other real databases.

Fendyfd · 2024-01-25T14:02:35 1706191355

It is open-sourced, and you can find its source code on GitHub. The leaderboard is just something easy for readers to have a quick look at. You're welcome to use this tool to generate your results.

I won't argue with you about whether "vector database" should be a product category. You are not the first or even the last person who has doubts about it. Maybe many years later, we'll have the answer.

Fendyfd · 2024-01-24T03:01:30 1706065290

FYI the VectorDBBench leaderboard: https://zilliz.com/vector-database-benchmark-tool?database=Z...

tw1984 · 2024-01-24T12:29:01 1706099341

could you please stop posting such advertisement? why would anyone read vector database benchmark results generated by a vector database vendor?

Fendyfd · 2024-01-25T13:45:45 1706190345

trying to answer your questions since you asked.

tw1984 · 2024-01-27T15:24:41 1706369081

no, you were trying to actively promote a particular vendor - for well known reasons. HN is not your free advertising platform, could you please just pack up you ads and leave people alone?

bootsmann · 2024-01-23T14:34:44 1706020484

I think one of the big advantages of qdrant is how easy it is to do a poc because it allows you to have an “in-memory” version of the database similar to sqlite. One of the big competitors, Milvus, comes with a fairly intricate docker-compose you have to spin up to try it.

avereveard · 2024-01-23T16:25:12 1706027112

Well sqlite now has a vector extension so it's super convenient for testing. Between that and fts5 sqlite can stand in for any advanced search service as far as poc are concerned.

J_Shelby_J · 2024-01-23T15:35:19 1706024119

Only for python!

bootsmann · 2024-01-24T09:31:55 1706088715

I mean you’re storing vector embeddings, the chances those come from some torch model are reasonably high.

swalsh · 2024-01-23T12:38:39 1706013519

I've been using Qdrant. Can't speak highly enough of the core functionality. It's fast, good accuracy, easy to use etc.

I think there are some things I wish were easier, for example finding and updating points, and the UI could be better.

tw1984 · 2024-01-23T14:25:56 1706019956

We have to be honest - "vector" database is a low tech stuff when compared to today's AI. You shouldn't be expecting to walk into the battle of AI, which is arguable the most important one in our life time, to dig a chunk of significant profit from major AI players' pocket by just having some low tech stuff. They use external "vector databases" for now because they don't want to invest R&D resources on such non-key issues for now.

for now is the keyword here.

When the company grow to 10k or 30k people, there will be teams competing for visibility, someone is going to build their inhouse "vector database" to get his/her slice of the pie. Do you still believe that any AI major player is going to reply on some external vector databases?

coffeebeqn · 2024-01-23T14:44:23 1706021063

Are in-house databases that common? I thought generally we’ve found as an industry that to be a great thing to purchase. I do wonder how many will need anything other than the vector support in their already existing Postgres instances though

tw1984 · 2024-01-23T16:04:39 1706025879

> I do wonder how many will need anything other than the vector support in their already existing Postgres instances though

exactly! if there is a real & strong demand, we'd be seeing open source ones get upgraded & ready in months. it is more like just one of those "I want to build something easy in the core but fancy in its name to get some quick VC $"

Prosammer · 2024-01-23T14:46:31 1706021191

My understanding is that these vector search databases are generally used by people who want to use an existing AI and extend it with RAG etc. Was anyone ever expecting major AI players to use a tool like this as you are suggesting?

tw1984 · 2024-01-23T15:55:53 1706025353

> use an existing AI and extend it with RAG etc

companies like openai will fill the gap by offering such features out of the box. there is no logical reason why and how a big AI tech giant is going to take all hard work and letting someone else to take the profit by ignore the last mile issue. in fact, openai has already released such APIs in their last devday event.

softwaredoug · 2024-01-23T23:40:17 1706053217

That implies OpenAI wanting to host all your data somewhere in their vector storage.

The real issue is actually Google Cloud or AWS having a vector search solution. Or companies having a Lucene-based search engine or existing DB for vector based search.

naveen99 · 2024-01-24T00:34:21 1706056461

Vector databases are complementary to “today’s AI” as they store and index embeddings which are a key output of large language models. As LLM’s get better at generating text and images, their embeddings also get better.

PheonixPharts · 2024-01-23T20:47:11 1706042831

Can you name me a single 10k-30k people company that has their own internally built relational database? Their own internally developed document database? I've never seen this in my career.

I don't even know any 10-30k people companies that build their own search, most I've known use elastic search or lucene.

It's had to imagine that a few of these vectordb companies don't establish themselves as the standard solution, being the equivalent to MongoDB in their space. The other competent players will very likely get acquired.

Certainly these vectordb companies are in a better long term standing that the bajillion companies rushing to build products that are just calling an API endpoint at the end of the day.

tw1984 · 2024-01-24T11:57:12 1706097432

please read my reply again, I was talking about AI companies (e.g. openai) are going to build their own vector database when there have a large enough team.

for your question, google built spanner (that was like 12 years ago) and leveldb when it had about 30k people. Meta/facebook forked leveldb and started building rocksdb when they had less than 30k people.

no one is saying any banking, insurance, retail companies with 30k employees should be building their own databases/search engines. They shouldn't. That is actually my main argument - those companies shouldn't get too involved for infra like that, they should just take the offerings of the big tech companies.

hartator · 2024-01-23T15:54:33 1706025273

> For example, it can automatically map ‘frontend engineer’ to ‘web developer’

Small revolution indeed.

Ref: https://qdrant.tech/use-cases/

avereveard · 2024-01-23T16:17:29 1706026649

I really don't understand that sample the similarity capability is provided by the external embeddings model not by quadrant per se, unless they have some proprietary embeddings.

minimaxir · 2024-01-23T17:08:47 1706029727

Correct, but that one-pager is more aimed toward project managers than engineers. Marketing copy is weird like that.

anonzzzies · 2024-01-23T11:49:15 1706010555

The sourcecode is very readable of this product. And good license, no agpl or worse stuff.

treprinum · 2024-01-23T11:58:39 1706011119

Why is AGPL bad? It would prevent Amazon from taking it from the founders and making money off it without giving anything back, like they did to dozens other products.

JoachimS · 2024-01-23T12:27:36 1706012856

It also makes it less interesting for other potential customers to use it. Reducing the potential market is probably not what a VC funding a startup wants.

menaerus · 2024-01-23T16:40:33 1706028033

IMHO it looks arcane to understand or to debug, and with most likely a lot of negative performance implications, due to its shared-ptr-in-disguise all-over-the-place design.

    $ git grep "Arc<" | wc -l
    451

It could be probably related to the fact that the main author of the codebase is coming from the Java/Scala world. Or perhaps it's the Rust safety guarantees.

nemothekid · 2024-01-23T17:25:05 1706030705

Qdrant is a async Rust project, so there will be lots of Arc. Rust safety guarantees doesn't really let share references across threads haphazardly.

menaerus · 2024-01-23T19:43:23 1706039003

Something being async doesn't imply shared-ptr design. But perhaps this is what Rust makes you to to achieve its safety guarantees?

nemothekid · 2024-01-23T20:05:27 1706040327

Yes; in many cases, if there are two tasks that hold some piece of data, the compiler cannot, at compile time, when to free that data, as it doesn't know which of the two tasks will finish first.

That means the lifetime of that object must be tracked at runtime, with a garbage collector. This is where you get Arc (or shared-ptr).

I'm not sure how you would safely solve this in C++, but FWIW, scylladb is a high performance database that also makes use of shared_ptr.

menaerus · 2024-01-23T23:06:10 1706051170

Multi-threaded and/or async code can perfectly exist without shared-ptrs. And no, you don't need a shared-ptr to manage the lifetime of an object shared by two threads.

nemothekid · 2024-01-24T15:47:10 1706111230

That might be true of C++, but I don't think its true in Rust. Furthermore, I don't think the performance concern is as dire as you believe - both Scylla and Redpanda, two high performance C++ databases both use shared_ptr with their async framework Seastar.

menaerus · 2024-01-24T15:56:43 1706111803

> That might be true of C++, but I don't think its true in Rust

And which was exactly my point.

> both Scylla and Redpanda, two high performance C++ databases both use shared_ptr

Such argument doesn't make sense since you will find usage of shared-ptrs in probably more or less every high performance C++ codebase out there. The secrete sauce is to know _where_ and _when_ to use it. I'm not against it but I'm definitely not in favor of language forcing me to user it when I don't need it.

anonzzzies · 2024-01-23T11:47:26 1706010446

Offtopic: Is there a good OSS mixed (vectors + traditional) that can be embedded in our own solution and allows storing indexes in a pluggable kv storage? Besides rolling one, I cannot really find anything. Rust or Go would be best.

softwaredoug · 2024-01-23T15:02:06 1706022126

Someone has to ask the question: How many vector DBs do we really need? How do the vector DB companies differentiate themselves? And why do we need a company at all when there are increasingly awesome open source options?

I genuinely ask - there are a lot of other problems in the RAG, fine tuning, AI/LLM, retireval space, to solve. And more and more vector retrieval is, while not 100% solved, at least is something the community has a grasp on the tradeoffs. Solved to the point that squeezing a bit more recall out of vector retrieval isn't the problem anymore.

sanp · 2024-01-23T16:12:50 1706026370

Agree but then the same argument applies to RDBMSs and multiple vendors seem to be doing OK in that space. I think it ultimately comes down to "stuff" (sales journey, price, support etc.) other than the technology itself. I am sure any RDBMS can meet most of the requirements of any given customers (in most cases) but we still see customers buying across vendors.

inertiatic · 2024-01-23T15:50:18 1706025018

>Solved to the point that squeezing a bit more recall out of vector retrieval isn't the problem anymore.

I think this is a bit of a strawman. I don't think recall is the main point these systems are trying to sell us on, it's more about robustness and ease of use compared to building something inhouse or using a lower level library to build a system on top of it just for this small part of your overall project/product (be it RAG, search, whatever).

I guess Lucene-based solutions, while very mature overall in terms of engineering, lagged behind this functionality (out of caution, trying to build what's going to be long term useful) and are also perceived a bit too cumbersome. So these stores do make sense, I think. The core functionality is nothing too complex (at least HNSW), but hiding it behind a stable black box with just a few inputs and levers, has value for people that are likely to use these stores.

esafak · 2024-01-23T18:37:56 1706035076

qdrant is open source. Being open source is not in opposition to running a company; it is part of their strategy.

There is still work to be done in vector databases. None of the products have perfected hybrid search yet, for example, and performance varies a lot between products; they are not fungible.

shanghaikid · 2024-01-23T11:38:46 1706009926

Congratulations.

What do you think you milvus? https://milvus.io/. The difference seems significant from the architecture perspective.

infecto · 2024-01-23T12:26:00 1706012760

I am excited to see how the vector search space plays out. Most of my work is not constrained by a low latency chat type user experience and I have not touched most of the vector search apis. I wonder what the difference is between competitors. The way I picture it is everyone is starting up their own Elasticsearch hosted solution and while there are some differences in functionality, the real bet is cost and scale.

ankit219 · 2024-01-23T12:54:56 1706014496

I think alpha lies in how good the embedding space is rather than which db you use to store and retrieve. A typical tradeoff between accuracy and performance, and here accuracy will be more important in many cases esp for businesses and enterprises. With that, and existing database providers introducing their own support for vectors, this space might be commoditized in near term.

Re embeddings, you would likely get better results if you train your own embeddings model. A popular approach is ColBERT, which anecdotally outperforms vector search in border cases[1]. Second is training an embedding model using initial layers of an LLM. [2]. In Colbert's case once it's trained, you dont need a db to store the vectors.

[1]: https://twitter.com/arjunkmrm/status/1744741903646773674 [2]: https://huggingface.co/intfloat/e5-mistral-7b-instruct

infecto · 2024-01-23T13:58:16 1706018296

I agree with you. I was ignoring the accuracy/performance tradeoff. Even in that space while there is certainly a lot of innovation left, there is already so much that is available commercially open source. If that holds true, you are really left with competing on price and scale in the long run.

redwood · 2024-01-23T11:29:27 1706009367

Anyone using Qdrant in prod?

inertiatic · 2024-01-23T15:06:56 1706022416

I do, and it's very very rough around the edges to be honest. Lots of things broken, things are even breaking between releases suddenly in unexpected places. Or at least, I'm used to working with more robust data stores. If my work was more high stakes, I'd have already advocated for moving our vector search to something more robust. Thankfully it's not and I can just maintain what we're making with not too much stress, and enjoy seeing this OS project grow from a user perspective (haven't seen a data store go through this very initial phase in my career yet).

Support from the team is great however, and congrats to them for this round!

esafak · 2024-01-23T18:42:05 1706035325

Please elaborate. What would you have moved to, for example? This is valuable information.

crucio · 2024-01-23T13:01:51 1706014911

We are for a few projects. We've been using them for over a year and have been impressed. We have 10s millions of items in there with lots of daily inserts/deletions etc. There's been a couple of gotchas but generally it is quite predictable and scalable.

We use 768 dimensional vectors for our items with several other payload filters (e.g. language). Performance has been good and I think the qdrant team focus on the right features without creeping into other areas.

swalsh · 2024-01-23T12:41:36 1706013696

I built a little proof of concept that uses it in the RAG pipeline, it's been proving quite useful, so we're just starting the move to production.

It's probably going to stay, but I'm also evaluating databricks new vector store as we're using databricks for all the analytics parts of the app already, and having them all on the same infrastructureis appealing.

andre-z · 2024-01-23T11:30:19 1706009419

Many: https://testimonial.to/qdrant/all

https://techcrunch.com/2024/01/23/qdrant-open-source-vector-...

redwood · 2024-01-23T12:02:01 1706011321

I should have been clearer in my question: It would be great to hear directly from people who are using them about their successes and what their experience has been like

andre-z · 2024-01-23T12:12:33 1706011953

Does this count? https://testimonial.to/qdrant/all

yujian · 2024-01-23T15:01:22 1706022082

Good on them, I know the crustaceans are out here happy about this raise for a Rust based Vector DB!

(now I'm gonna plug what I work on)

If you're interested in a more scalable vector database written in Go, check out Milvus (https://github.com/milvus-io/milvus)

andre-z · 2024-01-23T15:29:31 1706023771

The open-source benchmarks show different results. Feel free to make a PR to improve. ;) https://qdrant.tech/benchmarks/

rvz · 2024-01-23T11:59:39 1706011179

Well deserved funding round for a company that underpins most of the AI hype happening all over the place and probably always overlooked by many analysts.

Let’s see what they can do in a year or more with that new capital.

braza · 2024-01-23T18:23:32 1706034212

Outside AI and LLMs, there are some solid use cases for those Vector Search Databases? Maybe I am not seeing something, but it’s hard to see it gaining traction outside tech companies.

esafak · 2024-01-23T18:31:23 1706034683

Vector databases enable semantic and similarity search. What company does not need that?

beernet · 2024-01-23T19:41:10 1706038870

Companies that don't want/can build it by themselves, so the majority of enterprises. It's a nice series A by the numbers, at the same time, generating relevant revenue will very likely not happen (given the valuation at this round was probably around 200M€). It's hype all over but can't blame them, would do the same I guess.

ancorevard · 2024-01-23T17:30:33 1706031033

Honest question, how long before EU makes it unattainable for Qdrant to remain in Germany/EU?

wahnfrieden · 2024-01-23T17:32:29 1706031149

What’s the best vector db for text similarity that can run in browser front ends too?

yding · 2024-01-23T18:55:12 1706036112

Congrats! Amazing milestone.

spullara · 2024-01-23T18:15:49 1706033749

Honestly there is no reason, except huge scale, to have a separate vector db. Every normal database and search engine now support vector search.

_mh56 · 2024-01-23T11:46:25 1706010385

I applied to Qdrant a while back and got this response:

"We are getting many applications for this position. Usually, a test task would help preselect suitable candidates. However, since we develop open-source software, we rely on contribution.

You can build an open-source Qdrant connector to another framework or library. The simplest one would be, for example, a Streamlit data connector. But other ideas are more than welcome!

No limitations and no deadline. As long as this job position is online, we accept submissions. After you are done, send us an email to career@qdrant.com with the link to the repo. We will review it and get back to you asap."

No interviews, conversation before this email. Hope they see and fix this.

Edit : No Pay.

infecto · 2024-01-23T12:51:02 1706014262

This sits somewhere in the middle for me. On one hand I probably would not do this exercise for most companies but I probably would for a company I was excited about. It kind of makes sense if they are getting high volume of applications, you definitely will miss great candidates doing this but does it not also servce as a self-selection for the type of candidate they want?

Curious if the co-founder who was posting in here will share his take.

_mh56 · 2024-01-23T13:49:31 1706017771

To give more context from an employee's point of view [just my opinion and might be completely wrong]

For a job switch, I need to spend time in three different stages:

---------------

Preparation:

Leetcode (Blind 75) : 150 hours

System Design + DBMS + OS + Networking : 100 hours

Behavioural Questions (preparing STAR format answers): 10-20 hours

------------

Application:

Avg time for sending 500 applications: 20 hours (Assuming 1 application every 2.5 minutes)

-------------

Interviews:

Let's say I got 25 callbacks and 10 of them asked for takehome.

Person to person interviews time: 25 * 3 = 75 hours

Takehomes: 10 * 6 hours = 60 hours.

-------------

All in all, I'm already spending 415 hours of unpaid work to get x% of salary increment. Not including the side projects or hackathons we may need.

So having a takehome exercise asking to make an active contribution to the company is....bad. Sure i can reject it but not everyone will. which is what led us into the multiple rounds of algo interviews hellhole.

I apologize if what I'm saying is harsh. All I want is for leadership to see us as humans with families and not monkeys jumping through hoops.

infecto · 2024-01-23T13:56:43 1706018203

Not harsh and I don't disagree. On the flip side though if they have a large volume of applicants and they are a sub 50 company right now, it probably does not matter what kind of hoops they make people jump through, they will most likely identify candidate that match their fit. What I am saying is that nobody is wrong in this situation.

randomdata · 2024-01-23T14:03:46 1706018626

But you must remember that they don't want you. They already have more applicants than they can handle.

The trouble is that the leadership does see you as human, which results in them trying to say "Go away! You are not welcome here." as politely as possible.

srackey · 2024-01-23T14:21:11 1706019671

415 hours??

Well this is the problem with tech hiring. Not convinced an IQ test or other “unstudyable” exam wouldn’t be better.

Hell maybe nepotism was the way to go all along. Do other industries require 10 weeks of full time practice to get a gig?

7thaccount · 2024-01-23T14:29:04 1706020144

My industry is related to the power grid. I've worked at two amazing companies. The first had me come out for a 3 hour interview for a summer internship where they accessed my work ethic and culture fit. Once I graduated, I was immediately given an offer letter. The second job required most of a day to interview and I had to prepare a PPT and then got an offer. I also interviewed for another gig that did like four one-hour interviews spread out across a month. What software developers do sounds like absolute hell. My industry has very high demand and very low supply of experienced candidates at the moment though.

randomdata · 2024-01-23T17:24:09 1706030649

It is not a problem with tech hiring in general, just hiring where there are millions of people lined up down the street vying for the same position. To be sure, the job will still most likely go to a friend or relative, but if you are willing to jump through insane hoops you might also be considered. But it is to be taken as a hint that says: "Unless you are extra super sure that you are so special that we can't turn you down, don't waste your time, or ours."

Most other jobs, including Mom & Pop Tech Co., are happy if anyone applies at all and will take what they can get.

esafak · 2024-01-23T18:49:55 1706035795

They want candidates who care about their product, not people who merely rank companies by compensation, subject a constraint on time spent preparing. If you don't particularly care what you are working on, you would be better off at a big company.

codetrotter · 2024-01-23T13:57:45 1706018265

If it’s a job that wisely does not emphasize leetcode, you can skip those 150 hours of that.

ativzzz · 2024-01-23T14:02:33 1706018553

yea but a lot of the super high paying ones do, so it's a small price to pay for a large paycheck

jonathankoren · 2024-01-23T16:06:05 1706025965

I have worked at multiple FAANGs and even small startups. I have never once seen anyone care about leetcode, GitHub punchcards, stack overflow score, or any of the social media stuff people boast about here. Literally none of this fits in to any evaluation rubric.

All it says is your hobby is programming.

therealdrag0 · 2024-01-24T02:58:10 1706065090

Huh? It’s not about some profile scores it’s about practice. Leetcode type problems are asked at most (all) FAANG like companies.

azinman2 · 2024-01-23T15:01:45 1706022105

I’m in a privileged position and have never done leetcode.

pclmulqdq · 2024-01-23T15:10:07 1706022607

On homework problems for jobs, I have a strict policy of "no more than 4 hours of free work, and I retain full copyrights to that work." A lot of people are picking up on the first clause of that policy, and companies seem to be adapting, but the second clause still isn't common yet.

epistasis · 2024-01-23T15:16:26 1706022986

When the company is asking for open source contributions to an open source code base, as it is in this particular case, that second clause is clearly a deal breaker.

lysecret · 2024-01-23T13:43:15 1706017395

Yes I agree. I think you could easily add if the pr is merged we reward it with some money etc and it would be perfect.

redwood · 2024-01-23T11:59:40 1706011180

That's a pretty smart way for them to seed the ecosystem with open source connectors! Are you implying that that's what they were really trying to do here? Or do you think it was a genuine filter technique?

nijave · 2024-01-23T13:00:32 1706014832

You could always release and license under BUSL or AGPL or some other business non friendly and relicense if you got the job.

LudwigNagasena · 2024-01-23T15:00:59 1706022059

You can simply publish it without any license, in which case you reserve all rights.

sgc · 2024-01-23T15:47:12 1706024832

Nobody is going to hire you if you can't follow basic instructions.

simonw · 2024-01-23T12:14:24 1706012064

If they're paying you for your time, this is kind of smart.

If they're not then it's a scandal.

cj · 2024-01-23T13:58:02 1706018282

They can easily make it not a scandal by changing things so applicants contribute to an open source project that they don't directly benefit from as a business. Easy solution

treprinum · 2024-01-23T14:59:08 1706021948

Let's just make qdrant to pinecone/weaviate/redis/etc. data exporters, that would make the company super happy! Free labor benefiting their competitors.

guappa · 2024-01-23T13:58:51 1706018331

Why pay when you can get free labour?

_mh56 · 2024-01-23T12:17:32 1706012252

Edited :|

anonylizard · 2024-01-23T14:54:33 1706021673

Most of the prestigious and elite indie game firms (Those that pay very well, fully remote, have a hugely successful product that can be sold for decades) basically only hire modders into their team.

Like, you had to have actively developed mods for them, for free, for years, and be famous in the community, then they'll hire you (If you want).

This works because the working conditions there are far far better than your average game company. And probably much more fun than say a bank.

mpawelski · 2024-01-23T15:11:16 1706022676

What concrete companies you are talking about?

sireat · 2024-01-23T18:03:41 1706033021

For some reason this does not seem as exploitative as most of the interview circuit elsewhere.

Over the years I've read/heard plenty of stories (here on HN and elsewhere) of people getting hired for their open source contributions to some stack that some company is using/developing.

So here I am willing to give some slack here to Qdrant. They get extremely qualified candidates who can jump right in, and candidates get told the rules of the game up front. It feels fine?

Surely much better than fake take home tests, whiteboard tests, leetcode onslaught, and 7 layers of interviews.

So if creating a high quality repo is 60-90% of your job interview that seems pretty good. As long as they are not ghosting high quality contributions that is.

I will change my view if they get 20 high quality connectors out of this and noone gets hired from that pool of candidates.

shizcakes · 2024-01-23T12:58:57 1706014737

I don’t see a single such submission in their pull requests, open or closed, in the past couple weeks.

simonw · 2024-01-23T15:36:58 1706024218

Read that comment more closely: they didn't ask for PRs to one of their own repos, they asked for a brand new repo to be created, and a link to that repo to be emailed to them.

wdroz · 2024-01-24T09:14:16 1706087656

> No Pay

From this job posting [0]

> Compensated Interview Tasks: We value your time and effort; candidates will be compensated for completing interview tasks.

[0] -- https://join.com/companies/qdrant/10275180-core-rust-enginee...

osigurdson · 2024-01-23T14:41:39 1706020899

This is actually kind of brilliant.

Oras · 2024-01-23T11:51:31 1706010691

Do you expect a tech interview by just applying?

From your perspective, you're filling out an application, maybe writing a cover letter, but on the other side, there are 100+ applications like yours. Not all of them are qualified, CVs are not a trustable source anyway.

That's why companies add tests to filter first, then interview later.

_mh56 · 2024-01-23T11:54:42 1706010882

I don't expect interviews. But I also don't want to spend 20 hours working before getting a "Unfortunately we've decided not to move forward" message.

As a thumb rule, I'm happy to put 4x more effort than the company. If they interview me for 1 hour, I spend 4 hours doing the take-home. Anything more feels like exploitation.

epolanski · 2024-01-23T12:51:08 1706014268

Well, they said they receive a lot of applications so they are in the position of setting the rules.

You are absolutely right into setting your own rules as well, those haven't overlapped.

pclmulqdq · 2024-01-23T13:03:35 1706015015

As a general rule of thumb, random series A startups are in much lower demand for top-tier talent than top-tier talent is in demand for these companies. That would mean that the good engineers should set the rules of engagement, and that any startup that thinks they set the rules is attracting worse talent.

Galanwe · 2024-01-23T13:42:04 1706017324

Well, unless Qdrant writes a post complaining about the quality of their applicants, I don't see where the issue is.

Also, not all companies try to maximize for "top tier developer", it seems they are maximizing for "top motivated developer", which does not seem stupid either.

guappa · 2024-01-23T14:00:37 1706018437

Maybe they maximize for "top gullible young inexperienced person that will fall for that"?

pclmulqdq · 2024-01-23T15:11:20 1706022680

It sounds like they are instead maximizing for "free integrations," which seems to be a fine way to get neither free integrations nor high-quality candidates.

andre-z · 2024-01-23T13:45:29 1706017529

Thanks. We will reveal our hiring strategy in more details soon.

epolanski · 2024-01-23T14:16:11 1706019371

The general rule of thumb is that companies are free to decide what they look for and how to find it and so are candidates.

This is an open source database company, obviously the best candidates are contributors to the project first.

Picky ivy league graduate that farms leetcode and system design question is very very low in the ranking of what such companies should look for.

pclmulqdq · 2024-01-23T14:48:08 1706021288

All the ivy league graduate leetcode farmers I know are actually the ones who would do the grunt work of developing database integrations for free if they believed a decent salary at a prestigious job might be waiting over the hill.

The people I have met with the lowest tolerance for this stuff are the ones who actually produce the most impactful work. Partly because they don't do work that has no impact on their lives.

Edit: Obviously, they can do whatever they want, but that doesn't mean that it's a good sign from outside.

treprinum · 2024-01-23T15:17:14 1706023034

"We receive a lot of applications" can be also a marketing speech and it could also mean they are flooded by spam requests from all over the world they can't filter out.

otabdeveloper4 · 2024-01-23T13:00:31 1706014831

Then this job is not for you, simple as. Sounds like a win-win.

0xedd · 2024-01-23T12:06:11 1706011571

Don't be ridiculous.

HR gets paid to talk to candidates. I don't get paid to apply. The initial screening call is what allows a company to gauge the relevancy of a candidate. Let him speak about some of the topics and see how in-depth they go. Either the HR is familiar enough with the tech to understand proficiency (think a student listening to a maths professor) or they let a TL have a short conversation. I've overheard unqualified HR do their jobs badly, too; They laughed at picking them by looks and "feels". But, that's out of scope here.

A large company has millions to invest in different areas. Intrinsically, it has a much larger margin of error. You accidentally overprovisioned some resources and cost the company 10k? Tis but a scratch. You POC some personal project and accidentally get billed 10k? That is not the same.

A company can spend money on hiring. It is expected to. A private person can't spend money on applying to jobs. It isn't expected.

It's interesting to see how the shift goes from the self to the company [and to the country]. A little bit of communist propaganda goes a long way, eh, comrade NPC?

guappa · 2024-01-23T14:03:10 1706018590

> NPC

When someone calls others "npc", I understand that they are complete psychopaths that are somehow capable of thinking that the other people don't live the full human experience as they do.

unglaublich · 2024-01-23T12:19:20 1706012360

Companies not paying for their applicant's time is communist now?

klebe · 2024-01-23T14:52:51 1706021571

[flagged]

pclmulqdq · 2024-01-23T15:12:40 1706022760

The "unpaid labor" companies never seem to attract good people. Also, using interviews as a way to get free labor is illegal.