I don't understand the emphasis here on vertical scaling. Moving a database to a bigger machine = more storage and faster querying. Not exactly rocket science. Horizontal scaling is the real challenge here, and the complexity of vector indexes makes it especially hard. Milvus and Vertex AI both offer horizontally scaled ANN search, as well as parallel indexing. I appreciate the post, but this doesn't seem worthy of an announcement.
Completely true. You have to understand the economics behind this to see why their claim is hyperbole at best and flat-out misleading at worst. The fundamental problem of scalable vector search is that you are dealing with potentially huge dimensionality and huge datasets, which means memory consumption will be enormous even for modest (by today's standards) datasets. This problem has garnered a lot of research attention, so making such a bold claim makes you wonder what Pinecone has under the hood that others don't.
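To put rough numbers on that memory point: even the raw float32 vectors, before any graph or inverted-list structure on top, dominate. A back-of-the-envelope sketch (the dataset sizes here are illustrative, not any particular product's):

```python
def index_memory_gb(num_vectors: int, dim: int, bytes_per_component: int = 4) -> float:
    """Raw storage for the vectors alone (float32 by default),
    ignoring any index overhead on top."""
    return num_vectors * dim * bytes_per_component / 1e9

# 100M vectors at 768 dimensions (a typical transformer embedding size):
print(round(index_memory_gb(100_000_000, 768), 1))  # ~307.2 GB before any index structure
```

At that scale the vectors alone no longer fit in one machine's RAM, which is exactly why the sharding and quantization research the parent mentions exists.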
Pinecone is VC-backed and has taken in funding to the tune of $50M. They have to claim to be the "first" to solve these challenging technical problems; otherwise they'd have to explain that their "secret sauce" is not really groundbreaking but relies on a series of open-source components under the hood. VCs wouldn't want to be backing yet another donkey in the derby. The truth is that solutions like FAISS, ScaNN, Weaviate, Qdrant, Annoy and co. are working on this problem at a much more fundamental level, while Pinecone and Google's Vertex AI Matching Engine are working on it at the application level. If Pinecone's solution were truly groundbreaking, they'd publish it in a more scientifically rigorous way. So these claims are to be taken with a grain of salt for what they are: developer evangelism/marketing speak.
I led a production project that uses FAISS at a bank. A huge amount of the work was about making the index practical in a real IT environment. For example, we had to build a sharding system, both to let it scale and to let the index be rebuilt with zero downtime. There were many other significant engineering steps required to get it deployed.
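One common pattern for that kind of zero-downtime rebuild is a blue/green swap: build the replacement index entirely off to the side, then atomically switch the reference that queries go through. A minimal generic sketch (this is not the actual system described above; the class and method names are illustrative, and `build_fn`/`search` stand in for whatever the underlying ANN library provides):

```python
import threading

class SwappableIndex:
    """Serve queries from a live index while a replacement is built offline.
    Generic blue/green sketch; the wrapped index object is assumed to expose
    a search(query, k) method (as e.g. FAISS-style wrappers typically do)."""

    def __init__(self, index):
        self._lock = threading.Lock()
        self._live = index

    def search(self, query, k):
        # Grab a reference under the lock; the query itself runs outside it,
        # so a concurrent swap never blocks in-flight reads for long.
        with self._lock:
            index = self._live
        return index.search(query, k)

    def rebuild(self, build_fn, vectors):
        # Build the new index entirely off to the side...
        fresh = build_fn(vectors)
        # ...then swap the reference atomically. The old index is reclaimed
        # once the last in-flight query drops its reference.
        with self._lock:
            self._live = fresh
```

Readers only ever pay a brief pointer-swap lock, and the expensive rebuild happens with zero query downtime.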
So I would say that if Pinecone has solved these problems, they don't need breakthrough fundamental performance versus the open-source systems. On top of that, as every dev knows, there is a bunch of hygiene components and features that production software wants: connectors, admin interfaces, utilities. $50M is probably a bit low to cover all of these plus the marketing, to be honest - but it will go a long, long way, and I'd guess there's Series B funding to get over the line if they don't sell out first.
On the other hand, they must avoid over-committing to today's indexing approaches, because if someone makes an algorithmic step forward and Pinecone doesn't or can't take advantage of it, then the deployment-enabling features they provide are merely a matter of engineering for others to replicate. Also, at the end of the day I think vector DBs are going to be an important niche in the enterprise, not at the scale of data warehouses, lakes, or application DBs. I think fitting them into the enterprise IT puzzle is going to be very important in making them commercially successful and good VC investments.
This sounds like a perfect job for jina.ai: sharding, redundancy, automatic up- and down-scaling, security features, and most importantly the flexibility to use and switch between whatever (vector) database you like.
https://jina.ai/
A bigger machine doesn't automatically mean higher performance. The code needs to scale with the increased number of cores, take a share-nothing (or share-very-little) approach to avoid contention, and use efficient data structures to make use of the increased memory.
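The share-nothing idea for search can be sketched as: each shard owns a disjoint slice of the vectors, computes its own local top-k, and only the tiny result lists cross shard boundaries. A toy brute-force version (the function names and (id, vector) data layout are illustrative, not any library's API):

```python
import heapq

def local_topk(shard, query, k):
    """Each worker scans only the (vid, vector) pairs its shard owns --
    no shared state, so cores never contend on the data."""
    scored = ((sum(x * y for x, y in zip(vec, query)), vid) for vid, vec in shard)
    return heapq.nlargest(k, scored)

def sharded_search(shards, query, k):
    """The coordinator merges the small per-shard top-k lists; the vectors
    themselves never move between shards."""
    partials = [local_topk(shard, query, k) for shard in shards]  # embarrassingly parallel
    return heapq.nlargest(k, (hit for partial in partials for hit in partial))
```

Because each `local_topk` call touches disjoint data, it can be pinned to a dedicated core (or node) with no locking; the only cross-shard traffic is k (score, id) pairs per shard.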
Larger disk space does help effortlessly scale storage in single-node systems, but I agree with you that a shared-nothing (and/or shared-something) design is a necessary step for extracting maximum performance from a larger machine. Shared-nothing matters in distributed architectures as well: decoupling storage from compute, and stateless from stateful components, helps minimize resource allocation for billion-scale vector storage, indexing, and search. Milvus 2.0 implements this type of architecture - here's a link to our VLDB 2022 paper, if you're interested: https://arxiv.org/abs/2206.13843
Just having "moar disk!" ≠ "scalability." Unlike CPUs - where you can run a single thread per core (or hyperthread) on a shard-per-core basis, align NUMA memory to those cores, etc. - there's no way to make your storage "shared-nothing."
At ScyllaDB we've put years of non-trivial effort into IO scheduling to optimize it for large amounts of storage. You also need to consider the type of workload. Because optimizing for reads, writes, or mixed workloads are all different beasties.
I agree in the typical case, but they support concurrent add/delete on at least one of their index options. Handling consistency/contention while modifying whatever graph/tree/etc. structure they use is probably nontrivial, and the resulting cache invalidations would also likely hurt QPS.
P.S. Great work on your site, by the way - it's a really inspiring project!
Seems to me you can do that in a way that ensures low contention between consumers by using a read-biased MRSW lock. Such a construction isn't free, but it really shouldn't eat into your read performance all that much: acquiring and releasing the lock adds hundreds of nanoseconds to your query time. Unless you're already serving millions of queries per second per thread, that's piss in the ocean.
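A minimal sketch of such a read-biased MRSW (multiple-reader, single-writer) lock, in Python for illustration only (a production index would implement this in native code, and the class name is made up):

```python
import threading

class ReadBiasedRWLock:
    """Any number of concurrent readers; a writer waits until no reader
    holds the lock. Read-biased: readers only wait out an *active* writer,
    never a queued one, so reads stay cheap under read-heavy load."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:       # only an active writer blocks readers
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

The trade-off of the read bias is that a writer can starve under a constant stream of readers, which is usually acceptable for an index that rebuilds or mutates rarely relative to query volume.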
> With vertical scaling, pod capacities can be doubled for a live index with zero downtime. Pods are now available in different sizes — x1, x2, x4, and x8 — so you can start with the exact capacity you need and easily scale your index.