Hacker News | VoVAllen's comments


Yes (very exciting!), but you won’t be able to index them, and that’s really where they shine, IMO.

Still, I’m sure they’ll get there. Maybe they’ll also eventually get invisible columns, though tbf that’s less of a problem for Postgres than it is for MySQL, given the latter’s limited data types.


You can index arbitrary expressions, though, including indexing the same expression used to define the invisible column, right?

Hi, I'm the tech lead of VectorChord-bm25. It's not based on pg_search (pg_bm25). We just chose the same name during our internal development, and changed it to the formal name VectorChord-bm25 when we released it.
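For readers unfamiliar with BM25, here is a minimal sketch of the classic ranking function (the textbook Robertson/Lucene-style form, not necessarily what VectorChord-bm25 implements internally):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against the query with the classic BM25 formula."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # Lucene-style IDF: always positive, smoothed with +1
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # term-frequency saturation with length normalization
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(s)
    return scores

docs = [["postgres", "full", "text", "search"],
        ["vector", "search", "in", "postgres"],
        ["cooking", "recipes"]]
print(bm25_scores(["postgres", "search"], docs))
```

The third document matches neither query term, so its score is zero; the first two get positive scores weighted by term rarity and document length.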


Hi, I'm the author of the article. Please check out our vector search extension in Postgres, VectorChord [1]. It's based on RaBitQ (a new quantization method) + IVF. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 100ms-level latency when using SSD with limited memory.
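To give a rough feel for the "binary quantization + IVF" combination, here is a toy sketch of my own (sign-based 1-bit codes over residuals plus list probing and exact reranking; the real RaBitQ algorithm adds a random rotation and theoretical error bounds, and this is not VectorChord's code):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 64)).astype(np.float32)

# --- IVF: assign every vector to its nearest "centroid" (sampled here
#     for brevity instead of running k-means) ---
n_lists = 16
centroids = data[rng.choice(len(data), n_lists, replace=False)]
assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

# --- 1-bit quantization of residuals: one sign bit per dimension ---
codes = (data - centroids[assign]) > 0

def search(query, top=10, n_probe=4):
    # probe only the closest IVF lists
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
    cand = np.where(np.isin(assign, order))[0]
    # cheap pass: Hamming distance between 1-bit codes
    qcodes = (query - centroids[assign[cand]]) > 0
    ham = (codes[cand] != qcodes).sum(-1)
    shortlist = cand[np.argsort(ham)[: top * 4]]
    # exact rerank of the shortlist with full-precision distances
    exact = ((data[shortlist] - query) ** 2).sum(-1)
    return shortlist[np.argsort(exact)[:top]]

print(search(data[0]))  # the query vector itself should rank first
```

The cheap Hamming pass shrinks the candidate set 64x in memory terms before the exact rerank touches full-precision vectors, which is what makes the limited-memory SSD numbers plausible.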

[1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...




Hi, I'm the author of the article. Please check out our vector search extension in postgres, VectorChord [1]. It achieves 10ms-level latency for top-10 searches on a 100M dataset and 60ms-level latency when using SSD with limited memory.

[1] https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1...


Looks like very nice tech, but unfortunately the operational overhead of Postgres is too high for my use case. Will keep it in mind for future Postgres-backed projects, though.


Hi, I'm the author of the article. In our actual product, VectorChord, we adopted a new quantization algorithm called RaBitQ. The accuracy has not been compromised by the quantization process. We’ve provided recall-QPS comparison curves against HNSW, which you can find in our blog: https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1....

Many users choose PostgreSQL because they want to query their data across multiple dimensions, including leveraging time indexes, inverted indexes, geographic indexes, and more, while also being able to reuse their existing operational experience. From my perspective, vector search in PostgreSQL has no disadvantages compared to specialized vector databases so far.
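For context on the recall-QPS curves mentioned above: recall@k is conventionally measured against brute-force ground truth. A minimal sketch of that measurement (my own illustration, not the benchmark harness used in the blog):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 32))
queries = rng.normal(size=(20, 32))

def brute_force_topk(q, k=10):
    """Exact top-k neighbor ids by full scan -- the ground truth."""
    return set(np.argsort(((data - q) ** 2).sum(-1))[:k])

def recall_at_k(approx_results, k=10):
    """Fraction of true top-k neighbors that the index returned."""
    hits = sum(len(brute_force_topk(q, k) & set(res))
               for q, res in zip(queries, approx_results))
    return hits / (k * len(queries))

# feeding in the exact results gives recall 1.0 by construction
exact = [list(brute_force_topk(q)) for q in queries]
print(recall_at_k(exact))
```

Note that when the index returns exactly k results and the ground truth is also the true top-k, precision@k and recall@k are the same number, which is why benchmark plots sometimes label the axis either way.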


But why are you benchmarking against pgvector HNSW, which is known to struggle with recall and performance at large numbers of vectors?

Why is the graph measuring precision and not recall?

The feature dump is entirely a subset of Vespa features.

This is just an odd benchmark. I can tell you in the wild, for revenue attached use cases, I saw _zero_ companies choose pg for embedding retrieval.


Hi, I'm the author of the article. The sequential access pattern of IVF makes prefetching and large block sequential reads much easier, whereas it's almost impossible for HNSW to achieve efficient prefetching.
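To make the access-pattern difference concrete, here is a toy illustration of my own (not VectorChord's code): an IVF probe touches a few large contiguous ranges that the OS and SSD can prefetch, while an HNSW walk hops to data-dependent neighbors, producing many small unpredictable reads.

```python
import random

random.seed(0)
n = 1_000_000  # vectors on disk, laid out by IVF list

# IVF: probing 4 lists of 25k contiguous vectors -> 4 sequential ranges
ivf_reads = [(start, start + 25_000)
             for start in (0, 250_000, 500_000, 750_000)]

# HNSW: ~200 greedy hops, each to an arbitrary node id -> scattered reads
hnsw_reads = sorted(random.randrange(n) for _ in range(200))
gaps = [b - a for a, b in zip(hnsw_reads, hnsw_reads[1:])]

print(len(ivf_reads), "sequential ranges vs", len(hnsw_reads), "random reads")
print("median gap between HNSW reads:", sorted(gaps)[len(gaps) // 2])
```

The gaps between consecutive HNSW reads are only known after the previous node is fetched, so there is nothing for a prefetcher to work with, whereas each IVF range can be streamed ahead of the scan.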




Yes, I get that, but does the large block sequential read pattern matter with SSDs, or do the benefits only accrue with spinning disk drives?


Hi, I'm the author of the article. I agree with your point. The model from https://www.mixedbread.ai/blog/mxbai-embed-xsmall-v1 also looks great, though I haven’t had the chance to try it yet.


Hi, I'm the author of the article. In our product, VectorChord, we use a quantization algorithm called RaBitQ, which doesn’t require a separate codebook. Unlike IVFPQ, it avoids the need to maintain and update the corresponding codebook, so the update issue you mentioned is not a problem. Regarding filtering, I’m not sure which specific scenario you’re referring to, but we currently support iterative post-filtering and are technically capable of perfectly supporting pre-filtering as well.
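The "iterative post-filtering" idea can be sketched roughly like this (my own toy, not VectorChord's implementation): keep asking the index for a larger candidate pool until enough results survive the filter.

```python
def iterative_post_filter(index_search, predicate, k, batch=20, max_fetch=1000):
    """index_search(n) returns the n nearest candidate ids in order;
    predicate(id) is the attribute filter to apply after the fact."""
    n = batch
    while n <= max_fetch:
        survivors = [c for c in index_search(n) if predicate(c)]
        if len(survivors) >= k:
            return survivors[:k]
        n *= 2  # filter was too selective; widen the candidate pool
    return survivors[:k]

# toy index: the "nearest" candidates are just ascending integers
hits = iterative_post_filter(lambda n: list(range(n)),
                             lambda x: x % 7 == 0, k=5)
print(hits)  # -> [0, 7, 14, 21, 28]
```

The downside is visible in the sketch: a very selective filter forces repeated, ever-larger index scans, which is why filter-during-search approaches exist.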


Pre- and post-filtering are both not great. Some HNSW implementations, in products like Vespa and Qdrant, support filter-during-search.

This remains an unsolved problem in cluster-based indices.

