* Which distance functions does this support? Looks like it supports binary vectors already -- is Hamming distance supported?
* How does the performance compare with sqlite-vss? I'm curious about the profiling numbers -- both in terms of query speed, as well as memory usage.
Overall, this looks absolutely fantastic, and I love the direction you're heading with all of this.
> Though initially, sqlite-vec will only support exhaustive full-scan vector search. There will be no "approximate nearest neighbors" (ANN) options. But I hope to add IVF + HNSW in the future!
I think this is 1000% the correct approach -- kudos for not over-complicating things initially! I've shipped on-device vector search (128-bit binary vectors, Hamming distance) and even with a database size of 200k+ entries, it was still fast enough to do full brute-force distance search on every camera frame -- even running on crappy phones it was fast enough to get 10+ fps, and nicer phones were buttery-smooth. It's amazing how frequently brute-force is good enough.
That said, for implementing ANN algorithms like HNSW and whatnot, my first thought is that it would be slick if these could be accomplished with a table index paradigm -- so that switching from brute-force to ANN would be as simple as creating an index on your table. Experimenting with different ANN algorithms and parameters would be accomplished by adjusting the index creation parameters, and that would let developers smoothly evaluate and iterate between the various options. Maybe that's where your mind is going with it already, but I figured I would mention it just in case.
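Something along these lines, as purely hypothetical syntax (SQLite doesn't let extensions define custom index types like this today, so this is just to illustrate the idea):

-- plain table: brute-force full scans by default
CREATE VIRTUAL TABLE vec_items USING vec0(title_embeddings float[768]);

-- hypothetical: opt into ANN (and tune it) by creating or recreating an index
CREATE INDEX idx_title_hnsw ON vec_items(title_embeddings) USING hnsw(m = 32);
DROP INDEX idx_title_hnsw;  -- back to exact brute-force search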
re distance functions: currently L2 + cosine for float/int8 vectors, and Hamming for bit vectors. There are explicit vec_distance_l2()/vec_distance_cosine()/vec_distance_hamming() SQL functions, and the vec0 table will implicitly call the configured distance function on KNN queries.
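Rough sketch of what usage looks like (illustrative only; constructors like vec_bit() for bit vectors and the exact KNN query shape may differ a bit in the released API):

-- explicit distance functions, with small float vectors given as JSON text
SELECT vec_distance_l2('[1, 2, 3, 4]', '[5, 6, 7, 8]');
SELECT vec_distance_cosine('[1, 2, 3, 4]', '[5, 6, 7, 8]');
SELECT vec_distance_hamming(vec_bit(X'F0'), vec_bit(X'0F'));

-- KNN through a vec0 table: MATCH runs a full scan with the configured distance
CREATE VIRTUAL TABLE vec_demo USING vec0(embedding float[4]);
INSERT INTO vec_demo(rowid, embedding) VALUES (1, '[1, 2, 3, 4]'), (2, '[2, 3, 4, 5]');
SELECT rowid, distance
FROM vec_demo
WHERE embedding MATCH '[1, 2, 3, 3]'
ORDER BY distance
LIMIT 5;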
re comparison to sqlite-vss: In general, since sqlite-vss uses Faiss, it's much faster at KNN queries, and probably faster at full scans. Faiss stores everything in memory and uses multithreading for >10k vectors, so that's hard to beat. sqlite-vec, on the other hand, doesn't use as much memory (vectors are read chunk-by-chunk), but it's still relatively fast. There are SQLite settings like page_size/mmap_size that can make things go faster as well.
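For example (these are standard SQLite pragmas, nothing sqlite-vec-specific, and the values are just starting points to tune):

PRAGMA mmap_size = 268435456;   -- memory-map up to 256MB of the database file for reads
PRAGMA page_size = 8192;        -- larger pages can mean fewer page reads per vector chunk
VACUUM;                         -- page_size only takes effect on an existing, non-empty db after a VACUUM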
For writes (ie INSERT/UPDATE/DELETE), sqlite-vec is much, much faster. sqlite-vss requires a full index re-write on every write, even for a single vector. sqlite-vec, on the other hand, only writes to the affected vectors, so it's MUCH more performant than sqlite-vss in that workflow specifically. I view it as: sqlite-vss is more OLAP-focused (many fast reads, slow writes), and sqlite-vec is more OLTP-focused (fast-enough reads and fast writes).
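Concretely, writes are just normal SQL against the virtual table (illustrative, reusing the vec_demo table from above):

-- only the storage for the touched rows gets rewritten, no full index rebuild
INSERT INTO vec_demo(rowid, embedding) VALUES (3, '[7, 7, 7, 7]');
UPDATE vec_demo SET embedding = '[8, 8, 8, 8]' WHERE rowid = 3;
DELETE FROM vec_demo WHERE rowid = 3;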
I agree, brute-force Hamming distance is surprisingly fast, especially in resource-constrained environments! Really looking forward to more embedding models properly supporting binary vectors.
And yea, I'm still mulling over how ANN queries will work. My initial thought would be to add it to the vector column definition, like:
CREATE VIRTUAL TABLE vec_items USING vec0(
  title_embeddings float[768] indexed by HNSW(m=32)
)
Or something like that. But that means you'd need to recreate the table from scratch if you wanted to change the index. SQLite doesn't support custom index types, so this is tricky to do for virtual tables. Then there's the question of how you'd train the index in the first place, so lots to explore there.
Overall, awesome writeup, and fantastic project!!