That's the only way to do it. You can't index the whole thing. The challenge is chunking. There are several different algorithms to chunk content for vectorization with different pros and cons.
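For anyone who wants something concrete, here is a minimal sketch of one of the simplest chunking strategies: a fixed-size word window with overlap. The function name `chunk_words` and the default `size`/`overlap` values are made up for illustration, not taken from any particular library, and real pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `size` words, sharing `overlap` words
    between neighbouring chunks (hypothetical helper, illustration only)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", size=4, overlap=2):
        print(chunk)
```

The trade-off is the usual one: smaller chunks match a query more precisely but lose surrounding context, while more overlap keeps context across boundaries at the cost of a bigger index.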
As far as I understand it, long contexts degrade LLM performance. So even when an LLM "supports" a large context length, it basically just pays attention to a chunk at the top and a chunk at the bottom and skips over the middle bits.
Why would you want chunks that big for vector search? Wouldn't there be too much information in each chunk, making it harder to match a query to a concept within the chunk?
PHP is a great language to learn OOP, classes, interfaces, abstract classes, traits, managing dependencies and unit tests. I'm not using it anymore but I learned basically everything with it a decade ago. Thanks PHP!
It performs better and uses different design choices (for example: SableDb uses tokio's local task per connection, and in general it uses green threads to make the code more readable and easier to maintain).
I will release some design documents later on (hopefully this month). Remember that this is a one-man project (hopefully, not for long), so it takes time to organize everything :)
I like the idea of doing thread-local execution of tokio tasks; I assume that means SableDb is mostly single-threaded? Was this to reduce complexity, or for some other reason? I'm looking forward to the design doc on this!
It is multi-threaded (configurable: you can set it to a specific number in the configuration file, or use the magic value 0, in which case SableDb decides based on the number of cores divided by 2).
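To make the "magic value 0" rule concrete, here is a rough sketch in Python (SableDb itself is written in Rust, so this is not its code). The name `resolve_worker_count` is hypothetical, and the floor of one worker on single-core machines and the rejection of negative values are assumptions, not something stated above.

```python
import os
from typing import Optional


def resolve_worker_count(configured: int, cores: Optional[int] = None) -> int:
    """Return the number of worker threads to start (hypothetical helper).

    configured > 0  -> use that exact number
    configured == 0 -> derive it from the core count, divided by 2
    """
    if configured < 0:
        raise ValueError("worker count must be >= 0")  # assumption: negatives are invalid
    if configured > 0:
        return configured
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, cores // 2)  # assumption: never go below one worker


if __name__ == "__main__":
    print(resolve_worker_count(0))
```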
Each incoming connection is assigned to a worker thread, and two tokio tasks are created for the connection (one for reading and another for writing).
Using tokio allowed me to write `async` code without "callback hell", so the code looks clean and is readable at a glance, without the need to follow callbacks.
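As a loose illustration of the "one reader task plus one writer task per connection" shape, here is a sketch using Python's asyncio rather than tokio. It is an analogue only, not SableDb's implementation: the queue between the two tasks, the `None` sentinel, and the `echo:` reply are all assumptions made up for the example, and it does not model pinning a connection to a worker thread.

```python
import asyncio


async def read_loop(reader: asyncio.StreamReader, outbox: asyncio.Queue) -> None:
    """Reader task: take each incoming line and queue a reply for it."""
    while True:
        line = await reader.readline()
        if not line:  # empty bytes means the peer closed the connection
            break
        await outbox.put(b"echo: " + line)
    await outbox.put(None)  # sentinel: tells the writer task to stop


async def write_loop(writer, outbox: asyncio.Queue) -> None:
    """Writer task: send queued replies until the sentinel arrives."""
    while True:
        reply = await outbox.get()
        if reply is None:
            break
        writer.write(reply)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def handle_connection(reader, writer) -> None:
    """Run both tasks for one connection; the flow reads top to bottom."""
    outbox: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(read_loop(reader, outbox), write_loop(writer, outbox))


async def main() -> None:
    # Serve on an arbitrary local port chosen for this example.
    server = await asyncio.start_server(handle_connection, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()
```

Calling `asyncio.run(main())` starts the server. The point is the same one made above: each task is a plain loop with `await` at the points where it would otherwise need a callback.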
Hi SableDb. I am looking for a tech cofounder in databases. Probably not the best place to ask for a cofounder. :-) Regardless, would you be interested?
Not affiliated (not my first comment about this), but we are using Kvrocks[1] for now at work, which is based on RocksDB by Meta, and it works nicely. The developers are nice and responsive, and the Redis command support is broad.
We picked this project because of our RAM usage that was exploding with Redis.
The only downside for us right now is the Kubernetes support. There is an operator and a controller being made, but no Helm chart yet to easily deploy Kvrocks with a master and replicas. That would be awesome.
For a few recruitments, we asked the candidates to create a front app like this with React. It was quite nice as we could quickly see how they use the library, what they know etc.
For those interested in an on-disk alternative to Redis that is compatible with the Redis protocol, have a look at Kvrocks[1], which was recently accepted into the Apache Software Foundation. It is based on RocksDB and works quite nicely for us.
Thanks for this, it seems to even support Redis' Lua functionality, which is essential for us. I wish I hadn't decided to use Redis Stack, since even Kvrocks isn't compatible with its extensions.
We use Kvrocks[0] at work. It is Redis on disk, powered by "RocksDB" (hence the name) and compatible with most of the Redis clients since it respects the Redis protocol. It was incubated by Apache earlier this year.
It works great and the development is really active.