Bring your own embeddings. PyTorch and TensorFlow packages are 2GB+ each (don't quote me on that), which is unnecessary if you're making a network call to your favorite embedding service.
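A sketch of what that looks like (using the OpenAI SDK here as one example; any hosted embedding service with an HTTP API has the same shape):

    # Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
    # no torch/tensorflow anywhere in the dependency tree.
    from openai import OpenAI

    client = OpenAI()

    def embed(texts: list[str]) -> list[list[float]]:
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return [d.embedding for d in resp.data]

    vectors = embed(["a paragraph to index", "another one"])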
For my use case, I wanted whole-paragraph embeddings (iirc these models are trained on sentence pairs capped at 256 tokens).
Their simple suggestion for extending to longer texts is to pool/average the sentence embeddings, but I'm not sure I want that; for instance, it implies the order of the sentences doesn't matter. If I were forced to use sentence transformers for my use case, the real fix would be to train an actual pooling model on top of the sentence embedder, but I didn't want to do that either. At that point I stopped looking into it, but I'm certain there are newer models out nowadays with both better encoders and support for much longer texts. The one nice thing about the sentence-transformer models, though, is that they're much more lightweight than, say, a 7B-parameter language model.
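For reference, the suggested workaround is just this kind of thing (model name is only an example; the naive sentence split and the mean are exactly where the order-insensitivity objection bites):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

    def paragraph_embedding(paragraph: str) -> np.ndarray:
        # naive sentence split; real code would use a proper splitter
        sentences = [s.strip() for s in paragraph.split(".") if s.strip()]
        embs = model.encode(sentences)       # shape: (n_sentences, dim)
        pooled = embs.mean(axis=0)           # order-insensitive pooling
        return pooled / np.linalg.norm(pooled)  # re-normalize for cosine use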
For English-language embeddings, the bge models from the Beijing Academy of Artificial Intelligence (BAAI) outperform SBERT. There is a leaderboard on Hugging Face (the MTEB leaderboard, I believe), but I can't find the link at the moment.
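If you're already on sentence-transformers, swapping them in is mostly a model-name change; a sketch (model name current as of writing, and iirc the v1.5 English models want an instruction prefix on the query side):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-small-en-v1.5")

    # Documents are encoded as-is; queries get the recommended instruction prefix.
    doc_vecs = model.encode(["a passage to index"], normalize_embeddings=True)
    query_vec = model.encode(
        "Represent this sentence for searching relevant passages: how do I index text?",
        normalize_embeddings=True,
    )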
... Whatever you want? Even with a vector DB, embeddings are typically BYO to begin with: the dependencies live outside your DB, and the vectors are computed in pipelines well before they reach it. It's handy for small apps to do it in the DB and have in-DB support for some queries, but as things get big, doing everything in the DB gets weirder...
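Concretely, BYO just means the DB only ever sees raw float arrays. With Postgres + pgvector as one example stack (table and column names made up):

    # The vector is computed anywhere upstream and lands in the DB as a
    # plain float array; the DB never imports a model library.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    vec = [0.12, -0.03, 0.57]  # produced by your pipeline; dim must match the column

    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
            ("some paragraph", str(vec)),
        )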
Edit: I see you run a vectordb company, so your question makes more sense
True, but the comment said this removes the "heavyweight libraries". Unless you use an API service, those libraries still need to be imported somewhere; it just doesn't have to be the same server the database runs on.
Vector embedding at the app tier and the orchestration/compute tier makes more sense for managing the dependency than the vector-DB tier, for the bulk of ML/AI projects I've worked on. Round-tripping through the vector DB would be an architectural, code, and perf headache. Ex: just one of the many ways we use embeddings is to prefilter what goes into the DB, running in a bulk pipeline in a different VPC, in a way we don't want interfering with DB utilization.
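To make the prefiltering example concrete, a hypothetical sketch (the names and the threshold are made up; the real pipeline is bulkier):

    # Score items against a topic centroid computed elsewhere and only ship
    # the relevant ones on toward the DB.
    import numpy as np

    def prefilter(items: list[str], item_vecs: np.ndarray,
                  centroid: np.ndarray, threshold: float = 0.3) -> list[str]:
        # cosine similarity of each item vector against the centroid
        sims = (item_vecs @ centroid) / (
            np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(centroid)
        )
        return [item for item, s in zip(items, sims) if s >= threshold]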
We generally avoid using embedding services to begin with; outside calls are the special case. Imagine something heavy like video transcription via Google APIs, not the typical case of "just" text embedding. The actual embedding is generally one step of broader data wrangling, so there needs to be a good reason for doing something heavy and outside our control... which has been rare.
Doing it in the DB tier is nice for tiny projects, simplifying occasional business logic, etc., but generally it's not a big deal for us to run encode(str) when building a DB query.
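I.e. something like this (pgvector again as the example store; <=> is its cosine-distance operator, and the table/column names are made up):

    # Embedding at query-build time: encode the search text in the app tier,
    # hand the raw vector to the DB.
    import psycopg2
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    conn = psycopg2.connect("dbname=app")

    qvec = model.encode("user's search text").tolist()
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, body FROM docs ORDER BY embedding <=> %s::vector LIMIT 10",
            (str(qvec),),
        )
        rows = cur.fetchall()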
Where DB embedding support gets more interesting to me is layering additional representation changes on top, like IVF+PQ... but that can be done afterwards, afaict? (And supporting raw vectors generically, versus having to align our Python and model deps with our DB's, is a big feature.)
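E.g. faiss gives you IVF+PQ over raw vectors regardless of what produced them or where they're stored (parameters here are purely illustrative):

    # IVF+PQ layered on raw vectors, independent of the DB.
    import faiss
    import numpy as np

    d = 384                                  # embedding dimension
    xb = np.random.rand(10_000, d).astype("float32")  # stand-in corpus vectors

    quantizer = faiss.IndexFlatL2(d)         # coarse quantizer for the IVF lists
    index = faiss.IndexIVFPQ(quantizer, d, 128, 48, 8)  # 128 lists, 48 subquantizers of 8 bits
    index.train(xb)
    index.add(xb)

    index.nprobe = 8                         # lists to scan per query
    D, I = index.search(xb[:5], 10)          # top-10 neighbors for 5 queries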