
Thanks so much for your interest and thoughtful questions! I'll do my best to answer here, but looking forward to digging in deeper with you offline, too :)

A. Documents are uploaded and organized via the UI at this moment, but later this week we will be exposing APIs to do the same programmatically. You'll be able to programmatically create a "Document Index" and then upload documents to a given index. In your case, you'd likely have one index per collection. We don't currently enforce a strict size limit, but it's likely we'll need to soon; if a document exceeds it, you might break it up into smaller documents prior to uploading.
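To make that concrete, here's a rough sketch of what that flow might look like from Python. To be clear, the URLs, field names, and auth header below are illustrative placeholders, not our published API:

    import requests

    API_KEY = "YOUR_VELLUM_API_KEY"
    headers = {"X-API-Key": API_KEY}

    # Create a "Document Index" (e.g. one per collection)
    index = requests.post(
        "https://api.vellum.ai/v1/document-indexes",  # placeholder URL
        headers=headers,
        json={"name": "my-collection"},
    ).json()

    # Upload a document to that index (splitting very large files beforehand)
    with open("handbook-part-1.pdf", "rb") as f:
        requests.post(
            f"https://api.vellum.ai/v1/document-indexes/{index['id']}/documents",  # placeholder URL
            headers=headers,
            files={"file": f},
        )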

B. Yes, the number of chunks returned can be specified as part of the API call when performing a search. Currently, the chunking strategy and max chunk size are static, but we fully intend to make this configurable very soon.
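Roughly like this (again, the parameter names here are placeholders for illustration, not the actual API):

    import requests

    results = requests.post(
        "https://api.vellum.ai/v1/search",  # placeholder URL
        headers={"X-API-Key": "YOUR_VELLUM_API_KEY"},
        json={
            "index": "my-collection",
            "query": "How do I reset my password?",
            "num_chunks": 5,  # how many chunks to return
        },
    ).json()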

C. Yes, we track which document and which index each chunk came from. With proper prompt engineering, you can have the LLM include this citation in the final response. We recently helped a customer construct a prompt that does exactly this! Saying where in the document it came from is a bit trickier (although you do know the text of the chunk that's most relevant, which is a helpful starting point).
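As a loose illustration (nothing here is Vellum-specific), the retrieved chunks and their source metadata can be folded into the prompt so the model knows what to cite:

    chunks = [
        {"text": "Refunds are processed within 5 business days.", "document": "billing-faq.md"},
        {"text": "Contact support via the in-app chat.", "document": "support-guide.md"},
    ]

    context = "\n\n".join(f"[Source: {c['document']}]\n{c['text']}" for c in chunks)

    prompt = (
        "Answer the question using only the sources below and end your answer "
        "with a citation of the form (Source: <document name>).\n\n"
        f"{context}\n\nQuestion: How long do refunds take?\nAnswer:"
    )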

D. We do not share data or prompts across our customers, although we do provide strategic advice that's informed by our experiences. We'd love to learn what guarantees you're looking for and feel confident we can work within most reasonable bounds. For what it's worth, my personal opinion is that companies should be cautious about banking on prompts as the primary point of defensibility for LLM-powered apps. Reverse prompt-engineering is a thing (interesting article here: https://lspace.swyx.io/p/reverse-prompt-eng). My take is that LLM defensibility will come from your data (e.g. data that powers your semantic search or training data used to fine-tune proprietary models), as this is much harder for competitors to recreate, not to mention the user experience and go-to-market that surrounds it all.

E. We haven't yet done formal benchmarking (although we're admittedly overdue for it!), but we have architected our inference endpoint with low latency as a top priority. For example, we've selected cloud data centers close to where we understand OpenAI's to be and have minimized blocking procedures so that we perform as much as we can asynchronously. We host this endpoint separately from the rest of our web application, keep at least one instance running at all times (to prevent cold starts), and have it auto-scale as traffic demands.


Thanks for your reply!

A. OK, I like that, thanks

B. OK

C. Just to confirm - the chunks are verbatim right? So in theory I could just do a string search in the document to locate the chunk?

D. I would assume that you are currently not encrypting the data at rest. Encrypting it with the customer's own key would probably result in a performance hit?

In any case, if you're not encrypting it at all, and in the absence of any certification/assurances as to your data security practices, then as a customer I'm forced to assume that giving you access to the data (and prompts) is tantamount to public disclosure of it. I mean, LastPass had their data stolen, and we're both startups. Anything valuable that isn't nailed to the floor (i.e. encrypted with utmost paranoia) is like chairs left out on the patio of a beach bar at night.

In which case there is no defensibility to be found there for us. It doesn't prevent us from becoming a customer, but it means we have to hedge our use cases so as not to put our own customers' data (e.g. trade secrets that might be included in their documents) at risk.

E. Great to know, thanks!


Thank you for the kind words :)


Thanks for your interest! We've toyed with the idea of making at least pieces of Vellum open source, but have decided against it for the time being. There are some great open source libraries like Langchain or GPT-Index that, while quite different, may satisfy some of your needs.


Appreciate the feedback! A comparison table is a great idea and something we'll look into.

We fully anticipate having tighter integrations with Langchain in the near future. We view them as complementary frameworks in many ways. For example, we might subclass the `BaseLLM` class such that you can interact with Vellum deployments and get all the monitoring/observability that Vellum provides, but invoke them via your Langchain chain.
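As a rough sketch of what that could look like (the Langchain custom-LLM pattern is real; the Vellum endpoint and request/response shape below are made up for illustration):

    from typing import List, Optional

    import requests
    from langchain.llms.base import LLM


    class VellumDeploymentLLM(LLM):
        deployment_url: str  # placeholder: a Vellum deployment endpoint
        api_key: str

        @property
        def _llm_type(self) -> str:
            return "vellum-deployment"

        def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
            # Placeholder request/response shape, for illustration only
            response = requests.post(
                self.deployment_url,
                headers={"X-API-Key": self.api_key},
                json={"prompt": prompt, "stop": stop},
            )
            return response.json()["text"]

The idea being that an instance of that class could be dropped into your Langchain chain like any other LLM, while the calls route through your Vellum deployment and pick up its monitoring along the way.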


Thanks! We totally agree that spot-checking won't scale long term. We're currently testing a beta feature that lets you provide an "expected output" and then choose from a variety of comparison metrics (e.g. exact match, semantic similarity, Levenshtein distance, etc.) to derive a quantitative measure of output quality. The jury's still out on whether this is sufficient, but we're excited to continue pushing in this direction.
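For a rough sense of what those metrics look like, here's a minimal standard-library sketch of exact match plus an edit-distance-style similarity; semantic similarity would additionally need an embedding model, so it's omitted here:

    from difflib import SequenceMatcher


    def exact_match(expected: str, actual: str) -> float:
        return 1.0 if expected.strip() == actual.strip() else 0.0


    def similarity_ratio(expected: str, actual: str) -> float:
        # Edit-distance-style similarity in [0, 1]
        return SequenceMatcher(None, expected, actual).ratio()


    print(exact_match("Paris", "Paris"))               # 1.0
    print(similarity_ratio("Paris, France", "Paris"))  # ~0.56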

p.s. it's cool to hear from another company that's helping expand this market!


What I think would be really interesting is to apply distance metric learning (DML) to the problem. You have users tell you which responses are good and bad, and use that to learn a metric that will classify responses as good or bad. One of the big challenges is that DML is typically applied to data in some vector space as opposed to strings, but I would expect that using an embedding constructed from the output could work well.
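Something along these lines, as a sketch (with a placeholder standing in for a real embedding model):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
    from sklearn.pipeline import make_pipeline


    def embed(texts):
        # Placeholder: swap in any real text-embedding model here
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 64))


    responses = ["a helpful, correct answer", "a rambling, off-topic answer"] * 50
    labels = np.array([1, 0] * 50)  # 1 = good, 0 = bad, as rated by users

    # Learn a metric (NCA) so nearest neighbours separate good from bad responses
    model = make_pipeline(
        NeighborhoodComponentsAnalysis(n_components=16, random_state=0),
        KNeighborsClassifier(n_neighbors=5),
    )
    model.fit(embed(responses), labels)

    print(model.predict(embed(["a new response to score"])))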


Super interesting idea! We already expose UIs and APIs for supplying feedback on the quality of the output, so this could totally be possible once enough feedback has been collected. Thanks for sharing


Letting users pick a comparison metric of their choice is a good option till something better comes along. Good luck with Vellum!


Thank you for flagging! We've come across them in prior searches, but it's interesting to learn how well-known they are.


It's also a major component of the 3D software Houdini, where it's used for physics simulations of cloth, grains, hair, etc. https://www.sidefx.com/docs/houdini/vellum/index.html


Yup I came here to say this too. It's a well-known product in the self-publishing world.


Likewise, it is a well-known app among writers.


Thanks for the question! Would you mind elaborating on what you mean by "optimization options?" We've helped a number of our customers fine tune models and optimize for increased quality, lower cost, or decreased latency (e.g. fine-tune curie to perform as well as regular davinci, but at a lower cost and latency).

We offer UIs and APIs for "feeding back actuals" and providing indications of the quality of the model's output / what it should have output. This feedback loop is then used to periodically re-train fine-tuned models.

Hopefully this answers your question, but happy to respond with follow-ups if not!


I'm thinking about improving model response quality.

The training of preexisting LLMs that I'm familiar with has two sides: fine-tuning the model with additional, domain-specific data (like internal company documentation), and RLHF (e.g. comparing model responses to actual customer-service responses) to further improve how well it uses that data and the resources it already has access to. That's how https://github.com/CarperAI sets up the process, for example.

What you're describing seems closer to the latter, but I'm not entirely sure if you're following the same structure at all.


Hey, Sidd from Vellum here!

Right now we offer traditional fine-tuning with prompt/completion pairs, but not training a reward model. This works great for a lot of use cases, including classification, extracting structured data, and responding with a very specific tone and style.
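For a concrete sense of the format, prompt/completion pairs for something like ticket classification might look like this (made-up examples, not customer data):

    import json

    examples = [
        {"prompt": "Ticket: I was charged twice this month.\nCategory:", "completion": " billing"},
        {"prompt": "Ticket: The app crashes when I open settings.\nCategory:", "completion": " bug"},
    ]

    with open("train.jsonl", "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")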

For making use of domain specific data we recommend using semantic search to pull in the correct context at runtime instead of trying to fine tune a model on the entire corpus of knowledge.


Hi yes, that's the idea! The example shown in the demo video uses internal help docs as the "source of knowledge" for embeddings, but the same principles apply to customer data.


Great! Would I be able to provide customers any guarantees about the privacy of their data? Could you create embeddings based on data encrypted homomorphically?


We'd love to learn more about what types of guarantees your customers expect – it's likely we can provide many of them now and will inevitably offer even more down the line. Feel free to reach out directly to noa@vellum.ai if you'd like to discuss!

Vellum currently embeds any text you send it, but to be honest, we haven't experimented with performing semantic search across homomorphically encrypted text and can't speak to its performance. If this becomes a recurring theme from our customers, we'd be excited to dig into it deeper!


Yeah I understand that operating on opaque data might not be one of the first items on your roadmap. Thanks for the quick responses.


Thank you and good question! If you're comfortable with the quality of OpenAI's embeddings, performing your own chunking, rolling your own integration with a vector db, and don't need Vellum's other features that surround the usage of those embeddings, then Vellum is probably not a good fit. Vellum's Search offering is most valuable to companies that want to be able to experiment with different embedding models, don't want to manage their own semantic search infra, and want a tight integration with how those embeddings are used downstream.
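For what it's worth, here's a bare-bones sketch of that "roll your own" path: chunk text, embed it, and do a brute-force cosine-similarity lookup. The embed() function is a placeholder for whichever embedding API you'd use, and the brute-force search stands in for a real vector db:

    import numpy as np


    def embed(texts):
        # Placeholder: swap in whichever embedding API you use
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 1536))


    def chunk(text, size=500):
        return [text[i:i + size] for i in range(0, len(text), size)]


    docs = ["...a long document..."]
    chunks = [c for d in docs for c in chunk(d)]
    chunk_vecs = embed(chunks)


    def search(query, k=3):
        # Brute-force cosine similarity; a real setup would use a vector db
        q = embed([query])[0]
        sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(-sims)[:k]]


    print(search("how do refunds work?"))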


I appreciate your interest! We'll be reaching out soon :)

