First: Congratulations! I've just sent an early access request.
Second - I've got questions!
Here is my use case. We have hundreds of clients who each have dozens of videos on our platform. Videos are grouped in collections. For each video we have structured content (think of it like a complex mashup of transcript + other info).
I would like to be able to send the API a group of 13 documents, grouped into a collection called "User123_collection2" or whatever. And then run natural language queries against that through an LLM.
A. Do I understand correctly that you allow sending and organizing documents programmatically? Are there any length restrictions?
B. In your demo, Vellum simply grabs the top 3 most relevant snippets (and those seem to be relatively short). Can this be customized (longer snippets, and more of them)?
C. Can I get the "sources" cited with the answer?
Let's say I run a query against these documents, like the ones in your demo (an "end user" kind of question). Similar to how Bing's chatbot gives the links to the pages used to build the answer, I'd like the response to tell me it comes from Document 7 in collection "User123_collection2". Even better if the response can tell me where in the document it came from (otherwise I'd have to split my documents into smaller pieces when uploading).
D. Do you offer any guarantees of privacy for not only the data but also the prompts? I think these prompts might be one of the valuable "trade secrets" for startups who want to add LLM features. If a prompt is leaked publicly, or to another Vellum customer/stakeholder, then the feature becomes replicable.
E. How much latency does this additional layer tack onto typical response times? Compared to making an API request to OpenAI directly, how much longer does my app wait for the response?
I'm impressed by where you're going. It's a brilliant idea and I hope we get to be a customer, rather than build the whole LangChain/GPT-Index idea we were going to run with up until I read your post :)
Thanks so much for your interest and thoughtful questions! I'll do my best to answer here, but looking forward to digging in deeper with you offline, too :)
A. Documents are uploaded and organized via the UI at the moment, but later this week we'll be exposing APIs to do the same programmatically. You'll be able to programmatically create a "Document Index" and then upload documents to a given index. In your case, you'd likely have one index per collection. We don't currently enforce a strict size limit, but we'll likely need to soon; in that case, you could break a large document into smaller documents prior to uploading.
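Roughly, the flow might look like this (purely illustrative; the base URL, endpoint paths, field names, and auth scheme below are placeholders, not the final API surface):

```python
# Illustrative sketch only -- base URL, endpoint paths, and field names
# are placeholders, not the real API.
import requests

API_KEY = "YOUR_API_KEY"                      # placeholder
BASE_URL = "https://api.vellum.example.com"   # placeholder
headers = {"Authorization": f"Bearer {API_KEY}"}

# One index per collection, e.g. "User123_collection2"
index = requests.post(
    f"{BASE_URL}/document-indexes",
    headers=headers,
    json={"name": "User123_collection2"},
).json()

# Upload each of the 13 documents into that index; split any very large
# document into smaller pieces client-side if/when size limits apply.
my_documents = ["...structured content for video 1...", "...video 2..."]  # your docs
for i, text in enumerate(my_documents, start=1):
    requests.post(
        f"{BASE_URL}/document-indexes/{index['id']}/documents",
        headers=headers,
        json={"label": f"Document {i}", "text": text},
    )
```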
B. Yes, the number of chunks returned can be specified as part of the API call when performing a search. Currently the chunking strategy and max chunk size are static, but we fully intend to make both configurable very soon.
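Continuing the sketch above, a search call might let you pass the chunk count explicitly (again, the parameter names and response shape are placeholders):

```python
# Continues the sketch above (same BASE_URL, headers, index). The "num_chunks"
# parameter and response fields are placeholders for whatever we end up exposing.
results = requests.post(
    f"{BASE_URL}/document-indexes/{index['id']}/search",
    headers=headers,
    json={
        "query": "How do I cancel my subscription?",
        "num_chunks": 8,   # e.g. 8 instead of the demo's top 3
    },
).json()

for chunk in results["chunks"]:                 # assumed response shape
    print(chunk["document_label"], chunk["text"][:80])
```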
C. Yes, we track which document and which index each chunk came from. With proper prompt engineering, you can have the LLM include that citation in its final response; we recently helped a customer construct a prompt that does exactly this. Saying where in the document a chunk came from is a bit trickier (although you do know the text of the most relevant chunk, which is a helpful starting point).
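One illustrative way to wire that up, building on the search results from the sketch above (the labels and instructions here are just one approach, not a prescribed format):

```python
# Prepend each chunk with its provenance, then ask the model to cite the
# bracketed labels it used. Purely illustrative prompt construction.
context = "\n\n".join(
    f"[{c['document_label']} | index: User123_collection2]\n{c['text']}"
    for c in results["chunks"]
)

question = "How do I cancel my subscription?"
prompt = (
    "Answer the question using only the context below. "
    "After your answer, list the bracketed source label(s) you relied on.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\n"
    "Answer:"
)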
D. We do not share data or prompts across our customers, although we do provide strategic advice that's informed by our experiences. We'd love to learn what guarantees you're looking for and feel confident we can work within most reasonable bounds. For what it's worth, my personal opinion is that companies should be cautious about banking on prompts as the primary point of defensibility for LLM-powered apps. Reverse prompt-engineering is a thing (interesting article here: https://lspace.swyx.io/p/reverse-prompt-eng). My take is that LLM defensibility will come from your data (e.g. data that powers your semantic search or training data used to fine-tune proprietary models), as this is much harder for competitors to recreate, not to mention the user experience and go-to-market that surrounds it all.
E. We haven't yet done formal benchmarking (although we're admittedly overdue for it!), but we've architected our inference endpoint with low latency as a top priority. For example, we've selected cloud data centers close to where we understand OpenAI's to be, and we've minimized blocking procedures so that as much work as possible happens asynchronously. We host this endpoint separately from the rest of our web application, keep at least one instance running at all times (to prevent cold starts), and auto-scale it as traffic demands.
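Once you have access, you could also measure the overhead yourself by timing a direct OpenAI call against the same request routed through the extra layer (the proxied endpoint below is a placeholder):

```python
# Compare median round-trip times: direct to OpenAI vs. through the extra layer.
import statistics
import time

import requests

def median_latency(url, headers, payload, n=20):
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(url, headers=headers, json=payload, timeout=60)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# direct  = median_latency("https://api.openai.com/v1/completions", openai_headers, openai_payload)
# proxied = median_latency(f"{BASE_URL}/generate", headers, payload)   # placeholder endpoint
# print(f"median overhead: {(proxied - direct) * 1000:.0f} ms")
```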
C. Just to confirm: the chunks are verbatim, right? So in theory I could just do a string search in the document to locate the chunk?
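If so, mapping a chunk back to its position in the original document would be as simple as:

```python
# Assuming `original_document` is the full text we uploaded and `chunk_text`
# is a chunk returned by the search:
offset = original_document.find(chunk_text)   # -1 if the chunk isn't verbatim
if offset != -1:
    print(f"chunk starts at character {offset}")
```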
D. I would assume that you're currently not encrypting the data at rest. Encrypting it with the customer's own API key would probably result in a performance hit?
In any case, if you're not encrypting it at all, and in the absence of any certification or assurances about your data security practices, then as a customer I'm forced to assume that giving you access to the data (and prompts) is tantamount to publicly disclosing them. I mean, LastPass had their data stolen. You and we are both startups. Anything valuable that isn't nailed to the floor (encrypted with utmost paranoia) is like chairs left out on a beach bar's patio overnight.
In that case, there's no defensibility to be found there for us. It doesn't prevent us from becoming a customer, but it means we have to hedge our use cases so as not to put our own customers' data (e.g. trade secrets that might be included in their documents) at risk.