
In my experience the QA-with-documents pattern is fairly straightforward to implement. 90% of the effort to get to a performant system, however, goes into massaging the documents into semantically meaningful chunks. Most business documents, unlike blog posts and news articles, are not just running text. They have a lot of implicit structure, and when that structure is lost, as happens with typical naive chunkers, much of the contextualized meaning is lost as well.
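To make the point concrete, here is a minimal sketch (not any particular product's code) contrasting a naive fixed-size chunker with a structure-aware one that keeps each heading together with its body text:

```python
def naive_chunks(text: str, size: int = 200) -> list[str]:
    # Splits on a fixed character budget, ignoring document structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

def structured_chunks(text: str) -> list[str]:
    # Treats lines starting with "#" as section headings and keeps
    # each heading together with the text that follows it.
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Refund policy\nRefunds within 30 days.\n# Shipping\nShips in 2 days."
print(structured_chunks(doc))
```

With the structured version, the chunk "Refunds within 30 days." still carries its "Refund policy" heading, which is exactly the implicit context a fixed-size splitter throws away.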



Agree with the point about intelligent chunking being very important! Each individual app connector can choose how it wants to split each `document` into `section`s (important point: this is customized at an app-level). The default chunker then keeps each section as part of a single chunk as much as possible. The goal here is, as you mentioned, to give each chunk the relevant surrounding context.

Additionally, the indexing process is set up as a composable pipeline under the hood. It would be fairly trivial to plug in different chunkers for different sources as needed in the future.
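A hypothetical sketch of what per-source pluggable chunkers could look like; the names and registry here are illustrative, not Danswer's actual API:

```python
from typing import Callable

Chunker = Callable[[str], list[str]]

def by_paragraph(doc: str) -> list[str]:
    # Suits long-form sources like wiki pages.
    return [p for p in doc.split("\n\n") if p.strip()]

def by_line(doc: str) -> list[str]:
    # Suits short, message-style sources.
    return [line for line in doc.splitlines() if line.strip()]

# Each app connector registers the chunker suited to its documents.
CHUNKERS: dict[str, Chunker] = {
    "wiki": by_paragraph,
    "chat": by_line,
}

def index(source: str, doc: str) -> list[str]:
    # Fall back to a default chunker for unknown sources.
    chunker = CHUNKERS.get(source, by_paragraph)
    return chunker(doc)
```

Swapping a chunker is then a one-line change in the registry, which is the kind of flexibility a composable pipeline buys you.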


Chunking is very important but might, I feel, best be contextualised as one aspect of the bigger substantive challenge, which is how to prevent false negatives at the context retrieval stage - a.k.a. how to ensure your (vector? hybrid?) search returns all relevant context to the LLM’s context window.
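One common tactic for reducing false negatives in hybrid search is to union the keyword and vector result lists and merge them with reciprocal rank fusion (RRF). This is an illustrative sketch of the general technique, not a claim about how Danswer does it:

```python
def rrf_merge(keyword_hits: list[str], vector_hits: list[str], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank + 1)
    # to a document's score, so a doc found by either retriever survives,
    # and a doc found by both is boosted.
    scores: dict[str, float] = {}
    for hits in (keyword_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf_merge(["a", "b"], ["b", "c"]))
# "b" appears in both lists, so it ranks first.
```

Because the merge is a union rather than an intersection, a relevant chunk only has to be surfaced by one of the two retrievers to reach the LLM's context window.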

Would you mind saying a few words on how Danswer approaches this?


Yes, agreed. Tooling abounds; the real work for anyone who's serious about this is customizing everything so it handles the idiosyncrasies of a customer's documents and questions. I'm happy to talk to anyone who is interested; we are doing something like this for a company now.




