What I meant was that at indexing time, you can attach extra information to any chunk. This[1] is a simple example from Anthropic where they add more relevant context. In our case, say you have two models, D1 and D2. When building the vector store, you can record which model is better suited to each chunk, so that when you retrieve it, you use that model for inference. This is custom built and very dependent on your dataset, but it would get you to the functionality described. I'd suggest this approach when there are linkages between various docs (e.g., financial statements, earnings calls, etc.). A minimal sketch of the idea is below.
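Here's roughly what that could look like, assuming a toy in-memory store and a keyword-overlap retriever. The `preferred_model` field, the D1/D2 labels, and the retriever are illustrative assumptions, not a specific library:

```python
# Sketch: tag chunks with a "preferred_model" at indexing time,
# then route to that model after retrieval.

from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

# Indexing time: attach routing metadata to each chunk.
store = [
    Chunk("Q3 revenue grew 12% year over year, driven by services.",
          {"preferred_model": "D1", "doc_type": "earnings_call"}),
    Chunk("Deferred tax liabilities are recognised for taxable temporary differences.",
          {"preferred_model": "D2", "doc_type": "financial_statement"}),
]

def retrieve(query: str, chunks: list[Chunk]) -> Chunk:
    """Toy retriever: score chunks by word overlap with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.text.lower().split())))

def answer(query: str) -> str:
    chunk = retrieve(query, store)
    # Routing decision: use the model the chunk was tagged with at index time.
    model = chunk.metadata.get("preferred_model", "D1")
    return f"[{model}] would answer using context: {chunk.text}"

print(answer("How did revenue grow this quarter?"))
```

In practice the retriever would be your embedding search and the routing field would be assigned during ingestion (manually, by rules, or by a classifier), but the shape of the idea is the same.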
Thanks... I also have another lingering doubt about the ability of RAG to make sense of "history", i.e. how to make sure that a more recent document on a given topic has more "weight" than older documents on the same issue.
This is done at a reranking step. It's again custom. You have two variables: 1/ relevance (which most algos focus on) and 2/ date. Create a new score from a weighted combination of relevance and date, e.g. a 50/50 split. If a document has 70% relevance but was published yesterday (so its recency score is ~100%), its overall score would be 0.5 × 70% + 0.5 × 100% = 85%. (A conceptual idea.) This is similar to how you'd do weighted sorting anywhere; a rough sketch follows.
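A minimal sketch of that combined score, assuming an exponential recency decay and a 50/50 weight split (the half-life value and function names are illustrative, and you'd tune both to your corpus):

```python
# Re-score retrieved documents with a weighted mix of relevance and recency.

from datetime import date

def recency_score(published: date, today: date, half_life_days: float = 90.0) -> float:
    """Exponential decay: 1.0 for a doc published today, 0.5 after half_life_days."""
    age = (today - published).days
    return 0.5 ** (age / half_life_days)

def combined_score(relevance: float, published: date, today: date,
                   w_relevance: float = 0.5) -> float:
    """Weighted blend of relevance and recency; sort descending by this."""
    return w_relevance * relevance + (1 - w_relevance) * recency_score(published, today)

# Example from the text: 70% relevance, published yesterday, 50/50 weights.
today = date(2024, 6, 2)
print(round(combined_score(0.70, date(2024, 6, 1), today), 2))  # ~0.85
```

Sorting the retrieved set by this blended score before passing it to the model is the whole reranking step in this scheme.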
I think he might be saying: keep metadata in your vector store that describes the domain of each retrieved chunk, and use that as the decision on which model to use downstream. Sounds like a very interesting improvement to RAG.
Could you please provide some more info (or maybe links) about this?