1. Once you use the vector embeddings to grab the most relevant chunks, are you just injecting the actual 400-token (in this example) prose snippets into the LLM query? So under the hood, does the query from the article end up as something like "Who was Benito Mussolini? Please use the following texts to inform your answer: [snippet 1, snippet 2, snippet 3]"? (Roughly what I sketch out below these two questions.)
2. I understand the use case for knowledge that isn't cooked into the LLM because it's too recent etc., but I wonder about using it with historic knowledge. I assume all(?) LLMs would have used Wikipedia for training and would therefore already have this Mussolini information from those same articles, so what's the point of priming it with duplicate "external" information? Would that really improve accuracy?
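For question 1, here's my mental model of the prompt assembly as a minimal Python sketch. The function, variable names, and prompt wording are my own guesses, not anything from the article, so treat it as an illustration of the idea rather than how the article's pipeline actually does it:

```python
# Rough sketch of RAG prompt assembly (illustrative only; names and
# prompt wording are assumptions, not taken from the article).

def build_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate the retrieved chunks into a single prompt for the LLM."""
    # Number each snippet so the model (and the reader) can tell them apart.
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved snippets (placeholders, not real article chunks):
retrieved = [
    "Benito Mussolini was an Italian politician and journalist ...",
    "Mussolini founded and led the National Fascist Party ...",
]
print(build_prompt("Who was Benito Mussolini?", retrieved))
```

In other words, is the "retrieval" part just string concatenation at the end of the day, with the prompt then sent to the LLM like any other query?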