You can have a look at Langroid[1], a multi-agent LLM framework from ex-CMU/UW-Madison researchers, in production use at companies (some have publicly endorsed us). RAG is just one of its features, and we have a clean, transparent implementation in a single file, intended for clarity and extensibility. It includes some state-of-the-art retrieval techniques and can easily be extended with others. In the DocChatAgent, the top-level method for RAG is answer_from_docs; here's the rough pseudocode:
answer_from_docs(query):
    extracts = get_relevant_extracts(query):
        passages = get_relevant_chunks(query):
            p1 = get_semantic_search_results(query)  # semantic/dense retrieval + learned sparse
            p2 = get_similar_chunks_bm25(query)      # lexical/sparse
            p3 = get_fuzzy_matches(query)            # lexical/sparse
            p = rerank(p1 + p2 + p3)                 # rerank for lost-in-middle, diversity, relevance
            return p
        # use LLM to get verbatim relevant portions of passages, if any
        extracts = get_verbatim_extracts(passages)
        return extracts
    # use LLM to get final answer from query augmented with extracts
    return get_summary_answer(query, extracts)
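To make the rerank step concrete, here's a minimal self-contained sketch (not Langroid's actual implementation; the function names and toy data are hypothetical) of two common techniques it alludes to: merging the dense, BM25, and fuzzy result lists with Reciprocal Rank Fusion, then reordering so the best passages sit at the start and end of the context to mitigate the "lost in the middle" effect.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of passage ids via Reciprocal Rank Fusion:
    each passage gets sum of 1/(k + rank) over the lists it appears in."""
    scores = {}
    for results in result_lists:
        for rank, passage in enumerate(results):
            scores[passage] = scores.get(passage, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)  # best fused score first

def lost_in_middle_reorder(passages):
    """Interleave so the most relevant passages land at the start and end
    of the prompt, pushing the least relevant into the middle, where LLMs
    attend least."""
    front, back = [], []
    for i, p in enumerate(passages):
        (front if i % 2 == 0 else back).append(p)
    return front + back[::-1]

# Toy ranked results from the three retrievers (hypothetical passage ids):
p1 = ["A", "B", "C"]   # semantic/dense
p2 = ["B", "D", "A"]   # BM25 lexical
p3 = ["E", "B"]        # fuzzy matches

fused = reciprocal_rank_fusion([p1, p2, p3])
ordered = lost_in_middle_reorder(fused)
```

Here "B" wins the fusion because all three retrievers rank it highly, and the reorder places the top two fused passages at the two ends of the final list. Production rerankers are usually model-based (cross-encoders, MMR for diversity), but the fusion/reorder skeleton is the same.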