Hacker News

Thanks for sharing, I like the approach and it makes a lot of sense for the problem space. Especially using existing products vs building/hosting your own.

I was however tripped up by this sentence close to the beginning:

> we encountered a significant challenge with RAG: relying solely on vector search (even using both dense and sparse vectors) doesn’t always deliver satisfactory results for certain queries.

Not to be overly pedantic, but that's a problem with vector similarity, not RAG as a concept.

Although the author is clearly aware of that, I have had numerous conversations in the past few months alone with people essentially saying "RAG doesn't work because I use pg_vector (or whatever) and it never finds what I'm looking for," not realizing that 1) it's not the only way to do RAG, and 2) there is often a fair gap between the embeddings and the vectorized query, and once you understand why, you can figure out how to fix it.

https://medium.com/@cdg2718/why-your-rag-doesnt-work-9755726... basically says everything I often say to people with RAG/vector search problems, but again, it seems like the assembled team has it handled :)




Author here: you're for sure right -- it's not a problem with RAG as a theoretical concept. In fact, I think RAG implementations should likely be specific to their use cases (e.g. our hybrid search approach works well for customer support, but I'm not sure it would work as well in other contexts, say for legal bots).

I've seen the whole gamut of RAG implementations as well, and the implementation, specifically the prompting and the document search, has a lot to do with the end quality.
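Since hybrid search came up: one common way to merge keyword and vector results into a single ranking is reciprocal rank fusion. A minimal sketch, assuming you already have ranked doc-ID lists from each index (the function name and inputs are mine, not from the post):

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60):
    """Merge two ranked lists of doc IDs with reciprocal rank fusion.

    Each doc scores 1 / (k + rank) per list it appears in; k=60 is the
    constant from the original RRF paper and damps the very top ranks.
    """
    scores = {}
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked decently in both lists beats one that appears in only one:
merged = rrf_merge(["a", "b", "c"], ["b", "c", "d"])
# merged == ["b", "c", "a", "d"]
```

The nice property is that it needs no score normalization between the two systems, only ranks, which is why it shows up so often in hybrid setups.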


re: legal, I saw a post on this idea where their RAG system was designed to return the actual text from the document rather than an LLM response or summary. The LLM played a role in turning the query into the search params, but the insight was that for certain kinds of documents you want the actual source, because of the existing, human-written summary or the detailed nuances therein.


Sounds more like Generation Augmented Retrieval in that case.


It wasn't this GAR post; I remember them calling out legal docs explicitly. I might have seen it on Twitter:

https://blog.luk.sh/rag-vs-gar


Do you happen to have any good references for GAR implementation?


> Not to be overly pedantic, but that's a problem with vector similarity, not RAG as a concept.

Vector similarity has a surprising failure mode: it only indexes explicit information and misses the implicit. For example, "The second word of this phrase, decremented by one" is "first" -- do you think those two strings will embed the same? Calculated results don't retrieve well, and neither do deductions in general.

How about "I agree with what John said, but I'd rather apply Victor's solution"? It won't embed like the answer you seek; multi-hop information-seeking questions don't retrieve well.

The obvious fix is to pre-ingest all the RAG text into an LLM and compute these deductions before embedding.
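A minimal sketch of that pre-ingestion step, with `derive_facts` standing in for a real LLM call (it's a hypothetical placeholder, not any particular API):

```python
def derive_facts(chunk: str) -> list[str]:
    # Placeholder for an LLM call that would be prompted with something
    # like "State explicitly the facts implied by this passage", so that
    # resolved references and deductions get their own embeddable text.
    return [f"[derived facts for: {chunk}]"]

def expand_for_embedding(chunks: list[str]) -> list[str]:
    """Return each original chunk plus its LLM-derived explicit
    restatements, so a query matching the deduction (not the surface
    text) can still retrieve the source chunk."""
    expanded = []
    for chunk in chunks:
        expanded.append(chunk)
        expanded.extend(derive_facts(chunk))
    return expanded

expanded = expand_for_embedding(
    ["I agree with what John said, but I'd rather apply Victor's solution."]
)
```

You'd then embed every string in `expanded`, keeping a pointer from each derived fact back to its source chunk so retrieval still returns the original text.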



