Hacker News

Thanks for sharing, I like the approach and it makes a lot of sense for the problem space. Especially using existing products vs building/hosting your own.

I was however tripped up by this sentence close to the beginning:

> we encountered a significant challenge with RAG: relying solely on vector search (even using both dense and sparse vectors) doesn’t always deliver satisfactory results for certain queries.

Not to be overly pedantic, but that's a problem with vector similarity, not RAG as a concept.

Although the author is clearly aware of that, I have had numerous conversations in the past few months alone with people essentially saying "RAG doesn't work because I use pg_vector (or whatever) and it never finds what I'm looking for," not realizing that 1) it's not the only way to do RAG, and 2) there is often a fair gap between the embeddings and the vectorized query, and once you understand why, you can figure out how to fix it.

https://medium.com/@cdg2718/why-your-rag-doesnt-work-9755726... basically says everything I often say to people with RAG/vector search problems, but again, it seems like the assembled team has it handled :)




Author here: you're for sure right -- it's not a problem with RAG as a theoretical concept. In fact, I think RAG implementations should likely be specific to their use cases (e.g. our hybrid search approach works well for customer support, but I'm not sure it would work as well in other contexts, say for legal bots).

I've seen the whole gamut of RAG implementations as well, and the implementation, specifically the prompting and the document search, has a lot to do with the end quality.
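Since hybrid search came up: one common way to merge keyword and vector results into a single ranking is reciprocal rank fusion. A minimal sketch, assuming you already have ranked doc-ID lists from each index (the function name and inputs are mine, not from the post):

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60):
    """Merge two ranked lists of doc IDs with reciprocal rank fusion.

    Each doc scores 1 / (k + rank) per list it appears in; k=60 is the
    constant from the original RRF paper and damps the very top ranks.
    """
    scores = {}
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked decently in both lists beats one that appears in only one:
merged = rrf_merge(["a", "b", "c"], ["b", "c", "d"])
# merged == ["b", "c", "a", "d"]
```

The nice property is that it needs no score normalization between the two systems, only ranks, which is why it shows up so often in hybrid setups.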


re: legal, I saw a post on this idea where their RAG system was designed to return the actual text from the document rather than an LLM response or summary. The LLM played a role in turning the query into the search params, but the insight was that for certain kinds of documents you want the actual source, because of the existing, human-written summary or the detailed nuances therein.


Sounds more like Generation Augmented Retrieval in that case.


It wasn't this GAR post; I remember them calling out legal docs explicitly. I might have seen it on Twitter:

https://blog.luk.sh/rag-vs-gar


Do you happen to have any good references for GAR implementation?


> Not to be overly pedantic, but that's a problem with vector similarity, not RAG as a concept.

Vector similarity has a surprising failure mode: it only indexes explicit information and misses the implicit. For example, "The second word of this phrase, decremented by one" is "first" -- do you think those two strings will embed the same? Calculated results don't retrieve well, and neither do deductions in general.

How about "I agree with what John said, but I'd rather apply Victor's solution"? It won't embed like the answer you seek; multi-hop information-seeking questions don't retrieve well.

The obvious fix is to pre-ingest all the RAG text into an LLM and compute these deductions before embedding.
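A minimal sketch of that pre-ingestion step, with `derive_facts` standing in for a real LLM call (it's a hypothetical placeholder, not any particular API):

```python
def derive_facts(chunk: str) -> list[str]:
    # Placeholder for an LLM call that would be prompted with something
    # like "State explicitly the facts implied by this passage", so that
    # resolved references and deductions get their own embeddable text.
    return [f"[derived facts for: {chunk}]"]

def expand_for_embedding(chunks: list[str]) -> list[str]:
    """Return each original chunk plus its LLM-derived explicit
    restatements, so a query matching the deduction (not the surface
    text) can still retrieve the source chunk."""
    expanded = []
    for chunk in chunks:
        expanded.append(chunk)
        expanded.extend(derive_facts(chunk))
    return expanded

expanded = expand_for_embedding(
    ["I agree with what John said, but I'd rather apply Victor's solution."]
)
```

You'd then embed every string in `expanded`, keeping a pointer from each derived fact back to its source chunk so retrieval still returns the original text.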



