>That plus babysitting Claude Code's context is annoying as hell.
It's crazy to me that—last I checked—its context strategy was basically tool use of ls and cat. Despite the breathtaking amount of engineering resources major AI companies have, they're eschewing dense RAG setups for dirt simple tool calls.
To their credit, it was good enough to fuel Claude Code's spectacular success, and it's fine for most use cases, but it really sucks not having proper RAG when you need it.
On the bright side, now that MCP has taken off I imagine one can just provide their preferred RAG setup as a tool call.
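Something like the following, as a rough sketch. This assumes the official MCP Python SDK ("mcp" on PyPI) and its FastMCP helper; query_index is a stub standing in for whatever retriever you actually use:

    # Rough sketch: expose a custom RAG backend as an MCP tool.
    # query_index is a placeholder, not a real library call.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("codebase-rag")

    def query_index(query: str, top_k: int) -> list[str]:
        # Placeholder: swap in an embedding or BM25 index over your repo.
        return [f"(stub chunk {i} for {query!r})" for i in range(top_k)]

    @mcp.tool()
    def search_codebase(query: str, top_k: int = 5) -> str:
        """Return the most relevant code chunks for a natural-language query."""
        return "\n\n".join(query_index(query, top_k))

    if __name__ == "__main__":
        mcp.run()  # serves the tool over stdio, the default transport

Point the agent at that server and it can call search_codebase like any other tool.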
You can, but my tool actually handles the raw chat context. So you can have millions of tokens in context, and the actual message that gets produced for the LLM is an optimized distillate, re-ordered to account for LLM memory patterns. RAG tools are mostly optimized for QA anyhow, which has dubious carryover to coding tasks.
This happens immediately before the LLM call: the message history is transformed as the API request is assembled.
That does reduce the context cache hit rate a bit, but the packing is cache-aware, so it avoids repacking the early parts of the context when it can. The tradeoff is 100% worth it though.
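For the curious, the shape of it is roughly this. A toy sketch, not the actual implementation: count_tokens, score_relevance, and summarize are all stand-in stubs for the real heuristics.

    # Toy sketch of cache-aware context packing. Every helper here
    # is a placeholder for something more involved.

    def count_tokens(msg: dict) -> int:
        # Crude estimate (~4 chars/token); a real tokenizer goes here.
        return max(1, len(msg["content"]) // 4)

    def score_relevance(msg: dict) -> float:
        # Placeholder heuristic. A real scorer would use embeddings,
        # recency, or task-specific signals.
        return float(len(msg["content"]))

    def summarize(msg: dict) -> dict:
        # Placeholder distillation: truncate. A real version would
        # compress the message with a cheap model instead.
        return {**msg, "content": msg["content"][:400]}

    def pack_context(messages: list[dict], budget: int,
                     keep_prefix: int = 4) -> list[dict]:
        prefix = messages[:keep_prefix]   # left untouched -> prompt cache hit
        used = sum(count_tokens(m) for m in prefix)
        kept = []
        # Spend the budget on the most relevant messages first...
        for msg in sorted(messages[keep_prefix:],
                          key=score_relevance, reverse=True):
            cost = count_tokens(msg)
            if used + cost > budget:
                msg = summarize(msg)      # distill rather than drop
                cost = count_tokens(msg)
                if used + cost > budget:
                    continue
            kept.append(msg)
            used += cost
        # ...then put the most relevant material last, where models
        # tend to attend most reliably.
        kept.reverse()
        return prefix + kept

The design point is that the prefix stays byte-identical across calls, so the provider's prompt cache still hits on the expensive early tokens even though the rest gets repacked.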