>That plus babysitting Claude Code's context is annoying as hell.
It's crazy to me that—last I checked—its context strategy was basically tool use of ls and cat. Despite the breathtaking amount of engineering resources major AI companies have, they're eschewing dense RAG setups for dirt simple tool calls.
To their credit, it was good enough to fuel Claude Code's spectacular success, and it's fine for most use cases, but it really sucks not having proper RAG when you need it.
On the bright side, now that MCP has taken off I imagine one can just provide their preferred RAG setup as a tool call.
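Something like the following, as a rough sketch. This assumes the official MCP Python SDK ("mcp" on PyPI) and its FastMCP helper; query_index is a stub standing in for whatever retriever you actually use:

    # Rough sketch: expose a custom RAG backend as an MCP tool.
    # query_index is a placeholder, not a real library call.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("codebase-rag")

    def query_index(query: str, top_k: int) -> list[str]:
        # Placeholder: swap in an embedding or BM25 index over your repo.
        return [f"(stub chunk {i} for {query!r})" for i in range(top_k)]

    @mcp.tool()
    def search_codebase(query: str, top_k: int = 5) -> str:
        """Return the most relevant code chunks for a natural-language query."""
        return "\n\n".join(query_index(query, top_k))

    if __name__ == "__main__":
        mcp.run()  # serves the tool over stdio, the default transport

Point the agent at that server and it can call search_codebase like any other tool.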
You can, but my tool actually handles the raw chat context. So you can have millions of tokens in context, and the actual message that gets produced for the LLM is an optimized distillate, re-ordered to account for LLM memory patterns. RAG tools are mostly optimized for QA anyhow, which has dubious carryover to coding tasks.
This happens immediately before the LLM call: the message history is transformed as the API request is assembled.
That does reduce the context cache hit rate a bit, but the packing is cache-aware, so it avoids repacking the early parts of the context when it can. The tradeoff is 100% worth it though.
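For the curious, the shape of it is roughly this. A toy sketch, not the actual implementation: count_tokens, score_relevance, and summarize are all stand-in stubs for the real heuristics.

    # Toy sketch of cache-aware context packing. Every helper here
    # is a placeholder for something more involved.

    def count_tokens(msg: dict) -> int:
        # Crude estimate (~4 chars/token); a real tokenizer goes here.
        return max(1, len(msg["content"]) // 4)

    def score_relevance(msg: dict) -> float:
        # Placeholder heuristic. A real scorer would use embeddings,
        # recency, or task-specific signals.
        return float(len(msg["content"]))

    def summarize(msg: dict) -> dict:
        # Placeholder distillation: truncate. A real version would
        # compress the message with a cheap model instead.
        return {**msg, "content": msg["content"][:400]}

    def pack_context(messages: list[dict], budget: int,
                     keep_prefix: int = 4) -> list[dict]:
        prefix = messages[:keep_prefix]   # left untouched -> prompt cache hit
        used = sum(count_tokens(m) for m in prefix)
        kept = []
        # Spend the budget on the most relevant messages first...
        for msg in sorted(messages[keep_prefix:],
                          key=score_relevance, reverse=True):
            cost = count_tokens(msg)
            if used + cost > budget:
                msg = summarize(msg)      # distill rather than drop
                cost = count_tokens(msg)
                if used + cost > budget:
                    continue
            kept.append(msg)
            used += cost
        # ...then put the most relevant material last, where models
        # tend to attend most reliably.
        kept.reverse()
        return prefix + kept

The design point is that the prefix stays byte-identical across calls, so the provider's prompt cache still hits on the expensive early tokens even though the rest gets repacked.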