I’ve noticed something similar, though I don’t think it’s literally “time of day”
so much as changing system conditions.
My working theory is that under higher load, the model is more likely to:
- take broader interpretive leaps
- attempt larger refactors instead of minimal diffs
- “explain its way forward” after a wrong turn rather than reset cleanly
That shows up as rabbit holes and self-reinforcing iterations, especially on
codebases where local consistency matters more than global cleverness.
What’s helped a bit for me:
- explicitly asking for minimal, localized changes
- telling it not to refactor unless necessary
- breaking requests into smaller steps and locking earlier decisions (rough sketch of how I phrase that below)
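
For concreteness, here's roughly how I bake those constraints into each step when I'm driving the model through a script rather than chat. The `build_step_prompt` helper and the exact wording are just my own convention, so treat this as a sketch, not anything official:

```python
# Standing constraints sent with every step, plus the decisions already locked
# in from earlier steps so the model can't quietly re-open them.
CONSTRAINTS = (
    "Make the minimal, localized change needed for this step. "
    "Do not refactor or rename anything unless strictly necessary."
)

def build_step_prompt(task: str, locked_decisions: list[str]) -> str:
    locked = "\n".join(f"- {d}" for d in locked_decisions) or "- (none yet)"
    return (
        f"{CONSTRAINTS}\n\n"
        f"Decisions already made (do not revisit):\n{locked}\n\n"
        f"Current step:\n{task}"
    )

# Usage: one small step at a time, carrying forward what's already settled.
prompt = build_step_prompt(
    "Add a null check in parse_config() before accessing 'timeout'.",
    locked_decisions=["Config stays a plain dict", "No new dependencies"],
)
```

Repeating the locked decisions in every prompt is the part that seems to keep it from revisiting earlier choices mid-spiral.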
It could also be variance from routing, context window pressure, or subtle
prompt drift rather than a predictable nightly degradation, but the pattern
of “overconfident refactor spirals” feels real.
A like-for-like experiment with the same prompt and context at different times
would be interesting, though hard to fully control.
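
If anyone wants to try the like-for-like test, something along these lines is what I had in mind. `call_model` is a stand-in for whatever client or CLI wrapper you actually use, so this is a sketch of the protocol rather than a working harness:

```python
import hashlib
import json
import time
from datetime import datetime, timezone

def call_model(prompt: str) -> str:
    """Stand-in for your actual client (SDK call, CLI wrapper, etc.)."""
    raise NotImplementedError

def run_trial(prompt: str, log_path: str = "trials.jsonl") -> None:
    # Hash the prompt so runs with identical input are easy to group later.
    prompt_id = hashlib.sha256(prompt.encode()).hexdigest()[:12]
    response = call_model(prompt)
    record = {
        "prompt_id": prompt_id,
        "utc_time": datetime.now(timezone.utc).isoformat(),
        "response": response,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    # Same prompt and context on every run; only the time of day varies.
    PROMPT = open("fixed_prompt.txt").read()
    for _ in range(4):
        run_trial(PROMPT)
        time.sleep(6 * 60 * 60)  # six hours between trials
```

Comparing something like diff size or number of files touched across time buckets would at least show whether the refactor-spiral tendency moves with load, even if routing and other variables can't be fully controlled.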