
Agents __SHOULD NOT__ verify their own code. They know they wrote it, and they review it with bias. Have a separate agent with instructions to red team the hell out of a commit: strict, but no nitpicking or bikeshedding. Better yet, run multiple review agents with slightly different areas of focus, since a single agent asked to cover everything will miss a lot. A panel of security, performance, business correctness and architecture/elegance agents (each armed with a good covering set of code context + the diff) will harden a PR very quickly.
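A rough sketch of what such a panel could look like, in case it helps. Everything here is an assumption for illustration: `ModelFn` is a hypothetical stand-in for whatever model client you use, and the briefs in `FOCUS_AREAS` are made-up wording, not a tested prompt set.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns the reply text.
ModelFn = Callable[[str], str]

# One narrow brief per reviewer (illustrative wording, not a tested prompt set).
FOCUS_AREAS = {
    "security": "Look for injection, auth, secrets and unsafe input handling.",
    "performance": "Look for needless work, N+1 queries and blocking calls.",
    "business correctness": "Check the change does what the requirements say.",
    "architecture": "Check boundaries, naming and unnecessary complexity.",
}


@dataclass
class Review:
    focus: str
    findings: str


def build_prompt(focus: str, brief: str, context: str, diff: str) -> str:
    """Assemble a standalone prompt; nothing from the authoring session is included."""
    return (
        f"You are a strict {focus} reviewer. Red team this commit.\n"
        f"{brief}\n"
        "Report real defects only; do not nitpick or bikeshed.\n\n"
        f"Relevant code context:\n{context}\n\n"
        f"Diff under review:\n{diff}\n"
    )


def run_panel(model: ModelFn, context: str, diff: str) -> list[Review]:
    """Run each reviewer independently so no reviewer sees another's output."""
    return [
        Review(focus, model(build_prompt(focus, brief, context, diff)))
        for focus, brief in FOCUS_AREAS.items()
    ]
```

Each reviewer gets the same context and diff but a different brief, and none of them sees the conversation that produced the code. Calling `run_panel(my_model, context, diff)` returns one `Review` per focus area, which you can then triage by hand or with another pass.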


Codex uses this principle: /review runs in a subthread and does not see the previous context, only the git diff. This is what I am using. Alternatively, I open Cursor and have Sonnet review code written by GPT-5.
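If you want the same diff-only setup outside Codex, a minimal sketch might look like the following. This is not how Codex implements /review; the base branch name `main` and the prompt wording are assumptions, and the resulting prompt would be sent to a fresh session of whichever reviewer model you choose.

```python
import subprocess


def get_diff(base: str = "main") -> str:
    """Return the changes on HEAD since its merge base with `base` (assumed branch name)."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_review_prompt(diff: str) -> str:
    """Build the reviewer's entire input from the diff alone."""
    return (
        "Review the following change. You have no other context.\n"
        "List concrete defects with file and line where possible.\n\n"
        f"{diff}"
    )
```

The point of the design is that the reviewer's input is built only from `get_diff()`, so nothing from the authoring conversation can leak in.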


Do you have examples of this working, or any best practices on how to orchestrate it efficiently? It sounds like the right thing to do, but it doesn't seem like the tech is quite to the point where this could work in practice yet, unless I missed it. I imagine multiple agents would churn through too many tokens and have a hard time coming to a consensus.


I've been doing this with Gemini 2.5 for about 6 months now. It works quite well. It doesn't catch 100% of the big architectural issues, but it's very good at line/module level logic issues and anti-patterns.



