Use agents to validate the code. Is it over engineered, does it conform to conventions and spec, is it actually implemented or half bullshit. I run three of these at the end of a feature or task and it almost always send Opus back to the workbench fixing a bunch of stuff. And since they have their own context, you don't blow up the main context and can go for longer.