Having used Claude Code extensively for the last few months, I still haven't reached this "until it isn't" point. Review the code that comes out. It goes a long way.
Yes, my point is that you don't even have "it compiles" as a way to measure a code review. Maybe you did a great job, maybe you did a terrible job. How do you tell?