> If you ask Claude to fix a test that accidentally says assert(1 + 1 === 3), it...

> If you ask Claude to fix a test that accidentally says assert(1 + 1 === 3), it'll say "this is clearly a typo" and just rewrite the test. Codex will rewrite the entire V8 engine to break arithmetic.

Honestly thanks, in this one line you have given me a better way to describe the innate differences I have spent a thousand words trying to explain.

Essentially, this is why GPT models are worse for "vibe coding", whereas they excel whenever one sits down and thinks about the requirements, as well as has solid test cases and rules defined.