opus 4.1: made weird choices, eventually got to a meh solution i just rolled back.
codex: took a disgusting amount of time but the result was vastly superior to opus. night and day superiority. output was still not what i wanted.
sonnet 4.5: not clearly better than opus. categorically worse decision-making than codex. very fast.
Codex was night and day the best. Codex scares me, Claude feels like a useful tool.
Agreed. If these same models were used on a different codebase, language, etc., they would likely produce very different results.