In my experience, it really depends on the case. In some cases Gemini crushed my problem, but on the next one it got stuck and couldn't figure out a simple bug.

The same goes for o3 and Sonnet (I haven't tested 4.0 much yet, so I don't have an opinion on it).

I feel that we need better support for parallel evaluation, where you could run all the top models on the same problem and then decide which one provided the best solution.