GPT-4 still does a lot of dumb stuff on this question; you see several people post an outright wrong answer and say "Look how GPT-4 solved it!" That happens quite a lot in these discussions, so it seems like the magic to getting GPT-4 to work is just not checking its answers properly.
I've had to work with a lot of imperfect machines in my recent past. Just because something sometimes breaks doesn't mean it's useless. But you do have to keep your eyes on the ball!
I think that's the crux of the whole argument. It's an imperfect (but useful) tool, which sometimes produces answers that make it seem like it can reason, but it clearly can't reason on its own in any meaningful way.
There's a reason you see people walking around in hard hats and steel-toed boots at some companies. It's not because everything works perfectly all the time!