A human could get a valid end state most of the time, whereas GPT-4 seems to mess up more often than it gets it right, based on the examples posted here. So to me it seems like GPT-4 is worse than humans.
Gpt-4 with help from a competent human will of course do better than most humans, but that isn't what we are discussing.
I disagree. Don't assume "most humans" are anything like Silicon Valley startup developers. Most developers out there in the wild would definitely struggle to solve problems like this.
For example, a common criticism of AI-generated code is the risk of introducing vulnerabilities.
I just sat in a meeting for an hour, literally begging several developers to stop writing code vulnerable to SQL injection! They just couldn't understand what I was even talking about. They kept trying various ineffective hacky workarounds ("silver bullets") because they just didn't grok the problem.
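For anyone who hasn't run into this: the fix is parameterized queries, not input sanitization hacks. A minimal sketch using Python's sqlite3 (table name and input are hypothetical, chosen just to show the difference):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Vulnerable: string interpolation lets the input rewrite the query logic.
vuln_rows = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe: a parameterized query treats the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()

print(vuln_rows)  # the injected OR clause matches alice's row anyway
print(safe_rows)  # empty: no user is literally named "alice' OR '1'='1"
```

The "workarounds" people reach for (escaping quotes by hand, blocklisting keywords) all fail on some input; the placeholder version is the only approach that's correct by construction.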