For what it's worth, that was exactly my experience with GPT-3.5, but GPT-4 is a lot better at generating code. Almost spookily good, at least for some languages. It makes far fewer mistakes.
Maybe the ChatGPT implementation of GPT-4 is different than the one in Bing AI, but yesterday I asked Bing AI to write a fairly simple Python-based ini-parser (and by that I really mean one using the built-in configparser module). It got a good amount of the way there, but it attempted to index a string with a string key, which was weird. After I pointed out the mistake several times, it produced something that _could_ work in some cases, but was definitely brittle.
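For reference, here is a minimal sketch of the kind of configparser-based reader I had in mind. The `load_ini` helper name and the sample section and keys are made up for illustration; this is not what Bing produced, just one plausible way to do it with the standard library:

```python
import configparser


def load_ini(text: str) -> dict:
    """Parse INI-formatted text into a plain dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # parser[section] is a mapping-like SectionProxy, so string keys are valid here.
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical sample input, only for demonstration.
SAMPLE = """
[server]
host = example.org
port = 8080
"""

config = load_ini(SAMPLE)
host = config["server"]["host"]
# configparser returns every value as a string, so convert explicitly.
port = int(config["server"]["port"])

# The mistake described above amounts to something like "some text"["host"]:
# a str only accepts integer or slice indices, so that raises TypeError.
```

The string-key lookups only work because they are applied to the parsed sections, not to the raw text or to an individual line.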
> Maybe the ChatGPT implementation of GPT-4 is different than the one in Bing AI
Yeah, I think it definitely is, but I don't know why. Bing is better at looking things up (perhaps unsurprisingly), but Chat4 is better at creating things.
It is, but they don't have to be exactly the same. Bing might be tuned for searching real-time information, and maybe to cost less, since search engine scale is much higher (just a guess on my part).