Calling bullshit is maybe too harsh. There may be requirements matching the available training data and the right mood the LLM has been tuned for where it delivers acceptable, then considered flawless, results (extreme example: just try "create me hello world in language x" will mostly deliver flawless)... and by that amateurs (not judging, just mean less exposed to variety of problems and challenges) may end up with the feeling that LLMs could do it all.
But yes, any "serious" programmer working on harder problems can quickly derail an LLM and prove otherwise, with dozens of his simple problems each day (tried it *). It doesn't even need that, one can e.g. prove ChatGPT (also 4) quickly wrong and going in wrong circles on C++ language questions :D, though C++ is still hard, one can also do the same with questions on the not-ultracommon Python libs. It confidently outputs bullshit quick.
(*): Still can be helpful for templating, ideas, or getting into the direction or alternatives, no doubts on that!