Funny that the Bob Ross version just makes them look like Bob Ross. Maybe there are more pictures of Bob Ross in the training set than of his actual paintings.
While it's true the dataset isn't well curated or perfectly labeled, the result could also just mean the model doesn't understand grammar: a human could tell from the label whether an image is a picture of Bob Ross or a painting by him, but the training misses that relationship. Even with poorly labeled data, I suspect AI will eventually figure out which labels are more likely to be poor and handle them appropriately.
In the reverse direction, you can try:
A horse rides an astronaut
and you will probably get an astronaut riding a horse anyway. The prompt isn't a poor description of what we want; the problem is that our assumptions about how grammar should work aren't being honored.
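If anyone wants to try the reversed prompt themselves, here's a minimal sketch. The choice of library (Hugging Face diffusers) and checkpoint (Stable Diffusion 1.5) is my own assumption for illustration, not something from the thread:

    # Minimal sketch: generate an image for the reversed prompt.
    # Assumes the diffusers library, a CUDA GPU, and the
    # "runwayml/stable-diffusion-v1-5" checkpoint (my assumption).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # The reversed prompt: most models will still put the astronaut on the horse.
    image = pipe("A horse rides an astronaut").images[0]
    image.save("horse_rides_astronaut.png")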