SDXL is great, but it's in no way better than DALL-E as far as straight text-to-image goes, apart from the lack of censorship.
It has plenty of other advantages, but you can't tell it "make me a cute illustration of a 2-year-old girl with Blaze from Blaze and the Monster Machines on a birthday cake with a large 2 candle on it."
DALL-E will nail that, more or less. SDXL very much won't.
I used that as an example because I recently asked for it. I did find I had to tell it that "monster" in the title referred to monster trucks, not actual monsters. That helped keep actual monsters out (yours are half Blaze, half monster). My generations were way better at doing Blaze than yours were, though; they just had cute little monsters around too.
I also recommend a good photorealistic base model, like RealVis XL.
In my experience it's like DALL-E but straight up better, more customizable, and local. And that's before you start trying finetunes and LoRAs.
Other UIs will do SDXL, but every one I tried is terrible without all those default Fooocus augmentations.