Mashups is the wrong way to think about it. It's generalizing at a higher level than texture / image sampling and it can tween things in latent space to get to visual spaces that haven't been explored by human artists before.
It requires a good steer and prompting is a clumsy tool for fine tuning - it's adequate for initialization but we lack words for every shade of meaning, and phrase weighting is pretty clumsy too, because words have a blend of meaning.
"It's generalizing at a higher level than texture / image sampling and it can tween things in latent space to get to visual spaces that haven't been explored by human artists before."
The very fact that the model is interpolating between things in the latent space probably explains why its images haven't been explored by human artists before: because there is a disconnect between the latent space of the model and genuine "latent space" of human artistic endeavor, which is an interplay between the laws of physics and the aesthetic interests of humans. I think these models know very little about either of those things and thus generate some pretty interesting novelty.
I think of artistic endeavour as a bit like the inverse of txt2img, but running in your head, and just projecting to the internal latent space, not all the way to words. It's not just aesthetic, it's about triggering feelings through senses. Images need to connect with the audience through associations with scenes, events, moods and so on from the audience members' lives.
Aesthetic choices like colour and shapes and composition combine with literal representations, facial emotions, symbolic meanings and so on. AI art so far feels quite shallow by this metric, usually only hitting a couple of notes. But sometimes it can play those couple of notes very sweetly.
It requires a good steer and prompting is a clumsy tool for fine tuning - it's adequate for initialization but we lack words for every shade of meaning, and phrase weighting is pretty clumsy too, because words have a blend of meaning.