I can see how this has the potential to disrupt the games industry. If you work on a AAA title, there is a small army of artists making 19 different types of leather armor. Or 87 images of car hubcaps.
Using something like this could really help automate or at least kickstart the more mundane parts of content creation. (At least when you are using high resolution, true color imagery.)
Yeah, there are a lot of 2D assets that this model would be great for (textures, materials, *maps, etc.), and that would definitely improve the asset-building process for game devs. I've already used VQGAN+CLIP for some low-res skill and item icons in hobby games, and it seems things are only improving from here.
I wouldn't be surprised to see a comparable version for 3D models in the next year or two, though. Even if the current architecture doesn't lend itself to 3D structures (I don't know), there's a lot of parallel work being done right now (esp. by Google) for encoding 3D data in new/efficient ways, translating specialized 2D images into 3D models, and more.