I've proof-of-concepted the same workflow, and it just doesn't hold up in practice. Even so, there are some benefits that truly do help a solo dev.
It's actually easier to de-bake yourself than to use SD-derived assets. What I mean is that img2depth is severely lacking at the moment. The geometry it produces is unsuitable for compositing with other game assets. You'll notice compositing was avoided in the demo.
As such it's easier to just block out geometry in Blender, and either project the images onto it as textures or repaint them by hand using the diffusion output as a reference (or both, if you're into that kind of thing). Even so, that's still incredibly useful.
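To make that concrete, here's a minimal sketch of the project-as-texture step, assuming Blender's bpy API, an isometric camera already framing the block-out, and a diffusion render saved as sd_render.png (all the names here are placeholders, not from the article):

    # Sketch: project an SD render onto blocked-out geometry from the camera view.
    import bpy

    obj = bpy.context.active_object                 # the blocked-out mesh
    img = bpy.data.images.load("//sd_render.png")   # the diffusion output

    # Simple material that samples the projected image.
    mat = bpy.data.materials.new("SDProjection")
    mat.use_nodes = True
    tex = mat.node_tree.nodes.new("ShaderNodeTexImage")
    tex.image = img
    mat.node_tree.links.new(tex.outputs["Color"],
                            mat.node_tree.nodes["Principled BSDF"].inputs["Base Color"])
    obj.data.materials.append(mat)

    # UV-unwrap from the current view (assumes the viewport is looking
    # through the isometric camera) so the texture lines up with the render.
    bpy.ops.object.mode_set(mode="EDIT")
    bpy.ops.mesh.select_all(action="SELECT")
    bpy.ops.uv.project_from_view(camera_bounds=True, correct_aspect=True)
    bpy.ops.object.mode_set(mode="OBJECT")

From there you can bake or repaint over the projection instead of trusting img2depth geometry.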
One of the most common things holding back a solo dev is art assets. I might not be able to create concept art, models, and textures ex novo, but I can pretty easily block out some geometry, repaint textures, and throw in some basic lighting.
Looking forward, ideally img2depth+normal+diffuse+lighting would be a thing, but until it is, it's easier to treat SD output as advice and suggestion rather than tangible assets, and that's still worth a lot.
CN is a total game changer for generative image models; it solves so many things that were problematic before (proper depth, pose, sensible text, and much more). This, along with LoRA[0] and other improvements from the SD community, really turns this into a super capable toolchain.
ControlNet is where it's at - the depth model can completely reskin an image [1] with a bit of prompting, and you can compose them (multiple ControlNets feeding into each other).
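For anyone who hasn't tried it, composing ControlNets through the diffusers library looks roughly like this; a sketch, assuming the commonly published lllyasviel checkpoints and a diffusers version with Multi-ControlNet support (the file names and prompt are made up):

    # Sketch: "reskin" a scene by holding its depth and pose fixed while the prompt changes.
    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    depth_cn = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
    pose_cn = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)

    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=[depth_cn, pose_cn],             # multiple ControlNets compose
        torch_dtype=torch.float16,
    ).to("cuda")

    depth_map = load_image("scene_depth.png")       # e.g. from a MiDaS pass
    pose_map = load_image("scene_pose.png")         # e.g. from an OpenPose pass

    out = pipe(
        "isometric cyberpunk alleyway, painted game asset",
        image=[depth_map, pose_map],
        controlnet_conditioning_scale=[1.0, 0.8],   # weight each conditioning map
        num_inference_steps=30,
    ).images[0]
    out.save("reskinned.png")

The depth map pins the structure down, so the prompt only gets to restyle the surfaces.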
This example perfectly encapsulates the state of generative AI art.
You can click a few buttons and instantly get artwork that is indistinguishable from stuff made by professional artists for top-quality games. Amazing!
The downside is, your game is indistinguishable from every other game. Close your eyes and pick out anything in the isometric sci-fi/cyberpunk genre from Steam and the result will look exactly like this one. Heck you can probably import it directly from Unity's asset store without even bothering with Stable Diffusion.
There's the broader question of what AI is adding that will appeal to people wanting to buy your game, and the answer to that is still probably nothing.
What? I got the exact opposite impression. What I took away is that you can press the button and you will get something to look at, but the moment you actually need a seamless and coherent artist driven scene a lot of the work will still need human artists who are now merely painting with a very complex brush.
Your comment sticks with me because the art generated here does look very AI-generated (the lack of consistency in something as basic as the building windows sticks out), and I disagree that it's indistinguishable from professional artists.
That said, it reminds me quite strikingly of the work of manga artist Tsutomu Nihei, whose work is quite literally about characters going through infinitely large Lovecraft-esque superstructures generated by machines.
I feel like the question "What is AI adding that will appeal to people wanting to buy your game?" can be asked of a majority of games, with or without AI. It's possibly also the wrong question: AI isn't necessarily adding anything special, but it IS enabling much more powerful tooling for the creation of games.
The prototyping power and getting a general feel for how you'd like your design to be is only going to get stronger, utilizing these tools as a solo dev is going to be (and already is) such a gamechanger.
People like OP are also putting in the extra work that I'd argue is significantly more than just pressing some buttons and getting some results, though I do concede the next era of low effort asset-flip style of games is going to flood the market soon enough.
My concern is we’re going to see a loss of creativity akin to what we see when directors use existing film scores as “dummy scores” to help establish mood.
> when directors use existing film scores as “dummy scores” to help establish mood.
Which then means that when the actual score for that film needs to be written, it pretty much has to sound like the dummy score, because all the scenes were filmed to match the dummy score.
> that is indistinguishable from stuff made by professional artists
Er - no, absolutely not. The moment I found myself in front of the screenshots I immediately noticed the very heavy patterns of NN-generated graphics. Chiefly the distinctively smudged shapes - like 3D Google Maps as opposed to their earlier photographic aerial view.
> indistinguishable for the vast majority of people
It would be a very worrisome indicator if more than a minority of people were unable to detect monstrosities - one eye above one below, one smaller one bigger, one normal one vacuous, one pupil round one merging with the eyelashes...
The poster wrote «professional artists», and they are supposed to produce Quality. The thing that, when spread, helps an educated perception recognize lower quality and odd outputs.
> The downside is, your game is indistinguishable from every other game
There is no such consistency in "every other game", nor is there only a single AI style, so that can't happen either; you'll be able to distinguish if the designer wanted such a distinction.
> what is AI adding that will appeal to people wanting to buy your game, and the answer to that is still probably nothing.
This one is also easy: price reduction, and that's pretty appealing.
Like I said, you can get an entire game world (maps, artwork, backgrounds, character sprites, weapons, vehicles, animation, sound effects, music) by clicking a few buttons in the Unity asset library. The barrier to entry for making a generic cookie-cutter game that a thousand people have made before is already nonexistent.
> Heck you can probably import it directly from Unity's asset store without even bothering with Stable Diffusion.
Try actually doing this though and it'll look like an absolute mess. You'll end up without any sort of cohesive art style and you'll have random variations in quality level. The thing about this AI-generated stuff is that what it lacks in 'peak talent' it more than makes up for in being able to actually produce something cohesive.
> You'll end up without any sort of cohesive art style and you'll have random variations in quality level.
Oh look: exactly the result with AI tools.
Did you miss the point where the author went through multiple variations, had to construct intricate prompts and do quite a lot of manual work for the end result to be half-decent?
No you can't. They'll be from different people and use mutually incompatible shaders. And in the case of Unity, mutually incompatible renderer plugin versions.
Agreed. The power of SD is actually in its diverse styles. You should be able to aim at any specific style in latent space and get it. This is in contrast with Midjourney, which heavily favors a modern sci-fi style.
With either tech, this thread is a very cynical take. If you give generic prompts, you get generic results. Want something personal, get personal and you will get personal results out of the AI.
The barrier to creation has been lowered. But, when you can create anything, you still have to dig to figure out what you actually want.
Prototyping and concept arting for game assets using stable diffusion has been a very enjoyable process for me. Seeing it being utilized more for isometric is pretty exciting with the sheer amount of iteration you immediately get.
Using img2img with your own rough drafts and seeing what the AI can come up with has knocked hours/days off of my process; couldn't be happier with these immensely powerful tools. Random aside, but AI is very good at making tileable textures too, it's been great.
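In case it helps anyone, that rough-draft img2img pass is only a few lines with diffusers; a sketch, with the model id, file names, and strength value as placeholders:

    # Sketch: refine a rough block-in; "strength" controls how far SD may drift from it.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    draft = load_image("rough_block_in.png").resize((768, 512))
    result = pipe(
        prompt="isometric sci-fi rooftop, detailed painted game asset",
        image=draft,
        strength=0.55,          # lower = stays closer to the draft
        guidance_scale=7.5,
    ).images[0]
    result.save("refined.png")

Iterating on the strength value is most of the workflow: low values keep your composition, high values let the model invent.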
The video in the blog looks very solid in motion, will be interesting to see as you develop better shader techniques on the pieces.
I think this series might need to be updated with the advent of controlnet. Everything from pose through normal maps can be lifted from existing assets and applied as needed. Can't imagine how much more productive the author will be incorporating these major improvements.
I think it looks great as a proof-of-concept. The author also correctly identifies some of the challenges, of which I think the greatest will be consistency. I would explore picking out the best examples it generates, or a style from a game you want to mimic, and train a Dreambooth model on those.
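Once a DreamBooth run has been done on those curated examples, pulling stylistically consistent tiles back out is the easy part; a minimal sketch with diffusers, where the checkpoint path and the rare "sks" token are assumptions about how the model was trained:

    # Sketch: sample from a DreamBooth checkpoint fine-tuned on hand-picked tiles.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "./isometric-style-model", torch_dtype=torch.float16
    ).to("cuda")

    # The rare token the model was trained on recalls the curated style.
    image = pipe("isometric building tile in sks style, cyberpunk rooftop",
                 num_inference_steps=30).images[0]
    image.save("consistent_tile.png")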
Thank you to everyone who still works on isometric and 2D games. I hated it when my favorite titles like Civ went to cheesy 3D interfaces with assets that are indistinguishable until you're zoomed in so close it's useless as a map.
Based on the skeptical comments I was expecting to be disappointed but this is really incredible! Great write-up, too. Thanks for sharing.
This approach may not scale or deliver "good enough" results or whatever right now, but it blows my mind that one person can do everything outlined in this and the follow up post on characters. Incredible efficiency gains and this technology came out less than a year ago!
I think animation is a bigger problem. If you have 2D artwork then creating reasonable 3D models off of that is something that you can reasonably learn to do. But animating that model is much harder (or learning to).
It's literally a greybox, with minor surface detail/texture. Not the most impressive demonstration. Any level designer could whip that up in a single afternoon...
More like yw game artists. Check out the sketches the author made in order to generate the character model. That's a huge time saver if a game artist can just do a few rough sketches and then get a ton of reference or raw material to work with. The skillset for artists will change a bit, they'll focus more on compositing and repainting but they won't be out of a job. It'll make them a lot more productive in the end.
As long as there is enough randomness to generate novel features, and there are still humans in the loop to select the ones with desirable mutant features to survive and guide the next-generation model's evolution... pretty far.
Yes. Not only without giving them credit, but for the most part making them unemployable or reducing the value of their expertise to subsistence level.
The world is full of people who have been rendered unemployed or underemployed due to globalization and automation. Not only has it happened, it's happening now, and it will happen to knowledge workers and creatives.
The increased demand for the work AI can produce will create jobs filled by AI, so Jevons paradox doesn't guarantee anything. If it did, cities like Detroit wouldn't be ruins now, as the boom of outsourcing and automation in manufacturing would simply have created more jobs as demand increased. But it turns out the market isn't an infinitely frictionless superfluid, and the incentives for businesses change - when they can create value without human labor at all, they will seek to do so in all possible cases.
StableDiffusion is almost entirely not trained on "art" images. I daresay you could easily produce art assets from a model trained entirely on non-artistic images.
Try following the other steps listed in the article. If you don't already have those skills you'll be spending a lot of time learning and mastering the tools and techniques. Even the SD generation steps require a decent knowledge of SD and then a lot of grind while you generate dozens or hundreds of images and tweak parameters until you get something good.