That is neat, but ChatGPT, Gemini, etc. are not even remotely in the same league as models dedicated to image (and video) generation, like Stable Diffusion or Black Forest Labs' FLUX. I don't think they ever will be, either.
GPT-4o is the most impressive image model in the world, and it represents a full step-function change in our capabilities. Images are over now.
Flux, fine-tuning, inpainting/outpainting, and ComfyUI are effectively dead. I can show 4o a scribble and it does all the editing for me. Comfy is a total hack compared to that. It's not necessary in the new world.
Every image model and image product that has raised capital is effectively worthless / back at ground zero. They're all inferior to this.
Civitai, Leonardo, OpenArt, Invoke - this is an extinction event for them. They're all worthless products now.
I expect the same to happen to video soon.
If an open-weights version of 4o comes out, then I really don't think there will be a product moat for anyone in media.
So far from dead. You should look more closely at sites like Civitai. These models lead the way in image and video generation. ChatGPT will never come close to touching these.
To be frank, OpenAI is probably nearing its death too. It has lost all its talent. It's overvalued and over-raised, and it can't deliver on its promises now either. Everyone else is doing better for cheaper.
I've been experimenting with GenAI for images since basically SD 1.5, so I do speak with at least some level of experience.
I host/run the full FLUX.1-dev model (along with various checkpoint merges such as STOIQO/Chroma/Pixelwave) daily, but 4o's multimodal image generation is LEAGUES ahead of Flux in terms of prompt adherence.
I put together a website that compares genuinely complex prompts across all the SOTA models (Imagen 3, Flux, MJ7, and 4o).