That will probably continue to be the approach indefinitely. There's going to be an increasingly advanced translation layer between the user's prompt and the software responsible for producing the images. We've done this with pretty much every computing and software system that people interact with. Stripping complexity out of the front end for the user is one key to getting generative software adopted really widely. To do that, more of the complexity has to move to the back end.
https://twitter.com/madebyollin/status/1708204657708077294
https://media.discordapp.net/attachments/1023643945319792731...