That sounds like a great increase in productivity.
But you're also making the mistake of extrapolating past the current realities of the techniques.
Things may improve over time, but prompts and random seeds aren't great for detailed work, so there are limitations that seriously constrain the usefulness. "Everyone will be able to make it" is likely true, but the specialist work will likely remain, and those users will likely be made more productive. It's those in the middle who will lose out.
That an industry is destroyed is neither here nor there. Sucks to have your business/job taken away but that's how the system works. That which created your business also will destroy it.
Have you played with ControlNet via ComfyUI? Try it. You can pose arbitrary figures. There are going to be full kits that provide control over every aspect of generation.
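For a sense of what that looks like in practice, here's a sketch of a ComfyUI API-format workflow that conditions generation on an OpenPose image, so the generated figure matches the pose you feed in. The checkpoint and ControlNet filenames are placeholders; swap in whatever models you actually have, and submit the JSON to ComfyUI's `/prompt` endpoint (the node `class_type` names are the stock ComfyUI ones).

```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd15.safetensors"}},
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a knight in full armor, full body", "clip": ["1", 1]}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
  "4": {"class_type": "LoadImage",
        "inputs": {"image": "pose.png"}},
  "5": {"class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_openpose.safetensors"}},
  "6": {"class_type": "ControlNetApply",
        "inputs": {"conditioning": ["2", 0], "control_net": ["5", 0],
                   "image": ["4", 0], "strength": 1.0}},
  "7": {"class_type": "EmptyLatentImage",
        "inputs": {"width": 512, "height": 768, "batch_size": 1}},
  "8": {"class_type": "KSampler",
        "inputs": {"model": ["1", 0], "positive": ["6", 0], "negative": ["3", 0],
                   "latent_image": ["7", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                   "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
  "9": {"class_type": "VAEDecode",
        "inputs": {"samples": ["8", 0], "vae": ["1", 2]}},
  "10": {"class_type": "SaveImage",
         "inputs": {"images": ["9", 0], "filename_prefix": "posed"}}
}
```

The point is that the pose image becomes just another input wired into the conditioning, which is exactly the kind of "control over every aspect" these kits are heading toward.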
I give it 12 months until the pharmaceutical industry starts using this in a significant way. Currently, most pharma ads on TV look like stock footage of random people doing random things with text and voice-over. So AI-generated? Sure, as if people are even watching the video action in any detail at all in pharma ads. AI gen video companies that focus on pharma will rake it in for sure in the short term.
[video prompt: Two elderly people taking a stroll on a boardwalk, partaking in various boardwalk activities.] [AI gen voice: Suffering from chronic blorgoriopsy? Try Neuvoplaxadip by Excelon Pharmaceuticals. Reported side effects include... Ask your doctor.]
What? The industry already exists. There's clearly money there. The idea that you can't have an industry just because it's specific to the richest country on earth is silly.
OK, let's see it make full-sized videos first; making tiny demo videos is a long way from showing it at 4K. Also, let's see the entire paper and note how many computing resources were required to build the models. Until everyone can try it for themselves, we have no idea how cherry-picked the examples were.
TV ads are short. 20 seconds of HD could be enough, easily upscaled to 4K.
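A quick back-of-envelope check on why a 20-second ad is a tractable target (the frame rate and resolutions below are the standard ones, not anything from a specific model):

```python
# How many frames is a 20-second HD ad, and how much more pixel data
# does a 2x-per-dimension upscale to 4K UHD produce?
fps = 24
seconds = 20
hd = (1920, 1080)     # 1080p
uhd = (3840, 2160)    # 4K UHD: exactly 2x each dimension of 1080p

frames = fps * seconds                  # frames the model must generate
hd_pixels = frames * hd[0] * hd[1]      # pixels at native HD
uhd_pixels = frames * uhd[0] * uhd[1]   # pixels after upscaling

print(frames)                   # 480
print(uhd_pixels // hd_pixels)  # 4
```

So the model only needs 480 coherent HD frames; the 4x jump to 4K can be left to a conventional upscaler rather than the generative model itself.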
I think it might be within the realm of possibility to see 30-second videos by the end of the year.
The next step could then be infinitely long videos, once frames are generated at 24 fps, provided the models can stick to a story and a visual style that makes sense. The story could evolve automatically from an LLM or be generated in real time by an artist, say a prompt every minute. In any case, we're not that far away from this, even if the first results will be more like trippy videos.
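The numbers above imply a concrete latency budget. A small sanity check (pure arithmetic, no model assumptions beyond the 24 fps / one-prompt-per-minute figures already stated):

```python
# Real-time "infinite video": at 24 fps, each frame must be produced
# in under ~41.7 ms on average, and one prompt has to carry the model
# through 1440 frames before the artist steers again.
fps = 24
budget_ms = 1000 / fps          # per-frame time budget in milliseconds
frames_per_prompt = fps * 60    # frames covered by one prompt per minute

print(round(budget_ms, 1))      # 41.7
print(frames_per_prompt)        # 1440
```

That ~42 ms ceiling is the real bar for "real time", and keeping 1440 consecutive frames on-story is exactly the consistency problem mentioned.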
Yep, and my cynical side is just hoping that the GPU vendors aren't going to deliberately limit the hardware available to end users to force people to depend on their cloud platforms.
I just want to feed an LLM hunter x hunter episodes and get out new ones.
But on a more serious note, I vividly remember when GANs were the next big thing back when I was in university, and the output quality and variability were laughable compared to what Midjourney and the like can produce today (my mind was still blown back then). So I would be in no way surprised if we got to a point in the next decade where we have a "Midjourney" for video generation. I wholeheartedly agree.
I also think the computational problem is being tackled from so many angles in the field of ML. You have Nvidia releasing absolute beasts of GPUs, promising startups pushing for specialized hardware, a new paper on more optimized training methods every week, Mamba bursting onto the scene, higher-quality data sets, merging of models, framework optimizations here and there. Just the other day I saw a post here about running larger LLMs locally. Stable Diffusion is already available for iPhones at acceptable quality and speed (given the device's power).
What I wonder about most, though, is whether we will get more robust orchestration of different models, or multimodal models. It's one thing to have a model which, given a text prompt, generates a short video snippet. But what if I instruct my model(s) to come up with a new ad for a sports drink, and they do research, consolidate relevant data about the target group, come up with a proper script for an ad, create the ad, figure out an evaluation strategy for the ad, apply it, and eventually give me back a "well thought out" video? And all I had to do was provide a little bit of an intro and then let the thing do its magic for an hour. I know we have LangChain and BabyAGI, but they are not as robust as they would need to be to displace a bunch of jobs just yet (though I assume they will be soon enough).
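The orchestration described above is, at its core, just a fixed pipeline of stages. Here's a minimal sketch of that shape in plain Python; every stage function is a hypothetical stand-in (in a real system each would call an LLM, a search tool, or a video model, as frameworks like LangChain do):

```python
# Hypothetical orchestration pipeline: brief -> research -> script ->
# rendered ad -> evaluation score. The stage bodies are stubs; only the
# chaining structure is the point.

def research(brief: str) -> str:
    # Real version: web search + LLM summarization of the target group.
    return f"target-group data for: {brief}"

def write_script(data: str) -> str:
    # Real version: LLM call conditioned on the research output.
    return f"30s ad script based on ({data})"

def render_ad(script: str) -> str:
    # Real version: text-to-video model invocation.
    return f"video rendered from ({script})"

def evaluate(video: str) -> float:
    # Real version: a scoring model or A/B test harness.
    return 0.8 if "script" in video else 0.0

def make_ad(brief: str) -> tuple[str, float]:
    """Run the whole chain from a one-line brief to a scored video."""
    data = research(brief)
    script = write_script(data)
    video = render_ad(script)
    return video, evaluate(video)

video, score = make_ad("new ad for a sports drink")
print(score)  # 0.8
```

The hard part isn't the chaining, it's making each stage robust enough that errors don't compound down the pipeline, which is exactly where current agent frameworks still fall short.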
I give it 5 years until it has been normalized to see AI generated TV/YT ads, 10 to 15 when traditionally made ones will be in the minority.
In the beginning it will be just a bunch of geeks in front of computers crafting the prompts; later, everyone will be able to make it.
It will probably be access to computing resources which will be the limiting factor.