That sounds like a great increase in productivity.
But you're also making the mistake of extrapolating past the current realities of the techniques.
Things may improve over time, but prompts and random seeds aren't great for detailed work, so there are limitations that seriously constrain the usefulness. "Everyone will be able to make it" is likely true, but the specialist work will likely remain, and those users will likely be made more productive. It's those in the middle who will lose out.
That an industry is destroyed is neither here nor there. Sucks to have your business/job taken away but that's how the system works. That which created your business also will destroy it.
Have you played with ControlNet via ComfyUI? Try it. You can pose arbitrary figures. There are going to be full kits that provide control over every aspect of generation.
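For a sense of what that looks like in practice, here's a sketch of a ComfyUI API-format workflow that conditions generation on an OpenPose image, so the generated figure matches the pose you feed in. The checkpoint and ControlNet filenames are placeholders; swap in whatever models you actually have, and submit the JSON to ComfyUI's `/prompt` endpoint (the node `class_type` names are the stock ComfyUI ones).

```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd15.safetensors"}},
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a knight in full armor, full body", "clip": ["1", 1]}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
  "4": {"class_type": "LoadImage",
        "inputs": {"image": "pose.png"}},
  "5": {"class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_openpose.safetensors"}},
  "6": {"class_type": "ControlNetApply",
        "inputs": {"conditioning": ["2", 0], "control_net": ["5", 0],
                   "image": ["4", 0], "strength": 1.0}},
  "7": {"class_type": "EmptyLatentImage",
        "inputs": {"width": 512, "height": 768, "batch_size": 1}},
  "8": {"class_type": "KSampler",
        "inputs": {"model": ["1", 0], "positive": ["6", 0], "negative": ["3", 0],
                   "latent_image": ["7", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                   "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
  "9": {"class_type": "VAEDecode",
        "inputs": {"samples": ["8", 0], "vae": ["1", 2]}},
  "10": {"class_type": "SaveImage",
         "inputs": {"images": ["9", 0], "filename_prefix": "posed"}}
}
```

The point is that the pose image becomes just another input wired into the conditioning, which is exactly the kind of "control over every aspect" these kits are heading toward.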
I give it 12 months until the pharmaceutical industry starts using this in a significant way. Currently, most pharma ads on TV look like stock footage of random people doing random things with text and voice-over. So AI-generated? Sure, as if people are even watching the video action in any detail at all in pharma ads. AI gen video companies that focus on pharma will rake it in for sure in the short term.
[video prompt: Two elderly people taking a stroll on a boardwalk, partaking in various boardwalk activities.] [AI gen voice: Suffering from chronic blorgoriopsy? Try Neuvoplaxadip by Excelon Pharmaceuticals. Reported side effects include... Ask your doctor.]
What? The industry already exists. There's clearly money there. The idea that you can't have an industry just because it's specific to the richest country on earth is silly.
OK, let's see it make full-sized videos first; making tiny demo videos is a long way from showing it at 4K. Also, let's see the entire paper and note how many computing resources were required to build the models. Until everyone can try it for themselves, we have no idea how cherry-picked the examples were.
TV ads are short. 20 seconds of HD could be enough, easily upscaled to 4K.
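A quick back-of-envelope check on why a 20-second ad is a tractable target (the frame rate and resolutions below are the standard ones, not anything from a specific model):

```python
# How many frames is a 20-second HD ad, and how much more pixel data
# does a 2x-per-dimension upscale to 4K UHD produce?
fps = 24
seconds = 20
hd = (1920, 1080)     # 1080p
uhd = (3840, 2160)    # 4K UHD: exactly 2x each dimension of 1080p

frames = fps * seconds                  # frames the model must generate
hd_pixels = frames * hd[0] * hd[1]      # pixels at native HD
uhd_pixels = frames * uhd[0] * uhd[1]   # pixels after upscaling

print(frames)                   # 480
print(uhd_pixels // hd_pixels)  # 4
```

So the model only needs 480 coherent HD frames; the 4x jump to 4K can be left to a conventional upscaler rather than the generative model itself.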
I think it might be within the realm of possibility to see 30-second videos by the end of the year.
The next step could then be infinitely long videos, once frames are generated at 24 fps, provided the models can stick to a story and a visual style that makes sense. The story could evolve automatically from an LLM or be generated in real time by an artist, say a prompt every minute. In any case, we're not that far away from this, even if the first results will be more like trippy videos.
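The numbers above imply a concrete latency budget. A small sanity check (pure arithmetic, no model assumptions beyond the 24 fps / one-prompt-per-minute figures already stated):

```python
# Real-time "infinite video": at 24 fps, each frame must be produced
# in under ~41.7 ms on average, and one prompt has to carry the model
# through 1440 frames before the artist steers again.
fps = 24
budget_ms = 1000 / fps          # per-frame time budget in milliseconds
frames_per_prompt = fps * 60    # frames covered by one prompt per minute

print(round(budget_ms, 1))      # 41.7
print(frames_per_prompt)        # 1440
```

That ~42 ms ceiling is the real bar for "real time", and keeping 1440 consecutive frames on-story is exactly the consistency problem mentioned.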
Yep, and my cynical side is just hoping that the GPU vendors aren't going to deliberately limit the hardware available to end users to force people to depend on their cloud platforms.
I just want to feed an LLM hunter x hunter episodes and get out new ones.
But on a more serious note, I vividly remember when GANs were the next big thing back when I was in university, and the output quality and variability were laughable compared to what Midjourney and the like can produce today (my mind was still blown back then). So I would be in no way surprised if we got to a point in the next decade where we have a "Midjourney" for video generation. I wholeheartedly agree.
I also think the computational problem is being tackled from so many angles in the field of ML. You have Nvidia releasing absolute beasts of GPUs, promising startups pushing for specialized hardware, a new paper on more optimized training methods every week, Mamba bursting onto the scene, higher-quality data sets, merging of models, framework optimizations here and there. Just the other day I saw a post here about running larger LLMs locally. Stable Diffusion is already available for iPhones at acceptable quality and speed (given the device's power).
What I wonder about most, though, is whether we will get more robust orchestration of different models, or multimodal models. It's one thing to have a model which, given a text prompt, generates a short video snippet. But what if I instruct my model(s) to come up with a new ad for a sports drink, and they do research, consolidate relevant data about the target group, come up with a proper script for an ad, create the ad, figure out an evaluation strategy for the ad, apply it, and eventually give me back a "well thought out" video? And all I had to do was provide a little bit of an intro and then let the thing do its magic for an hour. I know we have LangChain and BabyAGI, but they are not as robust as they would need to be to displace a bunch of jobs just yet (though I assume they will be soon enough).
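The orchestration described above is, at its core, just a fixed pipeline of stages. Here's a minimal sketch of that shape in plain Python; every stage function is a hypothetical stand-in (in a real system each would call an LLM, a search tool, or a video model, as frameworks like LangChain do):

```python
# Hypothetical orchestration pipeline: brief -> research -> script ->
# rendered ad -> evaluation score. The stage bodies are stubs; only the
# chaining structure is the point.

def research(brief: str) -> str:
    # Real version: web search + LLM summarization of the target group.
    return f"target-group data for: {brief}"

def write_script(data: str) -> str:
    # Real version: LLM call conditioned on the research output.
    return f"30s ad script based on ({data})"

def render_ad(script: str) -> str:
    # Real version: text-to-video model invocation.
    return f"video rendered from ({script})"

def evaluate(video: str) -> float:
    # Real version: a scoring model or A/B test harness.
    return 0.8 if "script" in video else 0.0

def make_ad(brief: str) -> tuple[str, float]:
    """Run the whole chain from a one-line brief to a scored video."""
    data = research(brief)
    script = write_script(data)
    video = render_ad(script)
    return video, evaluate(video)

video, score = make_ad("new ad for a sports drink")
print(score)  # 0.8
```

The hard part isn't the chaining, it's making each stage robust enough that errors don't compound down the pipeline, which is exactly where current agent frameworks still fall short.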
I give it 5 years until it has been normalized to see AI generated TV/YT ads, 10 to 15 when traditionally made ones will be in the minority.
In the beginning it will be just a bunch of geeks in front of computers crafting the prompts; later, everyone will be able to make it.
It will probably be access to computing resources which will be the limiting factor.