
People use AnimateDiff’s motion module (or other models that have cross-frame attention layers). Consistency is close to being solved.
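
For anyone unfamiliar, the core trick in those layers is that every frame attends to a shared anchor frame rather than only to itself. A minimal PyTorch-style sketch of the idea (the class, shapes, and anchor choice here are illustrative, not AnimateDiff’s actual implementation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossFrameAttention(nn.Module):
        # Illustrative sketch: keys/values come from an anchor frame
        # (frame 0 here) instead of each frame attending only to itself,
        # which ties the appearance of later frames to the anchor.
        def __init__(self, dim):
            super().__init__()
            self.to_q = nn.Linear(dim, dim)
            self.to_k = nn.Linear(dim, dim)
            self.to_v = nn.Linear(dim, dim)

        def forward(self, x, anchor_idx=0):
            # x: (frames, tokens, dim) latent features for one clip
            q = self.to_q(x)                              # queries per frame
            anchor = x[anchor_idx].expand(x.shape[0], -1, -1)
            k = self.to_k(anchor)                         # keys from anchor
            v = self.to_v(anchor)                         # values from anchor
            return F.scaled_dot_product_attention(q, k, v)

    attn = CrossFrameAttention(dim=320)
    clip = torch.randn(16, 64, 320)   # 16 frames, 64 tokens each
    out = attn(clip)                  # every frame attends to frame 0

Swapping something like this in for a model’s per-frame self-attention is roughly what “cross-frame attention” means in this context.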



Temporal consistency is improving, but “close to being solved” is very optimistic.


No, I think we’re actually close. My evidence: I’m working on this problem, and the progress of our tiny three-person team at drip.art (http://api.drip.art) has been incredible. We can generate a lot of frames that are consistent, and, with interpolation between them, smoothly restyle even long videos. Cross-frame attention works for most cases; it just needs to be scaled up.
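
To make the interpolation point concrete, here is a hypothetical sketch of the restyle-keyframes-then-blend idea. The `restyle` callable stands in for an actual diffusion pipeline call; none of these names are drip.art’s API.

    import torch

    def restyle_video(frames, restyle, stride=8):
        # frames: (n, c, h, w) video latents
        # restyle: callable that restyles a batch of keyframes
        keyframes = restyle(frames[::stride])
        out = []
        for i in range(len(frames)):
            lo = i // stride
            hi = min(lo + 1, len(keyframes) - 1)
            t = (i % stride) / stride
            # linearly blend between the neighbouring styled keyframes
            out.append(torch.lerp(keyframes[lo], keyframes[hi], t))
        return torch.stack(out)

A production system would likely use a learned frame interpolator rather than a straight lerp; this just shows where interpolation slots into the pipeline.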

And that’s just for diffusion-focused approaches like ours. There are probably other techniques from the TokenFlow or NeRF family of approaches close to breakout levels of quality; tons of talented researchers are working on those too.


The demo clips on the site are cool, but when you call it a "solved problem," I'd expect to see panning, rotating, and zooming within a cohesive scene with multiple subjects.


Thanks for checking it out! We’re certainly not done yet, but much of what you ask for is possible, or soon will be, on the modeling side; the missing piece is tooling that exposes it in a sane workflow inside traditional video editors.


Once a video can show a person twisting around, with their belt buckle the same at the end of the turn as it was at the start, it’s solved. VFX pipelines need that level of consistency. Temporal consistency is a long, long way from being solved, except by hitching it to 3DMMs and SMPL body models (and even then, the results are not fabulous yet).


Hopefully this new model will be a step beyond what you can do with AnimateDiff.



