In my anecdotal experience, large language models struggle with spatial structure (which makes sense given their modality and training data). Diffusion models, on the other hand, create great images, but that doesn't translate well to vector data.
Animation is not a well-researched modality for AI, so it could go either way. It's definitely an interesting direction to consider, as it could democratize motion design even further.
It's a JSON structure full of numbers which describe colors and spatial relationships. LLMs have no problem with the JSON syntax, but the numbers they put in it will be nonsense.
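For a sense of what that means, here's a heavily stripped-down sketch in Python, loosely modeled on Lottie's abbreviated field names ("ks" for transform, "p" for position, "c" for color); it's illustrative only, not a complete or valid file:

```python
import json

# Simplified, Lottie-flavored layer: the structure is easy, the numbers carry all the meaning.
layer = {
    "ty": 4,                       # shape layer
    "ks": {                        # transform
        "p": {"a": 1, "k": [       # animated position keyframes
            {"t": 0,  "s": [120.0, 340.5]},   # frame 0: x, y in pixels
            {"t": 60, "s": [480.0, 340.5]},   # frame 60: moved to the right
        ]},
        "s": {"a": 0, "k": [100, 100]},       # static scale, in percent
    },
    "shapes": [
        {"ty": "fl", "c": {"a": 0, "k": [0.91, 0.25, 0.21, 1]}},  # fill color, RGBA in 0-1
    ],
}

print(json.dumps(layer, indent=2))
```

Syntactically it's trivial, but every meaningful decision (where things sit, what color they are, when they move) lives in those raw floats.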
Animations require consistency to work, and generative models are still very bad at consistency. You can see this in action in any AI-generated video: that "flickering" is tiny inconsistencies between frames throwing the entire thing off.
Once that issue is fixed, it's a green light, as everything else is more or less ready.
I think you're talking about generating bitmaps of video, one frame at a time, which is a pretty different task from generating vector animation. If the LLM approaches the task the way an animator would (i.e. "I want this shape to move here slowly for a long time, then grow rapidly for a short duration, then...") and expresses the result in some kind of keyframe animation format (Lottie, AE, Unity, etc.), then you don't have to deal with the kinds of artifacts you described.
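To make that concrete, here's a toy sketch of that animator-style description expressed as keyframes, with linear sampling in Python (hypothetical names, no easing; real formats like Lottie store bezier easing per keyframe):

```python
from dataclasses import dataclass

@dataclass
class Keyframe:
    t: float        # time in seconds
    value: tuple    # property value at that time

def sample(track: list[Keyframe], t: float) -> tuple:
    """Linearly interpolate a property track at time t."""
    if t <= track[0].t:
        return track[0].value
    for a, b in zip(track, track[1:]):
        if a.t <= t <= b.t:
            u = (t - a.t) / (b.t - a.t)
            return tuple(x + (y - x) * u for x, y in zip(a.value, b.value))
    return track[-1].value

# "Move here slowly for a long time, then grow rapidly for a short duration":
position = [Keyframe(0.0, (0.0, 0.0)), Keyframe(4.0, (300.0, 120.0))]   # slow 4 s move
scale    = [Keyframe(4.0, (1.0, 1.0)), Keyframe(4.3, (2.5, 2.5))]       # fast 0.3 s grow

print(sample(position, 2.0))   # (150.0, 60.0)  - halfway through the move
print(sample(scale, 4.15))     # (1.75, 1.75)   - halfway through the grow
```

The model only has to pick a handful of times and values; the player interpolates every frame from those, so there's nothing to flicker between frames.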
What are some roadblocks to making that a reality?