This is one of the most overlooked problems in generative AI. It seems trivial, but it is in fact quite difficult. The difficulty arises from the non-linearity inherent in any natural motion.
In fact, the author has highlighted the difficulties of this problem far better than I could.
I started with a simple implementation: moving segments around the image using a segmentation mask plus an ROI. That strategy didn't work out, probably because of either a mathematical bug or insufficient data. I suspect the latter.
The whole idea was to draw a segmentation mask on the target image, then draw lines representing motion, with the option to insert keyframes along those lines.
Imagine you are drawing a curve from A to B. You divide the curve into A, A_1, A_2, ..., B.
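As a toy sketch of that subdivision (names and easing choice are mine, not from any real pipeline): points along the segment can be spaced non-uniformly, e.g. with a smoothstep ease, which is the simplest stand-in for the non-linear spacing natural motion demands.

```python
import numpy as np

def subdivide_path(start, end, n, ease=True):
    """Divide the segment A -> B into n intermediate points plus endpoints.

    With ease=True, spacing follows a smoothstep curve (slow-in, slow-out)
    instead of uniform steps, so interior points bunch up near A and B.
    """
    t = np.linspace(0.0, 1.0, n + 2)      # parameter values, endpoints included
    if ease:
        t = 3 * t**2 - 2 * t**3           # smoothstep: endpoints stay fixed
    start, end = np.asarray(start, float), np.asarray(end, float)
    return start + t[:, None] * (end - start)

pts = subdivide_path((0, 0), (10, 0), n=3)
# endpoints coincide with A and B; interior steps are smaller near the ends
```

The same sampling would apply to a curved path by parameterizing arc length instead of a straight segment.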
Now, given the segmentation mask, motion curve, and whole image as input, we train a model to move only the ROI according to the motion curve and keyframes.
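One possible encoding of those three inputs (purely a sketch of my own, with illustrative names) is to stack them as channels of a single conditioning array: RGB image, binary ROI mask, and a one-hot map marking the curve's target position at the given keyframe index.

```python
import numpy as np

def conditioning_tensor(image, mask, curve_pts, t):
    """Pack image, ROI mask, and one motion-curve keyframe into channels.

    Channels 0-2: RGB image; channel 3: binary ROI mask; channel 4: a
    one-hot map of where the curve places the ROI at keyframe index t.
    """
    h, w, _ = image.shape
    curve_ch = np.zeros((h, w), dtype=np.float32)
    y, x = curve_pts[t]                       # (row, col) of the t-th keyframe
    curve_ch[int(round(y)), int(round(x))] = 1.0
    return np.concatenate(
        [image.astype(np.float32),
         mask[..., None].astype(np.float32),
         curve_ch[..., None]],
        axis=-1,
    )
```

A real setup would likely rasterize the whole curve or use coordinate embeddings, but the channel-stacking idea is the same.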
The problem with this approach lies in sampling the keyframes and enforcing consistency (making sure the ROI represents the same object) across subsequent keyframes.
If we can solve some form of consistency, this method might provide enough constraints to generate viable results.
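A crude proxy for that consistency (my own toy formulation, not a proposed solution) is to align consecutive keyframe masks by centroid, removing the motion component, and then check their overlap: if the shape itself drifts, the aligned IoU drops even though the ROI followed the curve.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def centroid(mask):
    """Centroid (row, col) of a non-empty boolean mask."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def aligned_iou(a, b):
    """IoU after translating b so its centroid matches a's.

    Subtracting the centroid shift cancels the motion along the curve,
    so what remains measures shape drift between keyframes.
    """
    dy, dx = (np.array(centroid(a)) - np.array(centroid(b))).round().astype(int)
    b_shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
    return mask_iou(a, b_shifted)
```

Thresholding `aligned_iou` between consecutive keyframes would flag pairs where the ROI no longer plausibly represents the same object.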
I currently have about 3K more shelved words on why this is hard if you're targeting real animators. One point: human inbetweeners are given "spacing charts" showing how far each part should move per frame, even though they understand motion perfectly well, because the key animator wants to control the acting.