This is one of the most overlooked problems in generative AI. It seems trivial, but it is in fact quite difficult. The difficulty arises from the non-linearity inherent in any natural motion.
In fact, the author has highlighted the difficulties of this problem far better than I could.
I started with a simple implementation: moving segments around the image using a segmentation mask plus an ROI. That strategy didn't work out, probably because of either a mathematical bug or insufficient data. I suspect the latter.
The whole idea was to draw a segmentation mask on the target image, then draw lines representing motion, with the option to insert keyframes along those lines.
Imagine you are drawing a curve from A to B. You divide the curve into A, A_1, A_2, ..., B.
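As a toy sketch of that subdivision (names and easing choice are mine, not from any real pipeline): points along the segment can be spaced non-uniformly, e.g. with a smoothstep ease, which is the simplest stand-in for the non-linear spacing natural motion demands.

```python
import numpy as np

def subdivide_path(start, end, n, ease=True):
    """Divide the segment A -> B into n intermediate points plus endpoints.

    With ease=True, spacing follows a smoothstep curve (slow-in, slow-out)
    instead of uniform steps, so interior points bunch up near A and B.
    """
    t = np.linspace(0.0, 1.0, n + 2)      # parameter values, endpoints included
    if ease:
        t = 3 * t**2 - 2 * t**3           # smoothstep: endpoints stay fixed
    start, end = np.asarray(start, float), np.asarray(end, float)
    return start + t[:, None] * (end - start)

pts = subdivide_path((0, 0), (10, 0), n=3)
# endpoints coincide with A and B; interior steps are smaller near the ends
```

The same sampling would apply to a curved path by parameterizing arc length instead of a straight segment.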
Now, given the segmentation mask, motion curve, and whole image as input, we train a model to move only the ROI according to the motion curve and keyframes.
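One possible encoding of those three inputs (purely a sketch of my own, with illustrative names) is to stack them as channels of a single conditioning array: RGB image, binary ROI mask, and a one-hot map marking the curve's target position at the given keyframe index.

```python
import numpy as np

def conditioning_tensor(image, mask, curve_pts, t):
    """Pack image, ROI mask, and one motion-curve keyframe into channels.

    Channels 0-2: RGB image; channel 3: binary ROI mask; channel 4: a
    one-hot map of where the curve places the ROI at keyframe index t.
    """
    h, w, _ = image.shape
    curve_ch = np.zeros((h, w), dtype=np.float32)
    y, x = curve_pts[t]                       # (row, col) of the t-th keyframe
    curve_ch[int(round(y)), int(round(x))] = 1.0
    return np.concatenate(
        [image.astype(np.float32),
         mask[..., None].astype(np.float32),
         curve_ch[..., None]],
        axis=-1,
    )
```

A real setup would likely rasterize the whole curve or use coordinate embeddings, but the channel-stacking idea is the same.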
The problem with this approach lies in sampling the keyframes and enforcing consistency (making sure the ROI represents the same object) across subsequent keyframes.
If we can solve some form of consistency, this method might provide enough constraints to generate viable results.
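A crude proxy for that consistency (my own toy formulation, not a proposed solution) is to align consecutive keyframe masks by centroid, removing the motion component, and then check their overlap: if the shape itself drifts, the aligned IoU drops even though the ROI followed the curve.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def centroid(mask):
    """Centroid (row, col) of a non-empty boolean mask."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def aligned_iou(a, b):
    """IoU after translating b so its centroid matches a's.

    Subtracting the centroid shift cancels the motion along the curve,
    so what remains measures shape drift between keyframes.
    """
    dy, dx = (np.array(centroid(a)) - np.array(centroid(b))).round().astype(int)
    b_shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
    return mask_iou(a, b_shifted)
```

Thresholding `aligned_iou` between consecutive keyframes would flag pairs where the ROI no longer plausibly represents the same object.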
I currently have about 3K more shelved words on why this is hard if you're targeting real animators. One point: human inbetweeners are given "spacing charts" showing how far each part should move per frame, even though they understand motion perfectly well, because the key animator wants to control the acting.