"Naively, this sounds like one might (a) compute the translation from one pixel to the next in each frame, and (b) re-render the video with small motions amplified. Unfortunately, such an approach would lead to artificial transitions between amplified and unamplified pixels within a single structure. Most of the steps of motion magnification relate to reliably estimating motions, and to clustering pixels whose motions should be magnified as a group."
The math behind the grouping (recognizing the relevant area as an object, magnifying the motion of the area properly, and filling in the empty space intelligently) is the cool "computer vision" part.
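To make the quoted objection concrete, here is a toy 1D sketch in Python with NumPy. It is not the paper's algorithm: the displacement values, the hand-written `labels` array, and the functions `naive_amplify` and `grouped_amplify` are all hypothetical, and the real method estimates motion and clusters pixels in far more involved ways. It only illustrates why amplifying pixel by pixel tears a structure apart, while amplifying per group does not.

```python
import numpy as np

def naive_amplify(displacement, factor, threshold):
    """Scale each pixel's displacement independently if it is below threshold."""
    d = np.asarray(displacement, dtype=float)
    small = np.abs(d) < threshold
    return np.where(small, d * factor, d)

def grouped_amplify(displacement, labels, factor, threshold):
    """Scale a whole group of pixels if the group's mean motion is below threshold."""
    d = np.asarray(displacement, dtype=float)
    labels = np.asarray(labels)
    out = d.copy()
    for lab in np.unique(labels):
        mask = labels == lab
        if np.abs(d[mask]).mean() < threshold:
            out[mask] = d[mask] * factor
    return out

# Hypothetical data: four pixels of one structure (label 0) with slightly
# noisy motion estimates, and two pixels of a faster-moving object (label 1).
disp = np.array([0.4, 0.55, 0.4, 0.45, 3.0, 3.0])
labels = np.array([0, 0, 0, 0, 1, 1])

naive = naive_amplify(disp, factor=10, threshold=0.5)
grouped = grouped_amplify(disp, labels, factor=10, threshold=0.5)

print(naive)
print(grouped)
```

In the naive result, the second pixel of the structure stays at 0.55 while its neighbours jump to about 4, which is exactly the artificial transition the quote describes. In the grouped result, all four pixels of the structure are scaled together and the faster object is left alone.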
"Naively, this sounds like one might (a) compute the translation from one pixel to the next in each frame, and (b) re-render the video with small motions amplified. Unfortunately, such an approach would lead to artificial transitions between amplified and unamplified pixels within a single structure. Most of the steps of motion magnification relate to reliably estimating motions, and to clustering pixels whose motions should be magnified as a group."
The math behind the grouping (recognizing the relevant area as an object, magnifying the motion of the area properly, and filling in the empty space intelligently) is the cool "computer vision" part.