So the trick for performant animation here is to draw on a canvas placed in front of all other elements, with pointer events disabled on the canvas so that you can still interact with the page.
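For anyone who wants to see the overlay setup concretely, here is a minimal sketch. The helper name `makeOverlay`, the `OverlayStyle` type, and the default z-index are made up for illustration and are not taken from the page being discussed; the only real ingredients are standard CSS properties.

```typescript
// The handful of style properties this sketch touches. A real
// HTMLCanvasElement's `style` has all of them, so a canvas can be passed in.
interface OverlayStyle {
  position: string;
  top: string;
  left: string;
  width: string;
  height: string;
  zIndex: string;
  pointerEvents: string;
}

// Hypothetical helper: turn an element into a full-viewport overlay that
// sits above the page but lets clicks and hovers fall through to it.
function makeOverlay<T extends { style: OverlayStyle }>(el: T, zIndex = 9999): T {
  const s = el.style;
  s.position = "fixed";      // pin to the viewport, not the document flow
  s.top = "0";
  s.left = "0";
  s.width = "100vw";
  s.height = "100vh";
  s.zIndex = String(zIndex); // assumed to be high enough to sit on top
  s.pointerEvents = "none";  // the key line: the canvas ignores the pointer
  return el;
}

// In a browser this would be used roughly like so:
//   const canvas = makeOverlay(document.createElement("canvas"));
//   document.body.appendChild(canvas);
```

Note that the CSS size only stretches the element; the drawing buffer (`canvas.width` and `canvas.height`) would still need to be set separately to match the viewport.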
Canvas draws raster images, so anything resembling an object in your drawing logic is already tracked separately by necessity. Either way, you’d presumably check against whatever data model you’re already using to determine what to draw.
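As a rough illustration of checking against the data model rather than the pixels, here is a sketch that assumes the scene is just an array of circles drawn in order. The `Circle` shape and the `hitTest` name are hypothetical; a real model would have its own shapes and ordering rules.

```typescript
// Hypothetical data model: the same list the draw loop would iterate over.
interface Circle {
  id: string;
  x: number; // center x, in canvas coordinates
  y: number; // center y, in canvas coordinates
  r: number; // radius
}

// Return the topmost circle containing the point (px, py), assuming later
// entries in the array are drawn on top of earlier ones.
function hitTest(shapes: Circle[], px: number, py: number): Circle | undefined {
  for (let i = shapes.length - 1; i >= 0; i--) {
    const c = shapes[i];
    const dx = px - c.x;
    const dy = py - c.y;
    // Compare squared distances to avoid a square root.
    if (dx * dx + dy * dy <= c.r * c.r) {
      return c;
    }
  }
  return undefined;
}
```

Since the canvas itself never receives pointer events in this setup, the coordinates would have to come from a listener on the page or window.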
You call it a trick for performant animation, but I couldn't think of any other way to implement something like this. What would a naive implementation look like?