Quick preliminary answer while I unwind all the linked content: I'm already missing Dixon's target-aware pointing so much. I wonder how on earth nobody in smartphone land has thought to implement something similar. I'm already hooked :)
It's missing from many contexts where it would be very useful, including mobile. It's related in many ways to those mobile GUI, web browser, and desktop app testing harnesses. It could be implemented as a smart, scriptable "double buffered" VNC server (for maximum efficiency and native Accessibility API access) or client (for maximum flexibility, but less efficiency).
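To make the "double buffered" part concrete, here's a minimal TypeScript sketch under stated assumptions: keep the previous and current framebuffers, diff them to find what changed, and hand the dirty region to pattern recognizers. The FrameSource interface is hypothetical; a real version would wrap an RFB/VNC connection, which can also report dirty rectangles directly and make the brute-force diff unnecessary.

    // Hypothetical FrameSource: stands in for a VNC/RFB connection or a
    // native screen-capture API. Pixels are RGBA, width * height * 4 bytes.
    interface FrameSource {
      width: number;
      height: number;
      grab(): Uint8Array;
    }

    interface Rect { x: number; y: number; w: number; h: number; }

    class DoubleBufferedScreen {
      private prev: Uint8Array;

      constructor(private source: FrameSource) {
        this.prev = source.grab();
      }

      // Returns the bounding box of everything that changed since the last
      // poll, or null if the screen is unchanged. A real implementation
      // would track multiple dirty rects, or take them straight from the
      // RFB update messages instead of diffing.
      poll(): Rect | null {
        const cur = this.source.grab();
        const { width, height } = this.source;
        let minX = width, minY = height, maxX = -1, maxY = -1;
        for (let y = 0; y < height; y++) {
          for (let x = 0; x < width; x++) {
            const i = (y * width + x) * 4;
            // Compare RGB channels only; ignore alpha.
            if (cur[i] !== this.prev[i] ||
                cur[i + 1] !== this.prev[i + 1] ||
                cur[i + 2] !== this.prev[i + 2]) {
              if (x < minX) minX = x;
              if (x > maxX) maxX = x;
              if (y < minY) minY = y;
              if (y > maxY) maxY = y;
            }
          }
        }
        this.prev = cur;
        if (maxX < 0) return null;
        return { x: minX, y: minY, w: maxX - minX + 1, h: maxY - minY + 1 };
      }
    }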
The way jQuery widgets can encapsulate native and browser-specific widgets behind a platform-agnostic API, you could develop high-level aQuery widgets like "video player" that knew how to control and adapt to many different video player apps across different platforms (YouTube or Vimeo in the browser, VLC on Windows or Mac desktop, QuickTime on Mac, Windows Media Player on Windows, etc.). Then you can build much higher-level apps out of widgets like that.
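For example, a hypothetical TypeScript sketch (every adapter name and method here is invented for illustration): one platform-agnostic interface, many per-app adapters, and higher-level code that only ever sees the interface.

    // Hypothetical aQuery-style "video player" widget: one platform-agnostic
    // interface, many per-app adapters. The adapters are stubs that log what
    // a real implementation would do via pixels or accessibility APIs.
    interface VideoPlayerWidget {
      matches(windowTitle: string): boolean; // does this adapter recognize the app?
      play(): void;
      pause(): void;
      seek(seconds: number): void;
    }

    class YouTubeAdapter implements VideoPlayerWidget {
      matches(windowTitle: string) { return /YouTube/.test(windowTitle); }
      play() { console.log("click the pixel-recognized play button in the browser"); }
      pause() { console.log("click the pause button"); }
      seek(seconds: number) { console.log(`drag the scrubber to ${seconds}s`); }
    }

    class VLCAdapter implements VideoPlayerWidget {
      matches(windowTitle: string) { return /VLC/.test(windowTitle); }
      play() { console.log("drive VLC through its accessibility tree or hotkeys"); }
      pause() { console.log("send the pause hotkey"); }
      seek(seconds: number) { console.log(`seek VLC to ${seconds}s`); }
    }

    // Higher-level apps only ever see VideoPlayerWidget:
    function playerFor(windowTitle: string, adapters: VideoPlayerWidget[]): VideoPlayerWidget | null {
      return adapters.find(a => a.matches(windowTitle)) ?? null;
    }

    playerFor("VLC media player", [new YouTubeAdapter(), new VLCAdapter()])?.play();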
Target-aware pointing is one of many great techniques he shows can be layered on top of existing interfaces without modifying them.
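Here's a toy TypeScript illustration of the layering idea, assuming the targets have already been recognized from pixels or the accessibility tree; the nearest-center snapping rule below is a simple stand-in, not Dixon's actual technique.

    // Toy target-aware pointing layered on an unmodified UI: given targets
    // recognized externally, snap the effective cursor to the nearest one,
    // bubble-cursor style. Nearest-center is a stand-in heuristic.
    interface Target { x: number; y: number; w: number; h: number; }

    function nearestTarget(px: number, py: number, targets: Target[]): Target | null {
      let best: Target | null = null;
      let bestDist = Infinity;
      for (const t of targets) {
        const cx = t.x + t.w / 2;
        const cy = t.y + t.h / 2;
        const d = Math.hypot(px - cx, py - cy);
        if (d < bestDist) { bestDist = d; best = t; }
      }
      return best;
    }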
I'd like to integrate all those capabilities, plus the native Accessibility API of each platform, into a JavaScript engine, and write jQuery-like selectors for recognizing patterns of pixels and widgets, creating aQuery widgets that track input, draw overlays, implement text-to-speech and voice control interfaces, etc.
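A purely hypothetical sketch of what an aQuery call site might feel like; the selector grammar, the node methods, and the stub implementation that just logs are all invented for illustration:

    // Hypothetical aQuery call site. Nothing here is a real API: the
    // selector grammar, the AQueryNode methods, and the stub that logs
    // what a real engine would do are all made up to show the shape.
    interface AQueryNode {
      overlay(style: string): AQueryNode; // draw a highlight over the match
      speak(text: string): AQueryNode;    // text-to-speech feedback
      click(): AQueryNode;                // synthesize a click at the match
    }

    function aQuery(selector: string): AQueryNode {
      const node: AQueryNode = {
        overlay(style) { console.log(`overlay ${style} on ${selector}`); return node; },
        speak(text)    { console.log(`speak "${text}" for ${selector}`); return node; },
        click()        { console.log(`click ${selector}`); return node; },
      };
      return node;
    }

    // Select the play button of whatever video player is recognized,
    // whether it was matched by pixel pattern or the Accessibility API:
    aQuery('widget(video-player) role(button)[name~="play"]')
      .overlay('2px solid green')
      .speak('Play button')
      .click();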
His research statement sums up where it's leading: imagine Wikipedia for sharing GUI mods!
Berkeley Systems (the flying toaster screen saver company) made one of the first screen readers for the Mac in 1989 and Windows in 1994. https://en.wikipedia.org/wiki/OutSpoken
Richard Potter, Ben Shneiderman, and Ben Bederson wrote a paper called "Pixel Data Access for End-User Programming and Graphical Macros" that references a lot of earlier work. https://www.cs.umd.edu/~ben/papers/Potter1999Pixel.pdf