Having the logic to interpret and respond to input on the device reduces the latency between interaction and effect. Also, having at least some computing power on the device means it's useful by itself. It also reduces battery drain on both devices to send only small BT packets instead of essentially streaming video constantly during use.
Who thought it was a good idea to put a whole computer in glasses or smartwatches?