It's actually really simple. We'll write a and b in binary notation, for example:
a = 1001101
b = 0100111
Now what happens when you add two numbers in binary? We essentially add the numbers in each column together, and if it overflows, we carry to the next column (this is how you carry out addition in general).
So what are the columns where we need to carry, the ones that overflow? These are given by (a&b) - the columns where both a and b contain a one. To actually carry we just move everything one position to the left: ((a&b)<<1). And what are the columns where we don't need to carry, the ones that don't overflow? These are the ones where we have exactly one zero, either in a or in b, so: a^b.
In other words, a + b = ((a&b)<<1) + (a^b). To compute the average, we divide by two, or in other words, we bitshift to the right by one place: (a + b)/2 = (a&b) + ((a^b)>>1)
There's this well-known identity [1] a + b = (a ^ b) + ((a & b) << 1). This is essentially how a parallel ripple carry adder works. So (a + b) / 2 = (a + b) >> 1 = ((a ^ b) >> 1) + (a & b).
It's pretty obvious visually, you take the exclusive disjunct (a^b -> yellow) divide by two (>>1 -> bright yellow) and then add the intersection (a&b -> green) you use the full intersection because in the basic method (a+b / 2) they would each add their own half of the intersection.