Bayes' Theorem is a lot easier to grok if you make a slight rearrangement. Instead of this:
p(A|B) = p(B|A) p(A) / p(B)
move that p(B) to the other side:
p(A|B) p(B) = p(B|A) p(A)
Now notice that p(A|B) p(B) is simply the probability that both A and B occur. That is, it is p(A∩B). The same goes for the right side.
Basically, to compute p(A∩B), you can say "I've got to have A", which gives a p(A) factor, and "given that A, I then need to also have B", which gives a p(B|A) factor, so p(A∩B) = p(A) p(B|A). Or you can start with B, and then say you need A given B, and that gives you p(A∩B) = p(B) p(A|B).
So all Bayes' Theorem really says is that there are two ways to compute p(A∩B), and they have to give the same result. Divide both sides by p(B) (assuming p(B) > 0) and you are back at the usual form.
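If it helps to see numbers, here is a minimal sketch on a made-up example: one roll of a fair six-sided die, with A = "the roll is even" and B = "the roll is greater than 3". The events and the helper names p and p_given are assumptions chosen for illustration. It computes p(A∩B) both ways, and also directly by counting outcomes.

```python
from fractions import Fraction

# Assumed toy example: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}  # the roll is even
B = {n for n in outcomes if n > 3}       # the roll is greater than 3

def p(event):
    """Probability of an event, counting equally likely outcomes."""
    return Fraction(len(event), 6)

def p_given(event, condition):
    """Conditional probability p(event | condition), by counting within condition."""
    return Fraction(len(event & condition), len(condition))

via_a = p(A) * p_given(B, A)   # start with A, then require B given A
via_b = p(B) * p_given(A, B)   # start with B, then require A given B
direct = p(A & B)              # count the outcomes in both A and B

print(via_a, via_b, direct)
```

All three values come out to 1/3, and dividing either product by p(B) gives back p(A|B) = 2/3.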
PS: if you are thinking "wait a second...I thought p(A∩B) = p(A) p(B)", that's for independent events. If A and B are independent, p(A|B) = p(A) and p(B|A) = p(B), and the general formula then reduces to p(A∩B) = p(A) p(B).
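A quick sketch of that difference, again on a made-up fair die roll. The events here are assumptions for illustration: A = "even", B = "greater than 3", and C = "at most 2". A and C happen to be independent, while A and B are not, so the simple product only works for the first pair.

```python
from fractions import Fraction

# Assumed toy example: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}  # the roll is even
B = {n for n in outcomes if n > 3}       # the roll is greater than 3
C = {n for n in outcomes if n <= 2}      # the roll is at most 2

def p(event):
    """Probability of an event, counting equally likely outcomes."""
    return Fraction(len(event), 6)

def p_given(event, condition):
    """Conditional probability p(event | condition), by counting within condition."""
    return Fraction(len(event & condition), len(condition))

# Independent pair: knowing C does not change the probability of A.
independent_ac = p(A & C) == p(A) * p(C)
# Dependent pair: knowing B makes A more likely, so the plain product is wrong.
independent_ab = p(A & B) == p(A) * p(B)

print(p_given(A, C), p(A), independent_ac)
print(p_given(A, B), p(A), independent_ab)
```

For A and C, p(A|C) equals p(A), so p(A∩C) = p(A) p(C). For A and B, p(A|B) differs from p(A), and you need the general formula p(A∩B) = p(B) p(A|B).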