I agree that the combination formula, n! / (k!(n-k)!), gives 1 in 20 for n = 6 and k = 3, but when you think of it as running A and B and checking if X or Y is faster 3 times, you get 1 in 8. That's quite a big difference. Where does it come from? Am I doing something wrong? I mean, the combination approach counts identical times as a win, but given enough timer precision we could ignore that scenario.
edit: Ah, got it, in my case 3rd result for X can be higher than the first result for Y. You said all times must be smaller.
Let's say there are 6 trials -- name them x1, x2, x3, y1, y2, y3.
In the case of checking if x is faster than y 3 times, you're doing x1 < y1 AND x2 < y2 AND x3 < y3.
In the case of checking if all three measurements of x are smaller than all three measurements of y, you're checking x1 < y1 AND x1 < y2 AND x1 < y3 AND x2 < y1 AND x2 < y2 AND x2 < y3 AND x3 < y1 AND x3 < y2 AND x3 < y3.
In other words, the latter case is checking whether the slowest x of three trials is faster than the fastest y of three trials.
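To make the gap between the two checks concrete, here is a small brute-force sketch. It assumes there are no ties and that every ordering of the six timings is equally likely (that is, X and Y are really the same speed). The function name `count_orderings` and the use of ranks 0 to 5 in place of real timings are just illustrative choices.

```python
from fractions import Fraction
from itertools import permutations


def count_orderings():
    """Enumerate all 720 orderings of six distinct timings (as ranks 0..5)."""
    pairwise = 0     # x1 < y1 AND x2 < y2 AND x3 < y3
    all_smaller = 0  # every x is smaller than every y
    total = 0
    for ranks in permutations(range(6)):
        x, y = ranks[:3], ranks[3:]  # first three are x1..x3, last three are y1..y3
        total += 1
        if all(xi < yi for xi, yi in zip(x, y)):
            pairwise += 1
        if max(x) < min(y):  # slowest x beats fastest y
            all_smaller += 1
    return Fraction(pairwise, total), Fraction(all_smaller, total)


p_pairwise, p_all = count_orderings()
print(p_pairwise, p_all)  # 1/8 1/20
```

The pairwise check passes in 90 of the 720 orderings and the all-against-all check in only 36, which is where 1 in 8 versus 1 in 20 comes from.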