
"k is the expected number of future users who will be exposed to a result"

Does this mean that this approach does not make much sense if your estimate of k is totally wrong?

How do you estimate k?




Anscombe talks about this and proposes two solutions:

One is to estimate it based on the number of daily visitors your site gets, and then estimate how long you will run the winning alternative in the campaign.
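As a rough sketch of that arithmetic (every number here is made up):

    # Back-of-the-envelope estimate of k: how many future users will
    # see the winning variant before it gets replaced?
    daily_visitors = 5000            # hypothetical average traffic
    days_until_next_redesign = 180   # hypothetical campaign lifetime
    k = daily_visitors * days_until_next_redesign
    print(k)  # 900000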

He also proposes: 'perhaps k should be assessed, not as a constant, but as an increasing function of |y|/n, since the more striking the treatment difference indicated, the more likely it is that the experiment will be noticed... One way of introducing such a dependence of k on |y|/n is to assess k+2n as a constant.'

This actually simplifies the math somewhat, and you can see the full details in Anscombe's paper cited in the blog.


I don't get it. What if this is my home page? What if I intend to run the campaign "forever?"

If k is an estimate of how much traffic I will ever see, it seems like I'm going to be calculating the Phi-inverse of approximately 0.

Where can I see Anscombe's paper online? (It was published in 1963; it's not linked in the blog post, just cited.)


Just sent you a copy of the paper. If you plan to use the result 'forever', then theoretically you would be willing to sacrifice a huge (infinite) amount of suboptimal performance now in order to have the correct answer when you finally pick the winning idea. It would be very important to pick the correct winner, because it is going to run for eternity.

In practice, we never actually run the winning idea forever. We redesign websites periodically, we test new ideas, and business needs change. So we can pick a reasonable value for k based on these constraints.

Alternatively, you can get better performance by _not_ picking a stopping criterion at all, and instead dynamically choosing which homepage to show. As soon as one idea appears to be doing better, you start showing it to more users. With an appropriate adaptive sampling strategy, you can drive regret below that of a fixed sampling strategy. However, for many people the adaptive strategy may be more trouble to implement than it is worth.
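For the curious, here is a minimal sketch of one such adaptive strategy (Thompson sampling over two variants with Bernoulli conversions). This is just an illustration, not the blog's implementation:

    import random

    # Beta(1, 1) priors over each variant's unknown conversion rate.
    successes = [1, 1]
    failures = [1, 1]

    def choose_variant():
        # Sample a plausible conversion rate for each variant and show
        # the variant whose sample is higher. Variants that look better
        # automatically get shown to more users.
        samples = [random.betavariate(successes[i], failures[i])
                   for i in (0, 1)]
        return 0 if samples[0] > samples[1] else 1

    def record(variant, converted):
        # Update the posterior after observing the user's behavior.
        if converted:
            successes[variant] += 1
        else:
            failures[variant] += 1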

The most important takeaway is to _not_ use repeated significance tests to determine experiment termination time. Either use the Anscombe bound with an appropriate k, or fix the sample size before starting the experiment.
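If you take the fixed-sample-size route, the standard two-proportion power calculation gives you n before you start. A sketch, assuming a hypothetical 10% baseline conversion rate and a 2-point minimum detectable effect:

    from scipy.stats import norm

    # Per-arm sample size for a two-sided two-proportion z-test,
    # alpha = 0.05, power = 0.80. The rates are placeholders.
    p1, p2 = 0.10, 0.12           # baseline, minimum rate worth detecting
    z_a = norm.ppf(1 - 0.05 / 2)  # ~1.96
    z_b = norm.ppf(0.80)          # ~0.84
    var = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_a + z_b) ** 2 * var / (p1 - p2) ** 2
    print(round(n))  # ~3838 users per arm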



