How would one go about correlating the slightly quicker load time with a 5% improvement in the conversion rate?



Measure your conversion rate for a statistically significant amount of time.

Deploy new version of site that is 0.5 seconds faster to load.

Measure your conversion rate for a statistically significant amount of time.

Compare conversion rates.
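
A minimal sketch of that before/after comparison in Python, assuming you can pull visit and signup counts for each measurement window out of your analytics (the counts below are made up):

    def conversion_rate(signups, visits):
        """Fraction of visits that converted."""
        return signups / visits

    # Window 1: old, slower site.  Window 2: new, faster site.
    before = conversion_rate(signups=412, visits=20_000)
    after = conversion_rate(signups=455, visits=20_500)

    print(f"before: {before:.2%}  after: {after:.2%}")
    print(f"relative change: {(after - before) / before:+.1%}")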


This will not, in fact, produce reliable results. Your conversion rate will tend to change over time anyhow, regardless of the "treatment" applied during the second "statistically significant amount of time", because your conversion rate is sensitive to things like traffic mix, PR, and so on, which are not uncorrelated with when you take the measurements.

This is why we don't do medical trials by giving people aspirin, measuring symptoms, then giving the same people a sugar pill, then measuring symptoms again. Instead, we give different people the two treatments at the same time, such that one population functions as a control group for the other. This is the essence of A/B testing, too.

The right way to measure this if you want reliable results is to put a load balancer or something in front of the page at issue and split half the people into the old architecture and half the people into the new architecture, then measure their conversion rates simultaneously. 37Signals knows this, and they allude to it in their blog post. That's OK though. You don't need to apologize for not gathering good data on whether making your site faster is better. Testing costs money, and testing known-to-be-virtually-universally-superior things is rarely a good allocation of resources.
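
As a rough sketch of the splitting step (not 37signals' actual setup), assuming you can key on something stable like a user or session id so each visitor always sees the same variant:

    import hashlib

    def bucket(user_id: str) -> str:
        """Deterministically assign a visitor to the 'old' or 'new' architecture."""
        digest = hashlib.md5(user_id.encode()).hexdigest()
        return "new" if int(digest, 16) % 2 == 0 else "old"

    # At the load balancer / router: send the request to one backend or the
    # other based on the bucket, and log (variant, converted?) per visitor so
    # the two conversion rates are measured over exactly the same period.
    print(bucket("user-12345"))  # stable for a given id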

You just probably shouldn't attribute your increase in conversion rates to the change you made without testing.


Thanks. Wouldn't this need to be done simultaneously? For example, serving half your users method A and the other half method B.


That would certainly be better experimental design, since you would be controlling for other factors. On the other hand, precisely measuring the improvement in conversion isn't particularly important in this case; it's already clear that faster is better, so you're not gaining much actionable information from the measurement, whereas you would be giving half of your users a worse experience. In a situation where you were uncertain about which of two methods is better, it would definitely be better to run them in parallel like you've suggested so that you had a fair comparison.


You're right: 37signals don't have to do this test properly. Their prerogative. However, until they do, the 5% figure and implied causation are meaningless.

We don't know whether the normal variation (SD) in their conversion rate is even in the same ballpark as the 5% change, and there are thousands of possible confounds. Plus, there's no solid a priori reason why shaving off latency should improve their conversion rates drastically -- Basecamp doesn't rest on a large number of small, potentially impulsive transactions the way Amazon does. Without more data (or at least an explanation), this doesn't tell us anything.


I agree, and I meant to emphasize more clearly that I don't think the 5% figure is meaningful (and that it shouldn't have been stated without the appropriate caveats in the article).

My main point was simply that I don't think it's prudent to create a worse experience for half of your users when you're so unlikely to gain any actionable information from it. It would be quite extraordinary if they found out that the speed increase caused a decline in conversion or a huge improvement in conversion, so I think it's safe to say that the measurement wouldn't make them reverse the change or make them allocate significantly more resources toward faster load times.

But, to reiterate, I agree that they should not have made the claim about 5% conversion when it wasn't properly supported.


Yeah, serving up two versions simultaneously and split-testing them would be more scientific, but I appreciate the number anyway. It was an after-the-fact observation rather than an original goal, so I wouldn't expect him to go back and deploy the slower version just to test the number.

The hole here is whether they unknowingly got a new influx of traffic that was 5% more likely to convert, skewing the final observation, which I would say is unlikely. Your point is good in general, however.


Nitpick: You'd have to test the difference between before and after at a certain level of significance, not just establish estimates for before and after. Different things.
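
For what it's worth, here's a minimal sketch of such a test (a two-sided two-proportion z-test) using only the Python standard library; the counts are placeholders:

    from math import sqrt, erfc

    def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
        """Two-sided z-test for the difference between two conversion rates."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under H0
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = erfc(abs(z) / sqrt(2))                      # two-sided p-value
        return z, p_value

    # Placeholder counts: (conversions, visitors) before and after.
    z, p = two_proportion_z_test(412, 20_000, 455, 20_500)
    print(f"z = {z:.2f}, p = {p:.3f}")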



