I'm Split Testing ... Why Haven't I Doubled My Revenue Yet? (markitecht.tumblr.com)
48 points by dcancel on Aug 25, 2010 | 9 comments



Here is an alternate theory.

Stick the numbers post-conversion into http://elem.com/~btilly/effective-ab-testing/g-test-calculat... (41 successes out of 638 trials versus 35 successes out of 416 trials) and the conclusion of unequal performance has 72.42% confidence, meaning that more than 1 time in 4 you'd see a difference that big or bigger by chance.

In other words the entire basis of this post could be a chance statistical fluctuation that should be ignored.
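
For anyone who wants to reproduce that check without the linked calculator, here is a minimal Python sketch using the same counts (41 of 638 versus 35 of 416). SciPy's chi2_contingency with the log-likelihood statistic is a G-test, and with its default continuity correction the p-value lands around 0.28, i.e. roughly 72% confidence, in line with the figure above.

  # Minimal sketch: reproduce the significance check on the quoted counts.
  # chi2_contingency with lambda_="log-likelihood" runs a G-test; the default
  # continuity correction applies to this 2x2 table.
  from scipy.stats import chi2_contingency

  # Rows: variant A, variant B.  Columns: converted, did not convert.
  table = [[41, 638 - 41],
           [35, 416 - 35]]

  g_stat, p_value, dof, expected = chi2_contingency(table, lambda_="log-likelihood")

  print(f"G = {g_stat:.3f}, p = {p_value:.3f}")
  print(f"Confidence of unequal performance: {(1 - p_value) * 100:.1f}%")
  # p comes out around 0.28, i.e. roughly 72% confidence -- well short of
  # the usual 95% bar, so "no detectable difference" is a fair reading.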

It is true that there can be effects where pushing less qualified leads through the top stage of the funnel doesn't get them to the end. However, my experience with A/B testing is that it is more common for the extra people that an A/B test pushes in at the top to convert the rest of the way at a similar rate.

But not always! Which is why if you have sufficient volume you should always measure to actual sales. There is no other way to be absolutely sure that you are improving end sales.

However, in this example that would mean running the test for something like 20x as long. In that case it makes sense to be pragmatic: test from one step of the funnel to the next, and then pivot on the answers you get. Furthermore, to start you should focus on the top of the funnel for the simple reason that higher volumes will get you answers faster there - you can easily try a dozen ideas at the top before you could test one idea deeper in the funnel.
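
As a rough illustration of where a "20x as long" figure can come from (with baseline rates I made up, not numbers from the article): for a fixed relative lift, the traffic needed scales roughly with one over the baseline conversion rate, so a step converting at 0.5% needs on the order of 20x the traffic of one converting at 10%.

  # Back-of-envelope: visitors needed per variant to detect a 20% relative
  # lift, at two illustrative baseline rates (two-sided alpha=0.05, power=0.80).
  # These rates are made up for illustration; they are not from the article.
  from scipy.stats import norm

  def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
      p1 = baseline
      p2 = baseline * (1 + relative_lift)
      z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
      return z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

  top = visitors_per_variant(0.10, 0.20)    # e.g. click-through near the top of the funnel
  sale = visitors_per_variant(0.005, 0.20)  # e.g. completed sales at the bottom

  print(f"Top of funnel: ~{top:,.0f} visitors per variant")
  print(f"End of funnel: ~{sale:,.0f} visitors per variant")
  print(f"Ratio: ~{sale / top:.0f}x the traffic, so roughly that much longer to run")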

Once you've improved your site enough to get a better percentage of actual sales, you'll be able to purchase more traffic. Doing both of those things will put you in a position to conduct more rigorous A/B tests to eke out more subtle differences. But that is down the road. Focus on testing what is easiest in the quickest possible way first.


In other words the entire basis of this post could be a chance statistical fluctuation that should be ignored.

I agree that the particular stats referenced in the article may not be statistically significant, but I would argue that those stats were a supporting detail rather than the entire basis. The main point as I understood it was to illustrate, more or less, why a 100% increase in conversions to the purchasing page does not equal a 100% increase in conversions from the purchasing page to actual purchase.

They are saying that once you start attracting traffic beyond the early adopters, your additional traffic is now composed of a different group of people who exhibit fundamentally different behavior in how likely they are to make a purchase even once they've hit the purchasing page.


The main point as I understood it was to illustrate, more or less, why a 100% increase in conversions to the purchasing page does not equal a 100% increase in conversions from the purchasing page to actual purchase.

Yes, it offered a theory about why this was so. Yet my experience is that a 100% increase in initial conversions typically results in approximately a 100% increase in sales.

My further experience from doing A/B testing for many years is that lots of people are eager to grab any numbers you give them, then run with them and form grand theories that aren't backed up by the actual statistics. Those theories have a remarkably low success rate in explaining the results of the next A/B test you run. (Or, frequently, the current test once we let it run longer.)

They are saying that once you start attracting traffic beyond the early adopters, your additional traffic is now composed of a different group of people who exhibit fundamentally different behavior in how likely they are to make a purchase even once they've hit the purchasing page.

It was an attractively presented theory. True, I've learned to be cautious of attractively presented theories which aren't actually backed by data. But it was definitely attractively presented.

However, no evidence was offered that the people progressing in group A were actually significantly different from the people progressing in group B. And if the test truly was letting crappier traffic through, and the crappiness of that traffic was the primary cause of trends in subsequent behavior, then that traffic would have to be REALLY crappy to explain the difference. Occam's razor says that random chance was the cause of the data, or at least a large enough contributor that there is no immediate need to think too hard about other possibilities. And so until better data becomes available, I'm going to suggest that "chance fluctuation" deserves a hearing.

Medicine sometimes calls these zebras. Why? Because it is like someone hearing something with four hooves run past an open window and immediately concluding that it is a zebra. Sure, it _COULD_ be a zebra. _SOMETIMES_ it proves to be a zebra. But the odds are much better that it was a horse.

Guess the garden-variety answer before guessing the exotic one.


Thanks for the feedback.

So, regarding this being a fluke one-off, or the assertion that there isn't data to back this up: I have reams of tests that display similar behavior, many with tens of thousands of unique participants. As the other commenter said, the example in the post was icing, not the cake. But I take your point, and it's good to hear that people who know something about this are reading my posts; maybe I can get a little deeper into advanced topics and results.

If your early-funnel optimizations directly predict end-of-funnel conversion, I think that 1) that's awesome, 2) the products you are testing have a huge amount of room to grow (also awesome; hope the hiring goes well, it's always tricky finding great people), and 3) the phenomenon I was describing doesn't really apply to you and those products yet.

One thing that may not have been crystal clear is that we are talking about one group of people and examining the segments within that group. Let's put it another way. There are 1,000 people who see a certain offer page on a given day. One of those people is a fellow who has read three reviews, used competing products, and decided 100% that he is going to buy the product. The page could have a tiny 10px add-to-cart link, tucked in the footer in #EEE, and this dude would still find it and complete the purchase.

Now, as you make it clearer to people why and how they should buy the product, you are not generating more interest of the kind that first fellow already had. You are simply inviting a broader selection of the audience to consider the purchase a bit more. That's really the principle at play here.
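
A toy sketch of that principle, with entirely made-up segment sizes and rates (not data from any real test): one tiny segment of decided buyers converts almost no matter what the page looks like, while a large segment of casual browsers responds to the clearer page but buys at a much lower rate once there, so clicks can double while sales rise far less.

  # Toy model with made-up numbers: a clearer page can double clicks to the
  # purchase page without doubling sales, because the extra clicks come from
  # a lower-intent segment.
  visitors = 1000

  # (share of audience, click rate under A, click rate under B, buy rate on page)
  segments = {
      "decided buyer":  (0.01, 0.90, 0.95, 0.80),
      "casual browser": (0.99, 0.05, 0.11, 0.05),
  }

  for variant in ("A", "B"):
      clicks = sales = 0.0
      for share, click_a, click_b, buy_rate in segments.values():
          click_rate = click_a if variant == "A" else click_b
          clicks += visitors * share * click_rate
          sales += visitors * share * click_rate * buy_rate
      print(f"Variant {variant}: {clicks:.0f} clicks to the purchase page, "
            f"{sales:.1f} expected sales ({sales / clicks:.1%} page-to-sale)")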

Thanks again for reading, and for the comment.


Ah, it's a rhetorical question. I was ready to read a rant; well played, sir. If you're short on time, check out the last graph; it's gold.


Another tactic, besides trying to move further out along the adoption curve: screw the skeptical aholes and create another product that the early adopters will go for. If you have people tearing holes in their pants trying to get their wallets out faster, capitalize on it. Focus on that very small wedge on the left. Let's call this "the Apple strategy".


If this holds for more situations, I guess the conclusion I would draw is that split testing should be done as far down the funnel as possible to generate the most return.


If you're losing 90% of the people on the first step and 50% of that 10% on the second step, is the second step really what you need to be optimising?


Possibly, yes. The 'start at the cart' theory of optimization says that you should absolutely start by optimizing that 50% and work 'backwards.' The idea is that it is a lot easier to close sales that are in process than to generate more sales leads that may or may not be qualified enough to actually initiate a purchase.

Focus on reeling in the fish on the hook, to put it grossly, as opposed to putting more hooks in the water.

It's pretty easy to see the other side of this, though, and like the correct answer to any poker question, the answer ends up being "it depends". =)
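
One way to make "it depends" concrete, using the parent comment's hypothetical drop-offs: the funnel is multiplicative, so the same relative lift at either step yields the same end-to-end gain, and the choice comes down to which step is cheaper to move and which has enough volume to give you an answer quickly.

  # The funnel is multiplicative: a given relative lift helps end-to-end
  # conversion equally no matter which step it lands on.  Rates are the
  # parent comment's hypothetical 10% (step 1) and 50% (step 2).
  step1 = 0.10  # share of visitors who survive the first step
  step2 = 0.50  # of those, the share who survive the second step
  lift = 0.20   # a 20% relative improvement, applied to one step at a time

  baseline = step1 * step2
  lift_step1 = step1 * (1 + lift) * step2
  lift_step2 = step1 * step2 * (1 + lift)

  print(f"Baseline end-to-end conversion: {baseline:.1%}")    # 5.0%
  print(f"20% lift on step 1:             {lift_step1:.1%}")  # 6.0%
  print(f"20% lift on step 2:             {lift_step2:.1%}")  # 6.0%
  # Identical end result either way, so the practical question is which step
  # is easier to move and which one accumulates test data fast enough.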



