In the full blog post, they show that the numbers went from:
* Chat bubble: 34 conversations from 8,004 visitors (0.42%)
* Nav link: 45 conversations from 6,622 visitors (0.68%)
The nav link probably performs at least on par with the chat bubble, but this doesn't seem like enough data to say confidently that it outperforms.
I agree that it's a net win to remove the intrusive chat bubble if they're not sacrificing conversations, but the title is overstating the evidence.
How much data would you need for confidence? According to my calculations, this is at 98% confidence. That feels like 'enough' to make a decision for my small startup.
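If you want to check that figure yourself, here's a quick sketch with statsmodels, using the counts quoted above; the one-sided test is my assumption, since the question is whether the nav link beats the bubble:

```python
# Two-proportion z-test on the counts quoted above.
# One-sided alternative: does the nav link convert better than the bubble?
from statsmodels.stats.proportion import proportions_ztest

conversions = [45, 34]        # nav link, chat bubble
visitors = [6622, 8004]

z, p = proportions_ztest(conversions, visitors, alternative='larger')
print(f"z = {z:.2f}, one-sided p = {p:.3f}")   # roughly z = 2.1, p = 0.018
# 1 - p is about 0.98, i.e. the "98% confidence" figure (one-sided);
# the two-sided p-value is about twice that.
```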
I'm pretty explicit in the blog post that this isn't meant to be universally applicable; it's just what happened to us.
First of all, kudos for quantifying your results instead of hand-waving them. Yes, your results look like a ~60% improvement in conversion rate from the A variant to the B variant, with a p-value of 0.02 and a statistical power of around 80% for a two-tailed test. So that's good.
However, context is important: at this level of significance you'd expect to see a similarly strong, but ultimately spurious, effect in an A/B test about 1 in 50 times.
Since you're not working on something safety-critical, that's probably an acceptable false positive rate for you. But generally speaking, and particularly here, since the absolute numbers and changes are quite small, I would be wary of trusting such a result. It seems promising but inconclusive. Maybe run a few more tests with disjoint (or nearly so) samples of visitors?
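To make the 1-in-50 point concrete, here's a toy A/A simulation; the pooled rate and sample sizes come from the post, while the 10,000 trials and the p < 0.02 threshold are my choices:

```python
# A/A simulation: both "variants" share the same true conversion rate,
# so every significant result is spurious by construction.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)
true_rate = 79 / 14626          # pooled rate from the post, ~0.54%
n_a, n_b = 8004, 6622           # sample sizes from the post
trials = 10_000                 # my choice

false_positives = 0
for _ in range(trials):
    conv_a = rng.binomial(n_a, true_rate)
    conv_b = rng.binomial(n_b, true_rate)
    _, p = proportions_ztest([conv_b, conv_a], [n_b, n_a],
                             alternative='larger')
    if p < 0.02:
        false_positives += 1

print(f"spurious 'wins': {false_positives / trials:.1%}")
# prints roughly 2%, i.e. about 1 in 50 A/A runs look like a real effect
```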
There are a few other things that could possibly confound the result. Off the top of my head, your screenshots look like different pages between the A and B tests. I'm not sure if that's how you ran the experiment or if you just happened to use two different page screenshots, but the former would typically disqualify the result and require another test.
1) Why did the two groups have such different sample sizes? If it was intended to run as a 50-50 split, a delta this large would make me wonder if there was an exposure bias.
2) For the baseline rate (0.4%), this test is underpowered for even a 50% change, meaning you will have a high false discovery rate; see the rough power calculation below.
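To put rough numbers on "underpowered": with statsmodels you can ask how many visitors per group you'd need to detect a 50% lift over the 0.42% baseline. The baseline is from the post; the 80% power target and two-sided alpha of 0.05 are standard conventions, not the OP's numbers:

```python
# Sample size needed to detect a 50% relative lift over a 0.42% baseline
# with 80% power at two-sided alpha = 0.05 (conventional assumptions).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_baseline = 0.0042
p_lifted = p_baseline * 1.5                       # a 50% relative change

h = proportion_effectsize(p_lifted, p_baseline)   # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=h, alpha=0.05, power=0.8, alternative='two-sided')

print(f"~{n_per_group:,.0f} visitors per group")  # on the order of 9,000
# The test had ~8,000 and ~6,600 per group, so it falls short of 80%
# power for a 50% change, which is what drives the false discovery rate.
```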
I'm somewhat of a layman, but I'd wager A/B pages served 50:50 (by IP, for instance) could lead to a rather solid conclusion if run long enough. On the other hand, eh, chat bubbles suck and you can quite confidently say they don't help, so might as well keep it this way. On a personal note, I do feel I would be much more prone to click a chat request as another menu option than as a bubble.
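For what it's worth, a 50:50 split keyed on IP is easy to make deterministic. A minimal sketch (hash-based bucketing; the salt and function name are illustrative, not from the post):

```python
# Deterministic 50:50 assignment keyed on IP, so the same visitor
# always sees the same variant across requests.
import hashlib

def assign_variant(ip: str, salt: str = "chat-widget-test") -> str:
    digest = hashlib.sha256(f"{salt}:{ip}".encode()).hexdigest()
    return "chat_bubble" if int(digest, 16) % 2 == 0 else "nav_link"

print(assign_variant("203.0.113.7"))   # stable for a given IP and salt
```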
I think people are probably more concerned that it doesn't have economic significance. It might have statistical significance, but the effect is still very small, so does it really matter? That's a common trap people forget to consider. You see it a lot in finance research, where some variable is statistically significant in a model, but the difference in the economic outcome is so small that it doesn't matter.
In this case, these are indeed tiny percentages. But you're moving away from something that a lot of people dislike and that is more complicated (from a technical standpoint), so things get simpler and user interaction with live chat isn't impacted.
You'd hope that the percentages here would be small: people need to be facing a problem with your product before they are part of the sample population.
There is a lot of potential economic impact here; during a major problem (which WILL happen) the number of people looking for support would increase sharply. If people are blind to your support system, you could be saying goodbye to a large number of customers. Ignoring scenarios like this is a far more common trap. We spend a lot of money on firefighters even though we hope they never have any fires to fight.