We’ve already posted a few articles that show how important experimentation is for good marketing and how to ensure your A/B testing gets results.
But something has always been missing: in every post, we mentioned statistical significance without explaining what it actually is and why it matters so much in marketing.
Today’s the day we get to grips with statistical significance!
Calculating Statistical Significance
“Statistical significance” might sound intimidating to anyone who doesn’t consider themselves a “numbers” person. But don’t worry: we’ve built a handy statistical significance calculator to do the hard math for us. Use this tool to calculate the statistical significance of your experiment results. Simply enter the impressions or sessions, along with the clicks or conversions, for your control and your variable. The calculator will tell you whether your A/B test result is conclusive or could be the result of chance.
What Statistical Significance Is And Why It Is Important
Marketers use statistical significance to determine whether or not making an isolated change to marketing collateral, like a landing page, will improve its success with their target audience. A statistically significant result is one that is unlikely to have occurred simply by chance. As a rule, a result counts as statistically significant when it reaches 95% confidence or higher, which corresponds to a p-value of 0.05 or lower.
P-value is the probability of getting results at least as extreme as yours, provided there is no relationship between the control and the variable. We call this assumption of no relationship the null hypothesis. There’s a little more information on p-value here if you want to swot up on the term.
Here is a simple breakdown of all the nuts and bolts of statistical experiments to help explain the more technical terms we’ve been using:
- Every statistical experiment will have a control, variable, null hypothesis and alternate hypothesis. An alternate hypothesis is made before the experiment takes place, and is your best forecast of your outcome based on the evidence available.
- The null hypothesis is the possibility that there is no effect or difference between the control and the variable.
- In most A/B experiments, marketers are trying to reject the null hypothesis. By doing so, they show that the data give good evidence that a change in behavior does occur.
The p-value tells you how compatible your results are with the null hypothesis. It is not the probability that the null hypothesis is true.
- A small p-value (less than or equal to 0.05) indicates strong evidence against the null hypothesis.
- A large p-value (greater than 0.05) indicates weak evidence against the null hypothesis.
- Let’s say our alternate hypothesis is that the variable will outperform the control. If the variable does outperform the control with 95% confidence or greater (a p-value of 0.05 or less), then we have strong evidence to reject the null hypothesis (the claim that there is no relationship in the data) and accept the alternate hypothesis.
Marketers seek 95% confidence or higher because a higher confidence level corresponds to a smaller p-value, and therefore to stronger evidence against the null hypothesis. Conclusions can be drawn on 90% confidence but as a general rule of thumb we would say the higher the better!
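To make these terms concrete, here is a minimal Python sketch of the kind of calculation a significance calculator can perform. It assumes a two-sided, two-proportion z-test, which is a common way to compare conversion rates but not necessarily the exact method our calculator uses. The function name `ab_significance` and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ab_significance(control_sessions, control_conversions,
                    variable_sessions, variable_conversions):
    """Two-sided two-proportion z-test (an assumed method, for illustration)."""
    rate_control = control_conversions / control_sessions
    rate_variable = variable_conversions / variable_sessions

    # Under the null hypothesis both versions share one conversion rate,
    # so the standard error is built from the pooled rate.
    pooled = (control_conversions + variable_conversions) / (
        control_sessions + variable_sessions)
    std_error = sqrt(pooled * (1 - pooled)
                     * (1 / control_sessions + 1 / variable_sessions))
    if std_error == 0:
        raise ValueError("No variation in the data; the test is undefined.")

    z = (rate_variable - rate_control) / std_error
    # Probability of a difference at least this large if the null is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "p_value": p_value,
        "confidence": 1 - p_value,
        "significant": p_value <= 0.05,
    }


# Hypothetical example: 5.0% vs 5.8% conversion rate at two traffic levels.
large_test = ab_significance(10_000, 500, 10_000, 580)
small_test = ab_significance(1_000, 50, 1_000, 58)
print(large_test["significant"], small_test["significant"])
```

In this sketch the larger test clears the 95% threshold while the smaller one, with exactly the same conversion rates, does not. The difference comes purely from the amount of data collected.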
It might sound easy, but there’s much more to it. If you’re a marketer, I’m sure you’ve heard colleagues complain that their statistically significant findings didn’t lead to any fruitful changes. So how can you use your data effectively?
Validating Your Insights
Marketers use statistical significance to validate their implementation of the insights gained from A/B testing. However, Peep Laja from ConversionXL cautions marketers against immediately jumping to implement their results.
“Statistical significance and validity are not the same,” says Laja.
So how can you be assured of getting reliable results?
Take your time: One of the deadliest sins in A/B testing is stopping the test too soon, even when the numbers start to appear statistically significant.
Running a test for three or four days might result in a statistically significant result, but if you were to run that same test for three to four weeks you might unearth very different outcomes. It’s no wonder some marketers are disappointed when their implementations based on a mere four-day test did not end in quite as many conversions as they had originally expected.
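A small simulation can illustrate why stopping early is risky. The sketch below runs A/A tests, where the control and the variable are identical, so every "significant" result is a false alarm. It compares checking the numbers every day (and counting any day that reaches 95% confidence) with checking only once at the end. The traffic figures, conversion rate, function names and the choice of a two-proportion z-test are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(n_a, conv_a, n_b, conv_b):
    """Two-sided two-proportion z-test p-value (assumed method)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        return 1.0  # no conversions (or all conversions): nothing to detect
    z = (conv_b / n_b - conv_a / n_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(experiments=300, days=20, sessions_per_day=100,
                      rate=0.10, seed=1):
    """Return (share significant on any day, share significant on last day)."""
    rng = random.Random(seed)
    significant_any_day = 0
    significant_last_day = 0
    for _ in range(experiments):
        conv_a = conv_b = 0
        ever_significant = False
        p = 1.0
        for day in range(1, days + 1):
            # Both arms convert at the same true rate: the null is true.
            conv_a += sum(rng.random() < rate for _ in range(sessions_per_day))
            conv_b += sum(rng.random() < rate for _ in range(sessions_per_day))
            sessions = day * sessions_per_day
            p = p_value(sessions, conv_a, sessions, conv_b)
            if p <= 0.05:
                ever_significant = True
        significant_any_day += ever_significant
        significant_last_day += p <= 0.05
    return significant_any_day / experiments, significant_last_day / experiments


peeking_rate, final_rate = simulate_aa_tests()
print(f"False alarms when peeking daily: {peeking_rate:.0%}")
print(f"False alarms when waiting until the end: {final_rate:.0%}")
```

Because the last day is one of the days being checked, daily peeking can never produce fewer false alarms than waiting, and in practice it usually produces noticeably more.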
Plan your experiments to run for a full business cycle (that’s usually about a month). If your numbers reach 95% confidence before the month is over, don’t be tempted to stop the test! According to Tom Wesseling, “The more data, the higher the statistical power of your test!”
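To get a feel for how much data "enough" is, the following sketch estimates the sessions needed per variant before a test starts. It uses the standard normal-approximation formula for comparing two conversion rates. The 80% power target, the function name `sessions_per_variant` and the traffic figures are assumptions for illustration, not recommendations from this article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_per_variant(baseline_rate, expected_rate,
                         alpha=0.05, power=0.80):
    """Approximate sessions needed in each arm of a two-sided A/B test.

    alpha=0.05 matches the 95% confidence threshold; power=0.80 is an
    assumed, commonly used target.
    """
    if baseline_rate == expected_rate:
        raise ValueError("Rates must differ to have an effect to detect.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    mean_rate = (baseline_rate + expected_rate) / 2
    spread = (z_alpha * sqrt(2 * mean_rate * (1 - mean_rate))
              + z_power * sqrt(baseline_rate * (1 - baseline_rate)
                               + expected_rate * (1 - expected_rate)))
    return ceil((spread / (expected_rate - baseline_rate)) ** 2)


# Hypothetical example: detect a lift from 5.0% to 5.8% conversion.
needed = sessions_per_variant(0.05, 0.058)
daily_sessions_per_variant = 500  # assumed traffic level
days_needed = ceil(needed / daily_sessions_per_variant)
print(needed, days_needed)
```

Smaller expected lifts push the required sample size up quickly, which is one reason short tests so often mislead.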
How To Avoid Inconclusive Results
Not all A/B tests will produce statistically significant results. In fact, inconclusive tests appear to be the norm rather than the exception. Statistically insignificant results could mean one of two things:
1) You need to keep your test running for longer.
We recommend running a three- to four-week test to get a rich sample size across all times of day and days of the week.
2) Your results are inconclusive.
Here is some great advice on how to interpret the results of an inconclusive experiment.
Get started using our Statistical Significance Calculator! You will need impression or session data and the number of clicks or conversions for your control and variable to use the calculator.
When the calculator loads, you’ll be prompted to subscribe for free to IterativeMarketing.net, where you will gain complete access to our premium content.