What to Do Before Looking at Your Results
Imagine a trip to the grocery store. You go on an empty stomach with no grocery list and two hungry kids in tow.
Will this be an efficient trip that ends with a healthy meal that’s going to make everyone happy?
Unlikely. Instead, most of us would wander the aisles aimlessly, eventually picking up the first thing that catches our eyes.
Looking at your A/B test results is a lot like a trip to the grocery store. Don’t go in distracted and unready.
Go in prepared with a list of what you’re looking for (your top metrics), an idea of what to expect (do the numbers I’m seeing seem reasonable?), and an open mind to discovering new things (I didn’t expect users to do that!) and you’re going to be successful.
Prep by asking the following questions:
- What is the primary goal for this test? What metric am I most hoping to see an impact on? Conversions? Revenue per visitor? Click-throughs? Something else?
- What secondary goals may be impacted by this test? Secondary goals are less important than the primary goal, but they may actually be more likely to be impacted by your test. For example, an “Add to Cart” button test may mostly affect the secondary metric of “Add to Cart” clicks. But the primary goal for that test may be to impact purchase conversions – a goal further down the funnel. Considering your secondary goals before looking at your results prepares you to think through the test from your users’ perspectives.
- Is this test likely to be done or will it take more time? Asking this question prior to reviewing the results will ensure you don’t overreact if you see an overwhelmingly positive, negative or flat result.
- How many total visitors do I expect to see? A drastically higher or lower than expected number of visitors may mean the test wasn’t properly set up. For example, did you mean to target all product pages or just that one page with the blue hooded sweatshirt? Look at your analytics platform to roughly estimate the number of visitors your tested page(s) should get. If you set your test up to only allow a percentage of eligible users to enter the test, factor this in.
- Is this test likely to impact different audience segments differently? As an example, returning users may not react as well to changes because they’re not used to them. New users, by contrast, have nothing to compare to and may be converting better in your new variation. Look at different audience segments and consider how they may be impacted by the actions you take when the test concludes.
- Since the launch, is there anything that happened that may have impacted the test? Did you start a major search marketing campaign, send out direct mail or air a Super Bowl commercial (I wish!)? Will these efforts drive more traffic to the site or affect one variation in a unique way? If so, your results need to be considered in this context.
Take a moment to ask these questions before looking at your results. You’ll more quickly make sense of the data, spot errors in your test setup, discover surprising insights, and not overreact if something seems askew.
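The visitor-count question above is easy to turn into a quick sanity check. Here is a minimal sketch, assuming you can pull an average daily visitor number for the tested page(s) from your analytics platform. The function names and the 50 percent tolerance are my own placeholders, not anything your testing tool provides.

```python
def expected_test_visitors(daily_page_visitors: float,
                           days_running: int,
                           traffic_allocation: float = 1.0) -> float:
    """Rough number of visitors the test should have collected so far.

    traffic_allocation is the share of eligible users allowed into the
    test (1.0 means everyone, 0.5 means half).
    """
    if not 0.0 <= traffic_allocation <= 1.0:
        raise ValueError("traffic_allocation must be between 0 and 1")
    return daily_page_visitors * days_running * traffic_allocation


def looks_off(observed: float, expected: float, tolerance: float = 0.5) -> bool:
    """Flag an observed count that differs from the estimate by more than
    `tolerance` (an arbitrary 50 percent by default)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(observed - expected) / expected > tolerance


# Hypothetical numbers: 2,000 daily visitors, 14 days, half of traffic in the test.
expected = expected_test_visitors(2_000, 14, 0.5)
print(expected, looks_off(1_200, expected))
```

If the check flags your test, revisit the targeting rules before reading anything into the results.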
A/B Test Results Components Defined
Understanding how key results page components are defined is also critical to interpreting your results correctly.
Every testing platform is different but there are commonalities across all of them. Here’s what you need to know.
Conversion Rate
This is simply the percentage of visitors (or sometimes visits) that “convert” on any particular action.
Every goal (with the exceptions of any goal with “average” in the name – e.g. “average order value” – and revenue per visitor) has a conversion rate. If there are 100 visitors who could have converted and only 50 of them did, your conversion rate is 50 percent.
Adobe Marketing Cloud has more information here.
Lift (a.k.a. “Improvement”)
This is what most testers tend to care about most. It’s simply the relative percentage change one variation shows against another for any particular metric.
Here’s how it’s calculated: lift = (variation conversion rate – original conversion rate) / original conversion rate.
Example: Your purchase conversion rate is 5.2 percent for your original experience and 5.8 percent for your variation. Your lift would be: (5.8 percent – 5.2 percent) / 5.2 percent = 11.54 percent.
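If you want to double-check the lift your platform reports, the arithmetic fits in a few lines. This is a small sketch using the rates from the example above. The function name is mine, and your platform may round differently.

```python
def lift(original_rate: float, variation_rate: float) -> float:
    """Relative change of the variation against the original."""
    if original_rate <= 0:
        raise ValueError("original_rate must be positive to compute lift")
    return (variation_rate - original_rate) / original_rate


# 5.2 percent original vs. 5.8 percent variation.
print(f"{lift(0.052, 0.058):.2%}")
```

Note that lift is relative: a move from 5.2 to 5.8 percent is a 0.6 percentage point change, but roughly an 11.5 percent lift.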
Two Variations Performing Better than Original?
Let’s say both B and C are statistically significantly better than A. While C is the best performing, is it statistically significantly better than B?
Change your “baseline” to B and see whether your testing platform says C is still meaningfully better. If not, B and C are essentially equal. The section “Choosing Your Baseline” in this article describes this more.
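Your platform does this comparison for you, but it can help to see roughly what happens when you swap the baseline. The sketch below uses a pooled two-proportion z-test, which is one common approach and not necessarily what your tool uses. All visitor and conversion counts are made up for illustration.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, nothing to distinguish
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results: (conversions, visitors) for each experience.
a, b, c = (500, 10_000), (580, 10_000), (600, 10_000)

print("C vs. A:", two_proportion_p_value(*a, *c))  # original as baseline
print("C vs. B:", two_proportion_p_value(*b, *c))  # B as baseline
```

With these numbers, C clears the usual 0.05 threshold against A but not against B, which is the "essentially equal" situation described above.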
Visitors vs. Visits
Most testing platforms give you the ability to toggle between visitors and visits when looking at results.
Precisely how they’re defined may differ between platforms, but generally speaking a visitor is an individual user and a visit is a session.
Imagine our grocery store again… If 1,000 people go to the store that day but 20 of them return later in the day for a late night snack, the store will have 1,000 visitors but 1,020 visits. The same concept applies to your website.
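In data terms, the difference comes down to what you count. Here is a small sketch assuming a session log with one (visitor ID, session ID) pair per visit. The log format is hypothetical, since every platform stores this differently.

```python
def count_visitors_and_visits(sessions):
    """Return (unique visitors, total visits) from (visitor_id, session_id) pairs."""
    visitors = len({visitor_id for visitor_id, _ in sessions})
    visits = len(sessions)
    return visitors, visits


# The grocery store: 1,000 shoppers, 20 of whom come back for a late-night snack.
sessions = [(f"shopper-{i}", f"day-{i}") for i in range(1_000)]
sessions += [(f"shopper-{i}", f"night-{i}") for i in range(20)]

print(count_visitors_and_visits(sessions))
```

Whichever unit you pick, use the same one for every variation so the conversion rates stay comparable.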
Total Conversions
Your results are on fire! You have 100 percent lift!! Amazing, right?
Not so fast…
Did you look at your total conversions? If Experience A has one conversion, and Experience B has two conversions, you may have 100 percent lift, but you do not have statistical significance. This “lift” is likely due to chance.
Your platform’s statistical significance indicator will help, but so will a good dose of common sense.
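To see why one conversion versus two proves nothing, you can run the numbers through an exact test. The sketch below implements a two-sided Fisher exact test with only the standard library. I am assuming 100 visitors per experience purely for illustration, and your platform very likely uses a different method.

```python
from math import comb


def fisher_exact_two_sided(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided Fisher exact p-value for a 2x2 conversion table.

    Intended for small counts like the ones in this example.
    """
    total_conv = conv_a + conv_b
    denom = comb(n_a + n_b, total_conv)

    def prob(k: int) -> float:
        # Chance that exactly k of all conversions land in experience A
        # if the two experiences truly perform the same.
        return comb(n_a, k) * comb(n_b, total_conv - k) / denom

    observed = prob(conv_a)
    lowest = max(0, total_conv - n_b)
    highest = min(n_a, total_conv)
    # Sum every outcome that is at least as unlikely as the observed one.
    p = sum(prob(k) for k in range(lowest, highest + 1)
            if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p)


# One conversion vs. two conversions, 100 hypothetical visitors each.
print(fisher_exact_two_sided(1, 100, 2, 100))
```

A p-value that high says a split of one versus two is exactly the kind of result chance produces all the time, 100 percent lift or not.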
Test Duration
Much like total conversions, consider how many days your test has been running.
Even if your testing platform says you have reached statistical significance, would you really end a test after only one day? What if that day was a fluke?
Most ecommerce sites run their tests for at least one full week to get both weekday and weekend traffic and to negate external factors that may have influenced the test.
Statistical Significance
Sometimes called “Confidence”, statistical significance can be a touchy subject (after all, it was Mark Twain who popularized the phrase “There are three kinds of lies: lies, damned lies, and statistics.”).
I’ll leave the math to the experts (these folks and this guy are good places to start), but suffice it to say that this is your testing tool’s way of saying the results you’re looking at are – most likely – not due to random chance but due to the changes you made in your variations.
Simply put: making a decision on test results that aren’t statistically significant isn’t much better than making a random decision.
Difference Interval
Think of this as your margin of error. Presidential polls have these and so do your test results. The difference interval represents the upper and lower range of your “real” conversion rate.
For example, a conversion rate of 6 percent and a difference interval ± 1.5 percent means your actual conversion rate could be as low as 4.5 percent and as high as 7.5 percent.
Generally speaking, the larger your sample size and the more conversions you have, the lower your difference intervals.
When conversion rate ranges between two variations overlap, your testing platform will typically indicate you don’t have statistical significance. Why? Because your results are essentially within the margin of error.
In the election of 1948, The Chicago Tribune seems to have forgotten this when they went to press… just ask former president Truman.
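For the curious, here is roughly where a difference interval comes from. This sketch uses the simple normal approximation at 95 percent confidence, which is an assumption on my part: platforms often use more refined methods. The visitor and conversion counts are invented.

```python
import math


def conversion_interval(conversions: int, visitors: int, z: float = 1.96):
    """Approximate (low, high) range for the true conversion rate.

    z = 1.96 corresponds to roughly 95 percent confidence.
    """
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return max(0.0, rate - margin), min(1.0, rate + margin)


def intervals_overlap(first, second) -> bool:
    """True when two (low, high) ranges share any values."""
    return first[0] <= second[1] and second[0] <= first[1]


small = conversion_interval(60, 1_000)       # 6 percent on 1,000 visitors
large = conversion_interval(6_000, 100_000)  # 6 percent on 100,000 visitors
print(small, large)

variation = conversion_interval(75, 1_000)   # 7.5 percent on 1,000 visitors
print(intervals_overlap(small, variation))
```

The first range works out to about 6 percent ± 1.5 percent, and the same rate on 100 times the traffic gives a range about a tenth as wide. Treat the overlap check as a rule of thumb and let your platform’s significance indicator make the final call.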
Top Suggestions and Guide for Reviewing Ecommerce Results
Now that you’ve prepared for your results review and you understand the definitions of key results page components, let’s outline the process an ecommerce company should take to review results.
Focus on Purchase Conversion Rate
Unless you’re running a test with the specific aim of getting users to add additional products to their carts (e.g. like magazines and candy bars in the checkout lane at the grocery store) or to purchase a higher-priced item over a lower priced item, you’re generally trying to get more users to make a purchase.
Since most ecommerce sites have conversion rates between 2 and 8 percent, by focusing on purchase conversions you’re focusing on the 92 to 98 percent of users who are coming to your site and not buying.
This is a good place to start.
Review Secondary Goals
Secondary goals tell a story about your users. While secondary goals may not be enough by themselves to push a winner, they may help you come up with your next great test idea.
For example, a product page may test the related products it shows lower on the page. Experience A may show higher priced items and Experience B may show lower priced items.
Experience B may show a higher conversion rate (your primary goal) but your total revenue (a secondary goal) may drop as users are purchasing lower priced items.
This tells you your users are cost conscious. The more secondary goals in your experiment, the richer the story you can discover about your users.
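Here is what that scenario can look like in numbers. The figures below are invented for illustration, and the metric names are just common labels, so check how your own platform defines them.

```python
def summarize(visitors: int, orders: int, revenue: float) -> dict:
    """Primary and secondary metrics for one experience."""
    return {
        "conversion_rate": orders / visitors,
        "revenue_per_visitor": revenue / visitors,
        "average_order_value": revenue / orders if orders else 0.0,
    }


# Hypothetical results: A shows higher priced related products, B lower priced.
experience_a = summarize(visitors=1_000, orders=50, revenue=5_000.0)
experience_b = summarize(visitors=1_000, orders=60, revenue=4_200.0)

for name, metrics in (("A", experience_a), ("B", experience_b)):
    print(name, metrics)
```

In this made-up case B wins on conversion rate (6 percent vs. 5 percent) but loses on revenue per visitor (4.20 vs. 5.00), which is exactly the kind of story a single primary goal would hide.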
Review Key Audience Segments
The most common audience segments for ecommerce sites are new visitors and returning visitors (a subset of which is returning purchasers).
New visitors may never have been to your site before and may not be as familiar with your brand. Returning visitors may be familiar with your brand and may have a product already in mind.
As you can imagine, these users may browse differently. If you have other audience segments like geographic location, gender etc. this may be useful too. If you don’t have these, you can create audience segments based on the UTM code from the link the user clicked to get to your site.
This will allow you to break your results down by users who had come from one search marketing campaign vs. another for example.
One thing to keep in mind with audience segments: when you look at an audience segment, you’re looking at only a portion of your total tested traffic and you may not have enough visitors to show statistically significant results.
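If you go the UTM route, the segment label is just a query parameter on the landing URL. Here is a minimal sketch using Python’s standard library. The URLs and the "(none)" fallback label are placeholders, and how you feed the label into your testing platform depends on the tool.

```python
from urllib.parse import urlparse, parse_qs


def utm_segment(landing_url: str, parameter: str = "utm_campaign",
                default: str = "(none)") -> str:
    """Pull one UTM value out of a landing page URL to use as a segment label."""
    query = parse_qs(urlparse(landing_url).query)
    return query.get(parameter, [default])[0]


print(utm_segment("https://example.com/product?utm_source=google&utm_campaign=spring_sale"))
print(utm_segment("https://example.com/product"))
```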
Determine If You Should Stop the Test and Move On
Once you’ve reviewed your primary and secondary goals across all your key audience segments, you may find yourself asking, “Is this test done?” This question is particularly common if your results are inconclusive.
Some platforms will show the number of “estimated visitors remaining.” However, that number may fluctuate as your conversion rates change, complicating the answer to your question.
A sample size calculator can give you a steadier answer. Enter the data you’re currently observing for your baseline conversion rate. Then, adjust that number to what you believe are realistic “lift” percentages.
Let’s walk through an example:
You have a 3.5 percent conversion rate and are looking to detect a 5 percent lift. A sample size calculator may tell you that you need 200,000 visitors per variation.
If you’ve already been running this test for a month and only have 50,000 visitors per variation – and if you’re not willing to let your test run for another three months to reach 200,000 per variation – it may be best to move on to another test.
However, let’s say you know you’re about to do a major marketing push that will drive a lot more traffic or improve your conversion rate (perhaps a seasonal sale is about to start).
Either of these changes may affect the length of time you need to keep running the test and may affect your decision.
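If you’d like to see what a sample size calculator is doing under the hood, here is a rough sketch. It uses a standard two-proportion formula with 95 percent confidence and 80 percent power, both of which are my assumptions. Calculators that use different settings will give different answers, so don’t expect it to match any particular tool exactly.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline_rate: float, relative_lift: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variation to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided confidence
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def months_remaining(needed: int, collected: int, visitors_per_month: float) -> float:
    """How much longer the test must run at the current traffic level."""
    return max(0, needed - collected) / visitors_per_month


# 3.5 percent baseline conversion rate, looking for a 5 percent lift.
print(visitors_per_variation(0.035, 0.05))

# 200,000 needed, 50,000 collected so far at 50,000 per variation per month.
print(months_remaining(200_000, 50_000, 50_000))
```

With these settings the formula lands somewhere between 175,000 and 180,000 visitors per variation, the same ballpark as the example above. Asking for higher power pushes the number up, and so does looking for a smaller lift.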
Wrapping It Up
A/B test results can be complicated but your process for reviewing them should be systematic and simple.
Prep by asking a standard set of questions (you may even want to write the answers down), reference this guide or your platform’s knowledge base if there are any terms on the results page you don’t understand, and follow the four review steps (review of primary goal, secondary goals, audience segments, and test duration).
Do this regularly, and you’ll be on your way to getting the most out of all the A/B testing data you’ve gathered.
What experiences have you had with your test results? Any horror stories or successes to share? Are there any parts of your process that have been particularly helpful?
Let me know what you think about our post and as always, if you have a question about your test results, I’d be happy to take a look!