THIS ARTICLE WILL HELP YOU:
- Turn your experiment results into future action
- Build on top of winning variations and set them live from the Results page
- Learn from losing variations
- Iterate on inconclusive experiments
Once you’ve analyzed your results, you’re ready to take action. This is the big moment!
Winning, losing, and inconclusive variations are opportunities to make decisions about your business, through data. If you find winning variations among your experiments, you’ll decide which changes to publish to your site — and how.
Losing and inconclusive variations present another set of valuable opportunities: to learn from your results, home in on expectations that your site is failing to meet, and conduct proof-of-concept tests before committing resources to an unproven idea. These types of results may not surface quick wins, but they focus your testing and keep you oriented toward long-term success.
This article provides a playbook for effectively iterating on winning, losing, and inconclusive experiments. Use it to turn your data into action.
If you’d like to take a step back and dig deeper into your results, check out this article on interpreting your data.
Definitions: winning, losing, and inconclusive
What’s a winning, losing, or inconclusive variation?
Winning: When at least one experiment variation shows a statistically significant positive difference (% Improvement) from the baseline conversion rate for the primary goal, and potentially for other goals as well.
Losing: When all experiment variations show statistically significant negative differences from the original, for your primary goal and potentially other goals as well.
Inconclusive: When the performance for all variations is relatively equal, showing no statistically significant positive or negative results for your primary goal.
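The three definitions above can be sketched as a small classification rule. This is a minimal illustration, not how any testing tool labels experiments: the `VariationResult` type, its fields, and `classify_experiment` are hypothetical names, and the sketch looks at the primary goal only. It also assumes that a mixed result (for example, one significant loser and one flat variation) counts as inconclusive, which the definitions above do not spell out.

```python
from dataclasses import dataclass


@dataclass
class VariationResult:
    name: str
    improvement: float  # relative difference from baseline on the primary goal (0.05 = +5%)
    significant: bool   # True if the Results page reports statistical significance


def classify_experiment(variations):
    """Label an experiment as winning, losing, or inconclusive (primary goal only)."""
    # Winning: at least one variation is a statistically significant improvement.
    if any(v.significant and v.improvement > 0 for v in variations):
        return "winning"
    # Losing: every variation is significantly worse than the original.
    if variations and all(v.significant and v.improvement < 0 for v in variations):
        return "losing"
    # Assumption: everything else, including mixed results, is inconclusive.
    return "inconclusive"
```

For example, an experiment with one significant +5% variation and one flat variation would be labeled "winning" under this rule.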
Below, we suggest how to take action on each of these types of results.
One winning variation
When you’ve finished running an experiment and see a single clear winner, you can push that experience live to all visitors immediately.
Click the Launch button to set the variation “live” and allocate 100% of future traffic to the winning variation for that goal.
Then, work with a developer to implement the variation permanently on your site.
Congratulations! You can now test and iterate on top of these new changes to your site.
Note: Clicking Launch won’t change the experience retroactively for visitors in your experiment who have already seen the original or another losing variation; it only applies to future traffic. If you want absolutely every visitor to see your winning variation, pause the running experiment, duplicate it, and then, in the new experiment’s traffic allocation, pause all variations except the winning one.
Here are a few ideas for what to do next.
RIPE FOR EXPANSION
Experiments that test messaging, imagery, and content prioritization are naturally good candidates for expansion. Translate insights from these wins to other areas of your site, or even to other domains if your testing program spans multiple sites.
Imagine that you’ve improved conversions by adding a “hassle-free checkout” value proposition to the end of your checkout flow. Based on the success of this variation, you might test whether adding similar language even earlier in the funnel, or on the Home page, would increase Add-to-Cart clicks.
READY FOR MORE
Sometimes, winning test results present an opportunity to keep optimizing along a trend. This might be the case if your experiment generated an unexpectedly large improvement or if a dramatic, structural change (versus a small, cosmetic tweak) produced a winning result. If this is the case, continue to mine the winning variation for opportunities.
For example, imagine that you’ve added a case study above the fold to your Home page to increase trust in your product. A large lift in this variation indicates that social proof positively impacts customer behavior. Next, you might try adding a video about a success story above the fold, to provide even more social proof.
MOVE ON TO OTHER IDEAS
Some tests are more prone to see diminishing returns from continued mining than others. For example, a button color that’s been optimized from low to high contrast may not show much further impact on user behavior.
There are a couple of situations where implementing a winning variation may not move the needle in the right direction for your business.
If the variation doesn’t help your business goals, you may want to hold back. For example, imagine that you run a test to optimize your conversion funnel. The winning variation increases conversions on the product details page, but overall submissions stay the same. Before you implement this variation, step back to consider the bigger picture. Are there opportunities to optimize further down the funnel?
If so, consider running a second test to improve conversions down the funnel before committing to permanent changes.
If your changes result in a misleading message, and visitors who click through immediately bounce, don’t put that change into production. It’s unlikely to make a true impact on your most important metrics. Worse, you may create a perception that your site presents misleading propositions. And you risk slowing down future tests on the funnel. Visitors who click through with no possibility of purchase increase the noisiness of any signal in the funnel — making it difficult to achieve statistically significant results.
If you set goals to track your funnel, a pattern may emerge to indicate that you’ve optimized for a metric that doesn’t align with your overall business goals. Read more about that scenario in this article on interpreting patterns.
If you ran your test during a tumultuous time, try it again before acting on the results. External factors may lead you to optimize your site for a certain type of visitor or situation.
For example, if you’re a job search site, tests run during graduation season may skew results towards new college graduates. That optimized experience may not perform as well for your average visitor, year-round.
Segment your results to gain insight into whether a major external event factored into the win. Consider running the test again to account for seasonality and certain traffic spikes. Apply the insights you gain from your analysis of visitor types and external events to personalization.
Multiple winning variations
Sometimes (if you’re lucky), you see multiple winning variations in a single test. This is a good outcome, but what do you do with all of these options?
COMBINE THE IDEAS
Try combining multiple winning variations into a single optimized experience.
Imagine, for example, that you’re a property company testing multiple variations of a contact form. In Variation A, you removed several fields and saw a 60% lift in submissions. In Variation B, you added pricing information to encourage submissions and saw a 67% lift in submissions. Both are winners, so you decide to run a second round of tests.
In Variation C, you remove more fields from the form and add pricing information. You also include a picture of the property. Variation D includes pre-filled fields, since you know removing fields helps create lift. Variation E replaces all pricing information with a giant picture of the property.
By combining your winning ideas, you optimize and iterate based on multiple data points at once. When you’re trying to test what resonates with visitors the most, mix existing themes in new combinations to find the best experience.
EXPAND ON ONE IDEA
If you see a promising trend in one of your variations, consider taking a little time exploring that opportunity.
For example, let’s return to the property company contact form example above. Imagine that Variation C, which includes a large picture next to the pricing information, increases conversions. Try expanding on that idea by adding even more pictures to the form: an exterior shot, photos of the interior, pictures of amenities, and more.
IMPLEMENT THE HIGHEST-WINNING VARIATION
When speed is a concern and you want to implement results quickly, you can choose the highest-winning variation and push it live. But before you do, segment your results to check how your most valuable customers respond to the change.
If those important visitors prefer the next-winning variation, weigh the business impact of those two choices. Or, personalize that experience for that valuable audience.
Some teams try to avoid losing tests, but experiments with losing variations actually aren’t bad. They’re often just as actionable as winning variations, and they provide focused and valuable insights about your visitors’ behaviors that help to guide your optimization efforts in the long run. The most valuable result of A/B testing is learning more about the customer.
Pay attention to your losing variations. Dig deep into the results of losing tests to find out:
- Why do visitors who see this experience convert less often?
- Do certain visitors convert significantly better (or worse) than others?
- Can you use this data to brainstorm new hypotheses about how your visitors respond to your site experience? What can you optimize, based on this information? How can you leverage this insight?
Use what you’ve learned to plan a new, bolder experiment or make major decisions about the direction of a redesign.
Here are a few more ways to handle a losing variation, after you analyze it.
TRY IT AGAIN, WITH SOME ATTITUDE
If you’ve generated a statistically significant result, even if that result is negative, you’ve still shown an ability to affect your visitors’ behavior with your experiment. Can you change the execution of the experiment to turn that negative into a positive? This is especially true for tests with clear alternative approaches, like images, colors, headlines, layouts, etc.
Sometimes, the thing you’re trying to test is already relatively “optimized” (though no site is ever completely optimized). Continuing to test the same thing will not only generate more negative results, but you’ll also encounter an additional opportunity cost for not testing something with clearer potential to generate positive results. This might be true if you’ve already iterated multiple times on the same experiment idea, or if you’ve been running tests on one area of your site for an extended period.
WHEN IS A LOSER ACTUALLY A WINNER?
Sometimes, you’ll have the case where a losing variation actually drives advantageous business results; for example, if you test a prototype of a costly redesign, and it loses, you can avoid committing to the full project. Imagine that a website based on lead generation plans to redesign the entire front page around a new form that suggests visitors “request a demo.” Before they design and implement the new concept, they run a proof-of-concept test that simply surfaces the form in their current layout. A dramatic drop in submissions reveals that visitors prefer to research and explore the business before entering information. With one test, this company saved an entire quarter of development time. Prototyping and testing can generate ROI by avoiding costly investments.
Sometimes, when you check your experiment at the end of the projected time-to-results, your test hasn’t reached statistical significance. You know that inconclusive tests can provide valuable information. Waiting longer will help you gather more data — but how long should you wait? What’s the potential impact of this test, compared to the next experiment you might run?
Inconclusive tests are the most confusing type of test result. But they also generally point in a clear direction: go bigger!
The fact is, many test executions simply aren’t dramatic enough to make an impact on results. The biggest hurdles for testing more significant changes are organizational fear and a lack of resources.
Here are a couple of ways to go bigger.
TEST MORE THAN ONE ELEMENT AT A TIME
Although you may sacrifice some empirical rigor by testing more things at once, your single greatest responsibility is to move the needle; you can focus on getting better at establishing why things are happening once you’ve gotten that lift you’ve been looking for.
INCREASE THE “DEGREE OF DRAMA”
Make your variations clearly, visibly different. Making stronger attempts at changing the environment will yield proportionally stronger results. For example, imagine you test putting product ratings above the fold but your results are inconclusive. You also test a CTA above the fold, with similarly inconclusive results. You can increase the drama by putting both features above the fold to see if this experience resonates with your customers. You can also make the testimonial very large — the only thing customers see above the fold. Dramatic changes such as these give you clearer insight into your visitors’ preferences.
In certain situations, you may wish to take action on an experiment that’s run for the projected time but hasn’t reached significance. This is a business decision you’ll make based on the Results page data and your business goals. Below, we discuss a few tactics that may help.
CHECK VISITORS REMAINING
Visitors remaining estimates how long your experiment needs to run to reach significance, based on traffic, conversion rate, and number of variations. If your conversion rates for the original and variation stay the same, you’d need x number of visitors for Stats Engine to call a significant result.
But, if the conversion rates start to change, your visitors remaining will also adjust. A ballpark estimate based on visitors remaining will help you make a business decision on how long to wait.
To decide whether to keep running the test, compare the visitors remaining to the number of visitors projected in your test plan.
Wait for statistical significance if:
- You haven’t reached the number of visitors predicted in your test plan
- Your visitors remaining suggests you don’t have long to wait
Declare the test inconclusive if:
- You’ve exceeded the number of visitors you planned for and visitors remaining suggests you won’t reach significance anytime soon
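The checklist above can be written as a simple decision rule. The sketch below is only an illustration: `next_step` and its parameters are hypothetical names, and `max_extra_visitors` stands in for whatever your team considers “not long to wait,” which is a business judgment rather than a value provided by the Results page.

```python
def next_step(visitors_so_far, planned_visitors, visitors_remaining, max_extra_visitors):
    """Suggest whether to keep waiting or to declare the test inconclusive."""
    # You haven't reached the number of visitors predicted in your test plan.
    if visitors_so_far < planned_visitors:
        return "wait"
    # You've reached the plan, but visitors remaining suggests a short wait.
    # Assumption: "short" means no more than max_extra_visitors.
    if visitors_remaining <= max_extra_visitors:
        return "wait"
    # Past the plan, with significance still far away.
    return "declare inconclusive"
```

With a plan of 20,000 visitors, 25,000 already seen, 100,000 visitors remaining, and a tolerance of 5,000 extra visitors, this rule suggests declaring the test inconclusive.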
Imagine, for example, that you want to be confident that the 5% lift you see is real, and you’d likely have to wait another two weeks to gather enough data to say that the likelihood that the lift is false is less than 5% (that is, 95% statistical significance). You could keep running the test, but it’s probably not worth it. Together, the data for visitors remaining, your test plan, and the lift you expect to see already indicate that the change you made did not change your visitors’ behavior in a significant way.
At this point, it may be best to segment and check your secondary and monitoring goals to look for insights for the next round of tests.
When visitors remaining doesn’t provide a ready answer, check the difference interval to help you interpret your results.
CHECK THE DIFFERENCE INTERVAL
The difference interval can help you decide whether to keep running a test or move on to another idea. The interval always straddles zero for experiments that haven’t reached significance yet.
In the example above, Stats Engine isn’t sure at 90% statistical significance whether the difference in conversions for the “Pop-Up” variation is positive or negative. However, it can currently determine statistical significance to 78%. It would need ~100,000 more visitors to be conclusive at the 90% level. As Stats Engine gathers more data from visitors’ behaviors, the confidence band expands and contracts. The more erratic visitor behaviors are, the less sure Stats Engine is about future results, and the interval widens. A narrow difference interval indicates that Stats Engine is homing in on the likelihood that the variation will convert at the same rate in the future.
If your results are inconclusive, your difference interval can provide insight when paired with visitors remaining.
For example, imagine an inconclusive test with a difference interval from +0.2% to +0.4%. Stats Engine calculates 6,500 visitors remaining before you reach significance. Is it worth it to keep going?
Along with visitors remaining, the difference interval can help provide insight about how to make a business decision based on your results.
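To make that pairing concrete, here is a minimal sketch of two helper checks. Both function names are hypothetical, and the daily traffic figure is an assumption you would replace with your own. Neither function reproduces Stats Engine’s calculations; they only work with numbers you read off the Results page.

```python
import math


def straddles_zero(interval_low, interval_high):
    """True if the difference interval includes both negative and positive values."""
    return interval_low < 0 < interval_high


def days_until_significance(visitors_remaining, daily_visitors):
    """Rough number of days of traffic needed to cover visitors remaining."""
    # Assumption: traffic and conversion rates stay roughly constant.
    return math.ceil(visitors_remaining / daily_visitors)
```

For instance, with 6,500 visitors remaining and an assumed 1,000 visitors entering the experiment per day, `days_until_significance(6500, 1000)` gives 7 days, which you can weigh against the size of the lift suggested by the difference interval.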
SEGMENT YOUR RESULTS
Segments are one of the analyst’s most important tools for digging into overall performance metrics. Your total performance can be thought of as an average population, but different sub-groups may have different goals — and different conversion rates. Dig into your segments to discover whether there’s a group that responds to your experiment differently from the average population. If there is a segment that responds differently to your experiment, that may lead you to develop a personalization strategy.
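If you export visitor-level data, a per-segment breakdown might be computed along these lines. This is a sketch under stated assumptions: the `(segment, variation, converted)` tuple format and the function name are hypothetical, not an export format of any particular tool. Keep in mind that small segments produce noisy rates, so treat differences as leads for new hypotheses rather than as conclusions.

```python
from collections import defaultdict


def conversion_rate_by_segment(visits):
    """Compute conversion rate for each (segment, variation) pair.

    visits: iterable of (segment, variation, converted) tuples,
    where converted is True or False.
    """
    totals = defaultdict(lambda: [0, 0])  # (segment, variation) -> [conversions, visitors]
    for segment, variation, converted in visits:
        counts = totals[(segment, variation)]
        counts[0] += int(converted)
        counts[1] += 1
    return {key: conversions / visitors for key, (conversions, visitors) in totals.items()}
```

Comparing the rate for each variation within a segment against the overall rate shows whether a group, such as mobile visitors, responds differently from the average population.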
A few final notes:
- It takes work, but make sure that you have a clear hierarchy of goals upfront. Understand that one goal may increase at the expense of another, so you may need to make tradeoffs for the variation that moves the more “important” goal.
- If your experiment has many losing and winning variations or goals, and you don’t have a clear hierarchy of goals, the net effect may be that you still label the experiment as “inconclusive” because the performance was too contradictory.
- The final goals of your program, like ROI, may not be the best metric to judge an experiment if that experiment runs far up the funnel from the final conversion. For example, for an e-commerce site, checkout revenue may not be your most impactful goal if you’re testing on your homepage.