The problem of Survivor Bias in Split Testing

If you are serious about improving your website conversion rate you have probably already explored Split Testing. Split Testing (also known as A/B Testing) is where you compare two different versions of a page to see which converts traffic at a higher rate. The problem with these kinds of experiments is that there is a large possibility of Survivor Bias, which will lead you to draw the wrong conclusions.

Split Testing is an important aspect of improving the strategy of a website or email campaign because, when it is done right, it can produce much greater financial returns.

However, if you are drawing the wrong conclusions, or can’t detect and correct for Survivor Bias, you could waste time going down a fruitless path.

This is a guide to spotting Survivor Bias in Split Testing, and how you can ensure you don’t fall into that trap. I’ll also be expanding on the other things you need to be aware of when drawing conclusions from website experiments and how to ensure you get the right results to improve your conversion rate.

What are the different types of Split Testing?

Before I get into spotting and avoiding the trap of Survivor Bias, first I’ll give a quick overview of the different testing methods you can use to create better websites.

Split Testing (A/B Testing)

Split Testing (or A/B Testing) is where you take two different versions of the same web page or email and show them to random selections of your audience to determine which version has the higher conversion rate. For example, you might have a sign up form that has a pricing box (How to design effective pricing tables) on one version, whilst on the other you might have customer testimonials. You could then determine which version has the higher conversion rate and therefore which feature is better for your sign up page.

Multivariate Testing

Multivariate Testing is basically many Split Tests running at the same time. So for example, you could be running tests on your content, images, layout and colours all at once.

Website testing works by randomly displaying one version of the website to each user. The user’s actions are then marked next to this particular version. Conclusions can be drawn by seeing which exact version of the website produced the greatest number of conversions.
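The assignment and tallying described above can be sketched in a few lines of Python. This is only an illustration, not how any particular testing tool works: the variant labels, the experiment name and the function names are all made up for the example, and it uses a hash of the user ID as a stand-in for a random draw so that a returning visitor always sees the same version.

```python
import hashlib
from collections import Counter

VARIANTS = ("A", "B")  # hypothetical variant labels


def assign_variant(user_id: str, experiment: str = "signup-page") -> str:
    """Map a user to a variant; the same user always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


visits = Counter()       # variant -> number of visitors shown that variant
conversions = Counter()  # variant -> number of those visitors who converted


def record_visit(user_id: str, converted: bool) -> None:
    """Mark the user's action against the variant they were shown."""
    variant = assign_variant(user_id)
    visits[variant] += 1
    if converted:
        conversions[variant] += 1


def conversion_rate(variant: str) -> float:
    """Conversions divided by visits, or 0.0 if the variant has no visits yet."""
    return conversions[variant] / visits[variant] if visits[variant] else 0.0
```

Every visitor is counted, whether they convert or not. Keeping the non-converters in the tally is what lets you compare rates rather than raw conversion counts.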

What is Survivor Bias?

To spot Survivor Bias, you first need to understand exactly what it is. Survivor Bias means drawing a conclusion from the cases that happened to “survive” some selection process, rather than from all of the evidence, including the cases that did not survive.

For example, flipping a standard coin 50 times and getting the same outcome 50 times in a row is statistically very unlikely. If you were to try this experiment it would be highly improbable that you would manage to do it. But if a million people all conducted the same experiment at the same time, the chance that at least one of them succeeds is roughly a million times greater, and with a shorter streak it becomes close to certain that somebody will. If you were that one person who managed to get the streak, and you were isolated from the other experimenters, you might conclude that the outcome is far more likely than it really is.
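To put rough numbers on the coin example, here is a small Python sketch. It assumes a fair coin and fully independent experimenters, and the function names are my own. The result is very sensitive to the length of the streak: with 50 flips the chance stays tiny even across a million people, whereas with 20 flips it is better than 80% that at least one person sees the streak.

```python
import math


def p_streak(flips: int) -> float:
    """Chance that one person gets the same face on every one of `flips` fair tosses."""
    return 2 * 0.5 ** flips  # either all heads or all tails


def p_at_least_one(people: int, flips: int) -> float:
    """Chance that at least one of `people` independent experimenters gets the streak."""
    p = p_streak(flips)
    # 1 - (1 - p) ** people, written so that it stays accurate for very small p
    return -math.expm1(people * math.log1p(-p))


for flips in (10, 20, 50):
    print(flips, p_streak(flips), p_at_least_one(1_000_000, flips))
```

The person who got the streak only sees the first number. The second number is only visible if you count everyone who tried.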

Survivor Bias is where you draw the wrong conclusions because you neglect to include the results that didn’t “survive”.

Survivor Bias in Website testing

The problem of Survivor Bias in Website testing is more common than you think. When you are testing a website, there are many different factors that you need to take into consideration when analysing your results. It is almost impossible to achieve a perfect scientific experiment because you can rarely isolate any one variable completely.

However, you can improve your experiments and use your knowledge of this problem to draw the correct conclusions or further explore an outcome before making a decision on your website’s future.

The following are things to be aware of when conducting Website testing, what to look for, and how you can interpret and use the results or conduct further experiments to find more evidence.

Analysing traffic – Not all traffic is equal

The first major issue of website testing is analysing traffic. Whilst traffic is usually only measured in visitors, the source of traffic is a much greater consideration than pure numbers. For example you might receive a large number of visitors to your website through Twitter, and only a comparatively small number of visitors from an industry blog post, but the two sources could produce vastly different conversion rates.

Do not count all traffic as equal. When analysing traffic and traffic sources, you need to go much deeper than the actual surface numbers.
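One way to go deeper than the surface numbers is to break the conversion rate down by traffic source before comparing versions. The sketch below assumes you can export a visit log as (source, variant, converted) records. That record shape and the sample rows are invented for the example and do not come from any particular analytics tool.

```python
from collections import defaultdict


def rates_by_segment(log):
    """Return the conversion rate for each (source, variant) pair in the log."""
    totals = defaultdict(lambda: [0, 0])  # (source, variant) -> [visits, conversions]
    for source, variant, converted in log:
        totals[(source, variant)][0] += 1
        totals[(source, variant)][1] += int(converted)
    return {key: conv / n for key, (n, conv) in totals.items()}


# Made-up sample data, purely to show the shape of the input.
sample_log = [
    ("twitter", "A", False),
    ("twitter", "A", False),
    ("twitter", "B", True),
    ("twitter", "B", False),
    ("industry-blog", "A", True),
    ("industry-blog", "B", True),
]

for (source, variant), rate in sorted(rates_by_segment(sample_log).items()):
    print(f"{source:15} {variant}  {rate:.0%}")
```

If one version only looks better because it happened to receive more visitors from a high-converting source, a breakdown like this makes that visible.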

Understanding traffic – User Personas

One of the most important aspects of analysing traffic sources is understanding the User Personas of your visitors. As mentioned above, a highly targeted industry blog post is much more likely to bring in visitors that are highly engaged and pre-qualified to purchase, whilst traffic from Organic Search results is much less likely to convert on the first visit.

I’ve written previously on How to develop User Personas for your website.

Understanding the Sources and User Personas of your traffic will enable you to make much better judgements on the outcomes of your split testing.

Prescreening traffic

Once you have a good grasp of which sources bring you the traffic most likely to convert, it’s easy to subconsciously start prescreening your traffic during experiments. For example, you might create a landing page that is specifically targeted at a highly engaged email list. The traffic source for this page is already pre-qualified with purchase intent, and so it will blur the actual results of the experiment. This is a classic case of Survivor Bias. By running an experiment with only a certain segment of traffic, you will likely see a much better conversion rate, but it won’t help you when it comes to converting unqualified traffic and growing beyond the initial user base.

Small sample sizes

Another common problem of Split Testing is that a lot of online companies just don’t have enough traffic to really execute good experiments. Small sample sizes can easily lead you to the wrong conclusions by giving an outcome that would only have happened at that small size. In order to really get strong evidence, you will often have to leave your experiments to run longer than you would want to in order to gain more data. This might not have the quick turnaround that you are looking for, but it will ensure your results aren’t skewed by a lack of data.
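To get a feel for how much traffic “enough” is, you can estimate the required sample size before starting. The sketch below uses the standard normal-approximation formula for comparing two proportions. The 5% significance level and 80% power defaults are common conventions that I have assumed here, and the function name is my own.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline: float, expected: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant to detect baseline -> expected.

    Two-sided test on two proportions, using the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (baseline + expected) / 2
    spread_null = sqrt(2 * pooled * (1 - pooled))
    spread_alt = sqrt(baseline * (1 - baseline) + expected * (1 - expected))
    numerator = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return ceil(numerator / (expected - baseline) ** 2)


# Detecting a lift from a 5% to a 6% conversion rate:
print(visitors_per_variant(0.05, 0.06))
```

Under these assumptions, detecting a lift from 5% to 6% takes around eight thousand visitors per version, which is why low-traffic sites need to run their experiments for much longer.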

Insufficient evidence

Insufficient evidence is linked to small sample sizes, but small samples are not its only cause. It is another area where you can easily draw the wrong conclusions.

Say for example you set up a Google AdWords campaign and you run it for a month. After that month you find that a related keyword phrase that you have not previously been targeting is your best performer.

Now you might think that you should double down on that keyword phrase in order to continue its growth, but have you really gathered enough evidence to suggest that you have discovered a profitable new avenue?

There are usually a lot of different factors that influence the ups and downs of traffic and conversions, and it is unlikely that you can pinpoint an opportunity this quickly.

Insufficient evidence is a problem in an online world of greater data transparency than we’ve ever seen. It’s easy to look at Google Analytics and start drawing conclusions. It is much harder to find the evidence that really matters, because that evidence is not so easy to find.
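A quick check on whether a difference between two versions could simply be noise is a two-proportion z-test. The sketch below is one common way to do it, not the only one. It assumes the visitors are independent and that each group is large enough for the normal approximation to hold, and the function name is my own.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions at all (or all converted): nothing to compare
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same 5% vs 8% rates, at two very different sample sizes:
print(two_proportion_p_value(5, 100, 8, 100))
print(two_proportion_p_value(500, 10_000, 800, 10_000))
```

The same apparent lift can be unconvincing with a hundred visitors per version and very convincing with ten thousand, so the rate on its own tells you little without the sample size behind it.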

Large numbers of concurrent experiments

Multivariate Testing is a great way to test a large number of variables very quickly. When set up correctly, it gives you data on exactly which combination of layout and content works best for your website, and it helps you find that combination as quickly as possible. It also allows you to push ahead with further experiments to continue to optimise your pages for even greater performance.

Multivariate testing is perfect for the likes of Amazon and Google who have millions of users every single day. It’s probably going to be less useful for your company website that gets a couple of thousand visitors.

If you test many different variables at the same time with limited traffic, each combination will receive too few visitors to give you a reliable result. It is much better to focus on one area at a time in order to optimise slowly, but more effectively.
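The arithmetic behind this is simple: the number of combinations is the product of the number of options for each element, and your traffic is divided between all of them. The page elements and visitor figure below are hypothetical, chosen only to illustrate the point.

```python
from math import prod


def visitors_per_combination(monthly_visitors: int, options_per_element: dict) -> tuple:
    """Return (number of combinations, visitors each one gets if traffic is split evenly)."""
    combinations = prod(options_per_element.values())
    return combinations, monthly_visitors // combinations


# Hypothetical test: 3 headlines x 2 images x 2 layouts x 3 button colours
elements = {"headline": 3, "image": 2, "layout": 2, "button_colour": 3}
print(visitors_per_combination(2000, elements))
```

With a couple of thousand visitors a month, a modest-looking test like this leaves only a few dozen visitors per combination, which is far too few to separate a real effect from chance.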

The different types of Calls To Action

Another common misconception with Split Testing is that all Calls To Action are the same. Signing up to an email list and making a purchase are two completely different things.

When optimising a page, you should be optimising for that very specific Call To Action. There is no point using the evidence that was collected from an email sign up page to inform the decisions around a purchase page because they are two entirely different actions.

I often hear people say that “changing X will improve conversions of Y”, when in reality, no two experiments can be the same if the traffic, product, or Call To Action are in any way different.

Keep your different experiments separate and only make decisions with data from that very specific experiment, and not from some other experiment that was in some way related.

Measuring the wrong things

I think just about everyone is guilty of looking at the wrong things the first time they start using Google Analytics. For the first time, you are presented with a wealth of data about the performance of your website.

Measuring the wrong things can be forgiven in the early days, but you need to soon realise what you need to be measuring and what you need to be ignoring.

I think a general, but important rule to remember is, “only measure and optimise one thing”. If you try to improve many different areas of your website you will likely fail because it is too difficult to put enough time and attention into multiple areas.

Instead, find the one thing that will greatly improve your results and have an actionable outcome for your business and concentrate on that.

For example, you might find that people will only purchase your product once they have visited your blog multiple times. Once you have this goal you can optimise your blog and your sales funnel to encourage people to keep coming back or to read multiple posts in a single visit. This is a much more achievable target than trying to optimise every single Call To Action or Landing Page across your entire website.

Qualitative feedback

The final problem is Qualitative feedback. Qualitative feedback does not really have a place in Split Testing, but it can derail or confuse your experiments. Say for example, you find that by removing a section of your landing page entirely, you can increase conversions by 30%. Qualitative feedback can derail this optimisation when you receive feedback that is contrary to your decision.

You should always listen to qualitative feedback from your customers because it is an important indicator of what you should do next. However, you should make decisions based on your users’ actions, rather than their thoughts.

Qualitative feedback can shed light on the bigger picture, or allow you to empathise with your customers’ situation, but it should not determine the strategy of your business.

Survivor Bias in Split Testing

As you can see from the examples above, Survivor Bias can creep into a number of different areas of Website testing and experimentation. By neglecting failed evidence, or drawing conclusions with insufficient evidence, you can end up making decisions that hurt your website, rather than improve it.

The nature of website testing is that there will never be a perfect opportunity to fully prove or disprove whether a certain feature, design or layout is the “right” one to choose. But with a good knowledge of the randomness of experimentation, Survivor Bias and spotting areas of weak evidence, you can make better and more informed decisions to improve your website in the long run.
