Generating meaningful results through CRO and experimentation is tough.
While almost anyone can achieve the odd one-off conversion rate uplift, producing wins with any consistency is another challenge entirely.
What’s more, many CRO practitioners find that even when they do achieve the kinds of results they’re after, their winning variations fail to perform when served to 100% of their traffic.
Thankfully, in our experience, almost all CRO-related difficulties can be traced back to a number of core mistakes. In this post, we’re going to run through a list of the most common (and harmful) ones, explaining what they are and how you can avoid them.
If we’ve done our job right, then by the end of this post, you should have everything you need to sidestep all of these pitfalls and start driving real, replicable results through CRO and experimentation.
So, to begin: this is our list of mistakes (in no particular order!):
- Starting out too big
- Average build size is too big
- Tests are too small
- Chasing winners
- No hypothesis
- Statistical misunderstandings
- No research
- The flicker effect
- Wrong primary metric
- Not tracking guardrail and secondary metrics
- Over-reliance on best practices, under-reliance on testing
- Not digging into segments
- Little or no quality assurance
- Noise of existing customers
- Sitewide redesigns
1. Starting out too big
As a general rule, it’s never good to start an experimentation program with an ambitious, resource-intensive experiment. Here’s why:
A few years ago, we began working with a new client – a property listing website. Some early research indicated that their property pages might benefit from the inclusion of a maps feature, which would allow their users to see where each property was geographically located on a map.
This functionality was quite complicated to build, and required a lot of time spent on development and QA to ensure that everything was functional and user-friendly.
Unfortunately, when we finally launched our experiment, we found that the variation actually produced a negative impact on our primary conversion metric. In fact, many users were actively navigating away from the maps view that we’d created!
This is just the way it goes sometimes. You can never be sure of the impact a change will bring until you’ve tested it. But the mistake we made was sinking too much time and energy into testing a hypothesis that could have been tested using a much simpler, much less resource-intensive experiment.
For example, we could have used a painted door test to gauge demand for this new maps feature. This would have been a much quicker experiment to build, and it would have given us everything we needed to validate – or invalidate – our hypothesis.
If we then found that lots of users were trying to use this feature, we could have gone on to build out the functionality with a reasonable degree of certainty that it would actually improve engagement on the site.
As a result of this mistake, we’ve since adopted the concept of a minimum viable experiment (MVE) to help guide our experimentation. In essence, an MVE is the smallest possible experiment – in terms of build, design time, etc. – that will allow us to validate our hypothesis.
We now use MVEs to validate our hypotheses at minimal cost and risk, and to gather information about our users which we can use to guide the future course of our programs.
We’re not saying ‘don’t spend time on big builds.’ We’re saying ‘only spend time on big builds when you have a reasonable degree of certainty that those builds are going to be worth it!’
2. Average build size is too big
This may sound like a similar point to the one raised above, but it’s slightly different. The last point was about not creating experiments with big builds at the start of your program, when you don’t have much data to support them. This is about creating experiments with big builds in general.
Many people within the CRO industry subscribe to the view that ‘the bigger the build, the bigger the uplift.’
The idea behind this is fairly simple: if I make big changes to a web page, the conversion uplift is likely to be bigger than if I only make small changes.
Anecdotally, though, we’ve always felt that our smaller experiments yielded results that were just as strong as our larger ones – so we decided to dig into our database, made up of thousands of experiment results, to see what it could tell us.
The chart below shows our findings.
As you can see, tweaks have just as high a win rate as experiments with a large build size – and they have a slightly greater average uplift too (6.6% vs. 6.5%).
This data shows that there’s no correlation between build size and either win rate or uplift – so if you’re spending all of your time building huge experiments that you hope will generate an equally huge uplift, you’re probably going to waste a lot of time.
3. Tests are too small
While it’s important to not spend all of your time focusing on huge experiments, it’s also important that you don’t spend all of your time testing minor tweaks either.
As discussed in the next section on chasing winners, experimentation gives you a chance to trial some of your boldest and brightest ideas – ideas that have the potential to completely revolutionise the way your business works.
If all of your experiments are focussed on minor tweaks, you’re missing out on one of the biggest opportunities that CRO offers: taking risks with a safety net.
Ideally, your program will be a combination of small, low-risk tests with a high probability of winning, and higher-risk tests that have the potential to fail horrendously or succeed spectacularly.
4. Chasing winners
Following on from our last point: conversion uplifts are important, for sure, but when done right, CRO should also be about gathering deep insights about your customers and trialling bold, innovative ideas with only a fraction of the usual risk.
As an agency, therefore, if we’re winning too many experiments, we start to ask ourselves if we’re being bold enough.
A high win rate may look good on paper, but we see it as an indication that we’re being too safe, putting ideas into action that we already have good reason to think will work.
The most value from CRO comes when you learn things that you didn’t already know – this allows you to start achieving bigger, more surprising wins, which you can later use to inform not only your experiment roadmap but also your product, pricing, and business strategies.
5. No hypothesis
Many people doing CRO today simply run their tests, analyze the results, look at whether the challenger variation won or lost, and then move on to the next experiment.
On the one hand, these people should be commended for the fact that they’re running experiments and basing their decisions on empirical evidence. But on the other, their process is missing one of the most important elements of any sound scientific methodology: a hypothesis.
Put simply, every test should be designed to test a hypothesis.
This way, even if your test loses, you’re at least learning something, i.e. that your hypothesis was wrong. You can then use this learning to inform future experiments with an improved chance of success.
As will hopefully be obvious by now, high-level CRO is as much about learning as it is about improving your conversion rate. Creating data-backed hypotheses and then testing them is the key to achieving long-term success.
6. Statistical misunderstandings
Confusion surrounding A/B testing statistics is a cause of much CRO-related difficulty.
For example, many people call (i.e. finish) their tests as soon as they’ve reached 90% or 95% significance.
Mats Einarsen showed why this is a bad idea.
He simulated 1,000 A/A tests (where the control and the variation are identical) and found that 531 of them reached 95% statistical significance at least once!
What this shows is that if you stop your experiment as soon as it reaches a certain significance level – even if this level is set at 95 or 99% – there’s a reasonable chance that your result will be the product of blind luck.
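To get a feel for this effect yourself, here is a minimal simulation sketch. It is not a reconstruction of the original simulation: the conversion rate, sample size, number of interim checks, and the two-proportion z-test are all our own assumptions, and the exact percentages will differ depending on how often you peek.

```python
import math
import random


def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa_test(rng, rate=0.10, visitors_per_arm=2000, check_every=100):
    """Run one A/A test (both arms share the same true rate).

    Returns (ever_significant, significant_at_end): whether any interim
    check crossed the 95% threshold, and whether the final check did.
    All parameter defaults are illustrative assumptions.
    """
    conv_a = conv_b = 0
    ever_significant = False
    z = 0.0
    for n in range(1, visitors_per_arm + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if n % check_every == 0:
            z = z_score(conv_a, n, conv_b, n)
            if abs(z) > 1.96:  # two-sided 95% threshold
                ever_significant = True
    return ever_significant, abs(z) > 1.96


if __name__ == "__main__":
    rng = random.Random(42)
    runs = [simulate_aa_test(rng) for _ in range(500)]
    peeked = sum(ever for ever, _ in runs) / len(runs)
    at_end = sum(end for _, end in runs) / len(runs)
    print(f"Significant at some interim check: {peeked:.1%}")
    print(f"Significant at the planned end:    {at_end:.1%}")
```

The first figure should come out well above the second, because every extra look at the data is another chance for random noise to cross the threshold.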
To avoid this mistake, you need to determine your required sample size before you’ve even launched your test – and you need to stick to it.
Here’s a good calculator you can use to calculate the sample size you need for your experiment.
And here’s a good starting point to help you learn a bit more about A/B testing statistics in general.
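If you would rather see what a sample size calculator is doing under the hood, the sketch below uses the standard normal-approximation formula for comparing two conversion rates. The function name, the default 95% significance and 80% power, and the example inputs are our own assumptions; individual calculators may use slightly different formulas and give slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_uplift,
                              alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variation for a two-sided
    two-proportion z-test to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 3% baseline conversion rate, 10% relative uplift.
    print(sample_size_per_variation(0.03, 0.10))
```

With these hypothetical inputs the answer is in the region of 53,000 visitors per variation. Halving the uplift you want to detect roughly quadruples the sample you need, which is why it pays to run this calculation before launch.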
7. No research
Having been convinced of the value of CRO and experimentation, the next challenge is deciding what to test.
Should you change your hero image? Should you make your headline copy more emotive? How do you decide which of these ideas to test? What’s more, how do you determine if either of them is worth testing?
To answer these questions, you need to do your research.
Research comes in many forms – analytics audits, scrollmaps, heatmaps, surveys, user testing, biometric testing, etc. – and it provides you with a good indication as to where and why your web visitors aren’t converting. With this information, you should then have a good idea of which kinds of hypotheses are worth testing and which ought to be pushed further down your list of priorities.
Ultimately, there are any number of potential hypotheses you might want to test on your website. By prioritizing those that are backed by multiple data points, with a mix of qualitative and quantitative research, you’ll be able to zone in on areas of testing that are likely to yield the biggest return.
8. The flicker effect
Sometimes when you’re running an A/B test, the original version of your webpage will appear in the browser before your variant finally ‘flickers’ into place. This phenomenon is known as the flicker effect, or the flash of original content (FOOC), and it can play havoc with your experiments.
Not only does it ruin your website’s user experience, but by showing your users both versions of your webpage – the control and the variation – it impacts the way they respond to your experiment, invalidating your results.
Thankfully, there are things you can do to minimise or entirely remove the flicker effect.
Generally speaking, our developers write the code for our clients’ experiments with CRO-specific standards in mind. This ensures that the code is executed as quickly as possible, accounting for any timing issues, and that the user only sees the version of the webpage that they’re supposed to.
9. Wrong primary metric
Your primary metric is the metric that you use to decide whether your experiment is a winner or a loser.
If you have a website that sells shoes, you might set the number of orders as your primary metric. This way, if you run an experiment and it results in a 10% uplift in the number of orders, you’ll class it as a winner.
But what do you do if you’re optimizing a web page that’s a few steps away from your final conversion? For example, maybe you have a four-step funnel and you want to optimize the first of these four web pages.
What should you use as your primary metric?
Some people will argue that in this case, you should select the next action you want your user to take as your primary metric, rather than the final conversion. So, in this example, every time a user proceeds from the landing page to the basket page, you would then count it as a conversion.
But in our experience, this choice of primary metric is a mistake.
That’s because sometimes, for a variety of reasons, you’ll find that your ‘next action’ conversion rate goes up but your ‘final action’ conversion rate goes down.
Take this real-world example:
We thought that by making the minibasket easier to use, we would increase the progression rate through to checkout, and that this would ultimately have a positive impact on our final conversion rate.
However, despite the fact that this experiment increased the progression to checkout rate by 28%, it also increased the dropoff rate on the checkout page by 43%!
This netted out at a 7.7% decrease in final conversions.
As a consequence of this result and many others like it, we always recommend using your final conversion as your primary metric.
Note: There are a few occasions when it might actually make sense to set your primary metric as something other than your final conversion. If you’d like to learn about these exceptions to the rule, we discuss this in more detail in our blog post about primary metrics.
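The arithmetic behind the minibasket result can be sketched in a few lines. Two things here are our assumptions rather than figures from the experiment: we treat the 43% as a relative increase in the checkout drop-off rate, and we use a hypothetical baseline drop-off rate of about 39%, chosen because it makes the numbers land close to the reported outcome.

```python
def net_conversion_change(progression_uplift, dropoff_increase,
                          baseline_dropoff):
    """Relative change in final conversions for a two-step funnel.

    progression_uplift: relative rise in users reaching checkout (0.28 = +28%)
    dropoff_increase:   relative rise in the checkout drop-off rate (0.43 = +43%)
    baseline_dropoff:   control drop-off rate at checkout (hypothetical input)
    """
    baseline_completion = 1 - baseline_dropoff
    new_completion = 1 - baseline_dropoff * (1 + dropoff_increase)
    return (1 + progression_uplift) * new_completion / baseline_completion - 1


if __name__ == "__main__":
    change = net_conversion_change(0.28, 0.43, baseline_dropoff=0.3934)
    print(f"Net change in final conversions: {change:+.1%}")
```

More users reach checkout, but a smaller share of them complete it, and the second effect outweighs the first. A 'next action' metric would only ever have shown the first half of that story.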
10. Not tracking guardrail and secondary metrics
Selecting the right primary metric is an important first step, but if you want to get as much out of your A/B tests as possible, you should also be tracking certain guardrail and secondary metrics.
This is something that many CRO practitioners fail to do, and it means that they’re leaving all kinds of important insights – insights that could be used to inform their future testing strategy – on the table.
Guardrail metrics are second-tier metrics linked to key business objectives. They help you ensure that your experiment isn’t inadvertently harming other important business KPIs.
Here’s an example of the importance of guardrail metrics taken from work we did for one of our clients, a camera vendor:
We introduced an ‘add to basket’ call-to-action (CTA) to the product listing page, which allowed users to make a purchase without having to navigate to the product page.
This test produced a positive uplift on our primary metric – no. of orders – but it had a negative impact on two of our guardrail metrics – average order value (AOV) and revenue.
If we hadn’t been tracking these guardrail metrics, we would have simply declared this test a winner and recommended that our client serve this variation to 100% of their traffic – costing them a fortune in the process.
Fortunately, on top of tracking guardrail metrics, we were also tracking a number of secondary metrics. Secondary metrics don’t determine whether your tests win or lose, but they do allow you to monitor things like engagement, scroll depth, secondary KPIs, etc., to help you make sense of your result.
When we dug into our secondary metrics, we found that far fewer users in the variation were purchasing accessories and add-on items than in the control. This was because these users were being diverted away from the product page, which was where they were usually first exposed to these products.
Insights gleaned from these guardrail and secondary metrics not only allowed us to avoid rolling out a new version of the web page that would have harmed business objectives, but they also helped inform the future direction of our testing strategy.
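One way to make guardrails hard to ignore is to build them into the way you call a result. The sketch below is an illustration only: the `MetricResult` class, the `evaluate` function, the tolerance setting, and all of the figures are hypothetical, and the `significant` flag stands in for whatever statistical test you already use.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    relative_change: float  # 0.05 means +5% versus control
    significant: bool       # outcome of whichever significance test you use


def evaluate(primary, guardrails, tolerance=0.0):
    """Ship a primary-metric win only if no guardrail shows a significant
    drop larger than `tolerance` (given as a positive fraction)."""
    breached = [
        g.name
        for g in guardrails
        if g.significant and g.relative_change < -tolerance
    ]
    won = primary.significant and primary.relative_change > 0
    if not won:
        return "no win", breached
    return ("investigate", breached) if breached else ("ship", breached)


# Hypothetical figures, loosely modelled on the camera vendor test.
primary = MetricResult("orders", 0.05, True)
guardrails = [
    MetricResult("average order value", -0.08, True),
    MetricResult("revenue", -0.034, True),
]
verdict, breached = evaluate(primary, guardrails)
print(verdict, breached)
```

Here the primary metric wins, but both guardrails are breached, so the verdict is to investigate rather than roll out.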
11. Over-reliance on best practices, under-reliance on testing
Conversion rate optimization, when done properly, is all about testing hypotheses and making decisions based on the best available evidence. Unfortunately, many people doing CRO today simply apply certain ‘best practices’ (e.g. CTAs should be in red) to their website, without ever testing whether those best practices are right for them.
As any good CRO practitioner will know, every website is different.
Just because something works well for some websites doesn’t mean that it will work equally well – or at all – for others.
Solely relying on best practices is a recipe for disappointment.
If you’re serious about optimizing your website, you need to be testing your hypotheses.
12. Not digging into segments
Sometimes, for a whole variety of reasons, one variation will perform well with one specific segment but not with another.
In fact, this is something that we as an agency see all the time.
A test variation will achieve, say, a 2% conversion uplift, but when we dig into the data, we find that the uplift on mobile was actually +12% while the uplift on desktop was -10%.
This kind of finding has real, quantifiable implications.
For example, what was it about the variation that mobile users responded so positively to? And why did desktop users respond so poorly? Can we build new experiments to iterate on these findings? Should we serve this new variation page to mobile users but leave the control in place for those using a desktop? Will doing so negatively impact the consistency of the user experience?
These are important questions that need to be answered, but unless you’re analyzing your results data and looking at your different segments, you’ll miss them entirely.
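A quick sketch shows how a modest headline uplift can hide two segments pulling in opposite directions. The visitor and conversion counts are hypothetical, picked so that the segment uplifts match the +12% and -10% in the example above.

```python
def relative_uplift(control, variant):
    """Relative change in conversion rate; each argument is a
    (conversions, visitors) pair."""
    control_rate = control[0] / control[1]
    variant_rate = variant[0] / variant[1]
    return variant_rate / control_rate - 1


def uplift_by_segment(segments):
    """Return the uplift for each segment plus the blended 'overall' figure."""
    report = {}
    totals = {"control": [0, 0], "variant": [0, 0]}
    for name, arms in segments.items():
        report[name] = relative_uplift(arms["control"], arms["variant"])
        for arm in ("control", "variant"):
            totals[arm][0] += arms[arm][0]
            totals[arm][1] += arms[arm][1]
    report["overall"] = relative_uplift(totals["control"], totals["variant"])
    return report


# Hypothetical counts: (conversions, visitors) per arm.
segments = {
    "mobile": {"control": (600, 10_000), "variant": (672, 10_000)},
    "desktop": {"control": (500, 10_000), "variant": (450, 10_000)},
}

for name, uplift in uplift_by_segment(segments).items():
    print(f"{name:>8}: {uplift:+.1%}")
```

With these counts the blended figure comes out at +2%. Bear in mind that each segment has a smaller sample than the test as a whole, so segment-level differences are best treated as hypotheses for follow-up experiments rather than as settled results.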
13. Little or no quality assurance
No matter how strong your research or how well designed your experiment, if your webpages aren’t appearing as they should be – or if they’re appearing differently on different devices, browsers, page resolutions, etc. – then your results are likely to be skewed.
One way around this is to invest in rigorous quality assurance (QA).
As an agency, we involve our QA engineers right from the start of our process, which allows them to familiarise themselves with each experiment long before it is finally launched. We also encourage them to question everything and to assume, by default, that there will be problems.
This is a fairly stringent process, but it ensures that our QA engineers are almost always able to catch bugs and usability issues long before the experiment goes live.
If you’re planning to start running QA of your own, BrowserStack is a good place to start.
14. Noise of existing customers
Sometimes you’ll conduct a tonne of research and build a variation page that you’re confident will win, only to find that when you come to test it, the conversion rate has hardly moved and the result hasn’t reached statistical significance.
Well, sometimes, this is just the way it goes. No matter how much research you do, there’s no guarantee that the changes you make will produce their intended effect. This is why testing is so important in the first place.
But having said this, sometimes there’s another explanation:
Let’s say that 90% of the people on your website are existing customers. They’ve already been convinced of your product’s value, so when they visit your website, they’re simply there to reorder.
For these people, the changes you make are unlikely to have much of an impact on their behavior – they’re going to buy the product regardless of whether you add new imagery, change the headline, etc.
In situations like this, many people make the mistake of running their tests on existing customers and new users together. When existing customers make up a sizable portion of overall traffic, this has a tendency to muddy the waters and make it extremely difficult for you to achieve a definitive result.
Instead, you often need to find a way to isolate new users to ensure that they’re the only ones being included in your test.
One way of doing this is to only include new users in your sample (not always possible). Another is to select a primary metric that’s tied exclusively to new user activity – for example, account creations.
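To see how much existing customers can dilute a result, consider the sketch below. All of the rates are hypothetical, and the model makes the simplifying assumption that existing customers behave identically in both arms.

```python
def blended_uplift(existing_share, existing_rate, new_rate, new_user_uplift):
    """Relative uplift observed across ALL traffic when only new users
    respond to the variation.

    existing_share:  fraction of traffic made up of existing customers
    existing_rate:   their conversion rate (assumed unchanged by the test)
    new_rate:        control conversion rate for new users
    new_user_uplift: relative uplift among new users (0.20 = +20%)
    """
    new_share = 1 - existing_share
    unchanged = existing_share * existing_rate
    control = unchanged + new_share * new_rate
    variant = unchanged + new_share * new_rate * (1 + new_user_uplift)
    return variant / control - 1


if __name__ == "__main__":
    # Hypothetical site: 90% existing customers converting at 30%,
    # new users converting at 2%, variation lifts new users by 20%.
    print(f"{blended_uplift(0.90, 0.30, 0.02, 0.20):+.2%}")
```

In this scenario a 20% uplift among new users shows up as a change of well under 1% across all traffic, which would take an enormous sample to detect. Restricting the test to new users, or choosing a metric only they can trigger, removes that dilution.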
15. Sitewide redesigns
This one isn’t necessarily a mistake CRO practitioners themselves make, but it’s something that nonetheless damages conversion rates, so we thought we’d include it here anyway.
Over the years, many companies have come to us with the same problem: feeling that their website’s design had become dated, they’d decided to hire a team of designers and developers to build them something new. They’d spent hundreds of thousands of pounds ensuring that their new website was as sleek and aesthetically pleasing as possible, but when they finally launched it, they found that their conversion rate fell off a cliff.
Many of these redesigns looked great, but for one reason or another, they weren’t performing.
Ultimately, what’s the point of a fancy website redesign if it harms your bottom line?
When we hear that a company is planning to redesign their website, we like to offer an alternative approach: iterative redesign (also sometimes known as evolutionary redesign).
This approach involves making changes gradually, often one at a time, and running constant tests to see how they’re affecting the website’s conversion rates. If the changes perform well, we keep them and look to build on them; if they perform badly, we reject them and look to learn from their failure.
This is the method Amazon uses, and it has allowed them to continually improve their website based on the best available evidence, with little or no risk.
Almost anyone can achieve a one-off conversion rate uplift on their website, but generating long-term results through CRO is extremely tough. We hope that this post will give you a good foundational understanding of where you might be going wrong with your CRO efforts and what you can do to start (gracefully!) sidestepping these pitfalls.