Melangify
Posted August 7th, 2025

What's an A/B test?

Written by founders — not AI
6 min read time

As you move through life, it can be difficult (often outright impossible) to tell what causes what. Why is it harder to get out of bed today? Was it because you drank less water? Went to bed later? The stress from yesterday's work?

Companies face this same problem when analyzing their metrics. If you run a site as complex as Amazon, with countless moving parts, how can you tell if something you've changed is beneficial or harmful?

Luckily for medium to large websites, there is a solution.

While you can't go back and change what you ate for dinner yesterday, Amazon can simulate "two different days" on their site. They randomly show half their customers one version of the website, and the other half a different version. If they have enough traffic, they can determine which version performs better. This is the fundamental idea behind A/B testing.

Large internet companies take this idea and run with it, launching hundreds or even thousands of tests per year to evaluate whether every little change they make is worth it. Here's a simple but concrete example of how one of these tests might work, and the general process of creating an A/B test.

Starting with a hypothesis

Almost all A/B tests start with an idea.

Let's say a product manager at Amazon is looking at their checkout page. They notice that when an item has a free return policy, that information is tucked away in the "Additional Information" box at the very bottom of the price details. This product manager has heard a lot of their friends say they like to buy items with free returns, because it makes the purchase feel safer, especially when buying online.

That product manager has now formed a hypothesis (an idea they think is true): more people will buy an item if the "Free returns" label is moved to a more visible spot, right next to the product's price.

Designing the test

Now, with a hypothesis in hand, the product manager moves on to designing the test. Test designs can get complicated, but 90% of them are relatively simple and follow a general pattern. Good design matters because you want the test to reach statistical significance: in other words, to know that one version didn't just get lucky, but is genuinely better.

Often, the first step is to find a data scientist, or a test duration calculator (maybe we'll build one into our site; they're easy to make) to estimate how long the test needs to run to reach statistical significance. To do this, they typically need two key numbers:

  • The number of people who visit the page
  • The normal conversion rate (i.e., the number of people who buy divided by the number who visit)

So really just two numbers.
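If you're curious what a calculator like that does with those numbers, here's a rough sketch in Python. The traffic and conversion figures below are made up, the expected lift is a guess you have to supply yourself, and the formula is just one common approximation (a two-proportion z-test at 95% confidence and 80% power), not the only way to do it.

```python
# A rough sketch of a test duration estimate, assuming a 50/50 split between
# two versions of the page. All numbers here are invented for illustration.
from math import ceil
from statistics import NormalDist

def estimate_days(daily_visitors, baseline_rate, expected_lift,
                  alpha=0.05, power=0.80):
    """Roughly how many days the test needs to reach statistical significance."""
    p1 = baseline_rate                        # normal conversion rate
    p2 = baseline_rate * (1 + expected_lift)  # rate we hope the new version hits
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power

    # Classic sample-size approximation for EACH version of the page.
    n_per_version = ((z_alpha + z_power) ** 2
                     * (p1 * (1 - p1) + p2 * (1 - p2))
                     / (p1 - p2) ** 2)

    # Traffic is split in half, so we need twice that many visitors in total.
    return ceil(2 * n_per_version / daily_visitors)

# Example: 20,000 visitors a day, a 5% conversion rate, hoping for a 5% relative lift.
print(estimate_days(20_000, 0.05, 0.05))  # roughly two weeks
```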

Running and analyzing results

Once they know how long the test needs to run, the team can launch it. They set up the two versions of the page and start randomly sending customers to each one. One important technical detail here is that customers are bucketed, meaning they'll see the same version every time they visit.
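Bucketing is usually done by hashing something stable about the customer, so nobody has to remember a coin flip. The sketch below shows one common way to do it; the experiment name and customer ID are made-up examples, not how Amazon actually labels things.

```python
# A minimal sketch of bucketing: hash the customer ID together with the
# experiment name, so the same customer always lands in the same version.
import hashlib

def assign_version(customer_id: str, experiment: str) -> str:
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    # Even hash values go to version A, odd ones to version B: a stable 50/50 split.
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same customer gets the same version on every visit.
print(assign_version("customer-42", "free-returns-label"))
```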

After the designated time has passed, they check the results. There are three possible outcomes:

  • Negative: The new version performed worse
  • Positive: The new version performed better
  • Inconclusive: The test wasn't sensitive enough to detect any difference between the variants
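To make those three outcomes concrete, here's a sketch of how the final numbers might be read. The visitor and purchase counts are invented, the cutoff of 0.05 matches the usual 95% confidence level, and the two-proportion z-test shown is just one standard way to check; real teams often use more sophisticated methods.

```python
# A sketch of reading A/B test results with a two-proportion z-test.
# All counts are invented for illustration.
from math import sqrt
from statistics import NormalDist

def read_results(visitors_a, buyers_a, visitors_b, buyers_b, alpha=0.05):
    rate_a = buyers_a / visitors_a
    rate_b = buyers_b / visitors_b

    # Pooled rate: what we'd expect if the two versions were actually identical.
    pooled = (buyers_a + buyers_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test

    if p_value >= alpha:
        return "Inconclusive"  # the gap could easily be luck
    return "Positive" if rate_b > rate_a else "Negative"

# Example: version A converts 5.0% and version B converts 5.4% on 120,000 visitors each.
print(read_results(120_000, 6_000, 120_000, 6_480))  # "Positive"
```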

Let's say in this case the product manager was onto something, and the test comes back positive. Then the change gets rolled out to all customers, and this new version becomes the default.

A/B testing doesn't just help companies like Amazon improve conversion rates; it gives them a way to learn in a world full of uncertainty. We can't always control the variables in life, but online, with enough traffic and a good hypothesis, we can at least run the experiment.