Or Experiment or Die?
My name is Jennifer Webster, PPC amateur and aspiring statistician. If there is both an art and a science to PPC management, my role here is to expand on how science - or at least statistics - can help you make the most of your campaigns.
I’ve spent most of the last week playing around with the Google Website Optimizer, and I’d like to start out my contributions here with some thoughts on website optimization and experimental design. First off, the Website Optimizer is yet another well-designed Google product with enough features to keep math geeks like myself up many nights playing and exploring.
Over on the Google Website Optimizer blog, the taglines include “Experiment or Die!” and “Always Be Testing.” Clearly, Google is pushing for more campaign optimization, although it is notable that most of the blog entries are written by Google Authorized Consultants, who make their living running website optimization experiments, so perhaps they have some motivation to get more people optimizing.
The most important part of any experiment is the design. A well-designed experiment is an extremely powerful tool that may give you (or your client) a huge advantage in the marketplace. A poorly designed experiment is a meaningless waste of time. This “Experiment or Die!” mentality worries me. Experimenting for experiment’s sake leads to what we statisticians call “analysis paralysis,” where you have so much data and so many, often contradictory, conclusions that you can’t make any useful decisions.
A key element of experimental design is power. Does the experiment you’re conducting have enough power to actually prove anything? Power comes from two elements: the number of items that you’re sampling (unique website visitors) and the size of the effect. It’s much easier to detect a large effect (like a 50% improvement in conversion rates) than it is to detect a very small effect (like a 1% improvement). In PPC website optimization, the limiting factor is your number of unique visitors. If you get millions of unique visitors, your experiments will have higher power than those on a site with a few hundred or a few thousand visitors per month, and you can experiment to your heart’s content. With fewer visitors, you have to be a little more thoughtful about how you design your experiments.
I started my analysis with a simple A/B split test, the most basic website optimization experiment. Assume you have two versions of your landing page. You drive half your traffic to one page and half to the other and track conversion rates over a set number of unique visitors. Below is a graph of the power of that experiment assuming four different effect sizes: conversion rate improvements of 5% (green), 10% (blue), 25% (red) and 50% (black).

A finding that surprised me: if you’re only expecting a 5% improvement in conversion rates, 100,000 unique visitors only give you 25% power to detect the effect. That means two things: if one landing page really does convert better, you only have a 25% chance of detecting that, and any difference you do see between the two pages in an experiment this underpowered is much more likely to be noise than it would be in a well-powered test. So if you already have a well-optimized site and you’re constantly testing to improve 1% here and 2% there, it is likely that you’re wasting a lot of time and energy chasing after trends that may or may not actually be real. However, for a site that’s never been optimized, or is being significantly overhauled and is expecting improvements on the order of 25-50%, those tests are generally well powered with only 10,000-25,000 visitors. Predicting effect size is really just an educated guess based on what you already know about your site and your campaign.
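If you want to explore numbers like these for your own site, here is a minimal Python sketch of the power calculation for an A/B split. It assumes a two-sided two-proportion z-test at the 5% significance level with a normal approximation. The function name `ab_test_power` and the 2% baseline conversion rate are illustrative assumptions, not values taken from the analysis above, so the output will not necessarily match the graph.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_power(baseline_rate, relative_lift, total_visitors, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two conversion rates.

    Traffic is assumed to be split 50/50 between the two landing pages.
    """
    n_per_page = total_visitors / 2
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # lift of 0.05 means a 5% improvement
    # Standard error of the difference between the two observed rates
    se = sqrt(p1 * (1 - p1) / n_per_page + p2 * (1 - p2) / n_per_page)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p2 - p1) / se
    # Chance that the test statistic lands beyond either critical value
    return NormalDist().cdf(z_effect - z_crit) + NormalDist().cdf(-z_effect - z_crit)


ASSUMED_BASELINE = 0.02  # hypothetical 2% conversion rate; substitute your own

for lift in (0.05, 0.10, 0.25, 0.50):
    power = ab_test_power(ASSUMED_BASELINE, lift, total_visitors=100_000)
    print(f"{lift:.0%} improvement: power = {power:.2f}")
```

Changing the baseline rate shifts every curve, which is one reason the exact visitor counts will differ from site to site.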
Moving from an A/B split to a multivariate design only compounds the problem. In a multivariate design, we take two or more elements of the campaign and test two or more versions of each. As a test case, let’s take the two landing pages from the previous example and add different versions of the ad copy. Now, rather than two variations (Landing Page 1 vs. Landing Page 2), we have four:
• Landing Page 1, Ad Copy 1
• Landing Page 1, Ad Copy 2
• Landing Page 2, Ad Copy 1
• Landing Page 2, Ad Copy 2
And the power curves look like this (50% - black, 25% - red, 10% - blue, 5% - green). Note that the scale for number of visitors now runs from 0 to 1,000,000.
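One way to see why the extra combinations cost so much power is the rough Python sketch below. It is an assumed model, not necessarily the calculation behind these curves: traffic is split evenly across the combinations, each variation is compared against the first combination with a two-sided z-test, and the significance level is Bonferroni-adjusted for the number of comparisons. The function name `multivariate_power` and the 2% baseline conversion rate are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def multivariate_power(baseline_rate, relative_lift, total_visitors,
                       cells=4, alpha=0.05):
    """Approximate power to detect a lift in one cell versus the control cell.

    Assumptions (illustrative only): equal traffic per cell, and a Bonferroni
    correction for comparing every other cell against the control.
    """
    n_per_cell = total_visitors / cells
    adjusted_alpha = alpha / (cells - 1)  # one comparison per non-control cell
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    se = sqrt(p1 * (1 - p1) / n_per_cell + p2 * (1 - p2) / n_per_cell)
    z_crit = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_effect = abs(p2 - p1) / se
    return NormalDist().cdf(z_effect - z_crit) + NormalDist().cdf(-z_effect - z_crit)


ASSUMED_BASELINE = 0.02  # hypothetical 2% conversion rate; substitute your own

for cells in (2, 4):
    power = multivariate_power(ASSUMED_BASELINE, 0.10, 100_000, cells=cells)
    print(f"{cells} combinations, 10% improvement: power = {power:.2f}")
```

With `cells=2` the function reduces to a plain A/B split, so the two printed lines compare the same traffic spent on two combinations versus four.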

The practical question that comes out of all of this is how many visitors do I need before it’s worth my time to do an optimization experiment? In genetics, we consider an experiment to be well powered if we have 80% power to detect the given effect. In the table below are some rough estimates for the number of unique visitors necessary to provide 80% power for a variety of different optimization tests and effect sizes. Keep these numbers in mind as you’re designing your optimization experiments. Carefully select the variables you’d like to change, and let the experiment run long enough to give you meaningful results.
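To estimate a visitor count for your own site, the power calculation can be turned around to solve for the sample size. The sketch below does that for a simple A/B split, again assuming a two-sided two-proportion z-test at the 5% level with a normal approximation. The function name `visitors_for_power` and the 2% baseline conversion rate are illustrative assumptions, and the results are not meant to reproduce the table.

```python
from math import ceil
from statistics import NormalDist


def visitors_for_power(baseline_rate, relative_lift, power=0.80, alpha=0.05):
    """Approximate total unique visitors needed for an A/B split (50/50 traffic)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance cutoff
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_page = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return 2 * ceil(n_per_page)  # both landing pages combined


ASSUMED_BASELINE = 0.02  # hypothetical 2% conversion rate; substitute your own

for lift in (0.05, 0.10, 0.25, 0.50):
    needed = visitors_for_power(ASSUMED_BASELINE, lift)
    print(f"{lift:.0%} improvement: about {needed:,} visitors for 80% power")
```

Because the required sample grows with the inverse square of the difference in conversion rates, halving the expected improvement roughly quadruples the visitors you need.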

Next week, more on the fundamentals of split testing and multivariate testing.