## @kylerush

Director of Frontend Engineering and Optimization,
Hillary for America

# Significance

The risk of encountering a false positive.

95% significance = out of 100 a/a tests, 5 will incorrectly report a difference.
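The 5-in-100 figure can be sanity-checked with a quick simulation. A minimal sketch (traffic numbers are hypothetical, and the run is seeded so it repeats): run 100 a/a tests where both branches share the same true conversion rate, and count how many report a significant difference anyway.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(42)
N = 2000   # visitors per branch (assumed)
P = 0.05   # true conversion rate, identical in both branches

def conversions(n, p):
    # simulate n visitors, each converting with probability p
    return sum(random.random() < p for _ in range(n))

false_positives = 0
for _ in range(100):
    a = conversions(N, P)
    b = conversions(N, P)
    pooled = (a + b) / (2 * N)
    se = sqrt(pooled * (1 - pooled) * (2 / N))
    z = abs(a - b) / N / se if se else 0.0
    p_value = 2 * (1 - NormalDist().cdf(z))   # two-tailed
    if p_value < 0.05:
        false_positives += 1   # no real difference, yet one is reported

print(false_positives)  # typically near 5, by construction of the 95% threshold
```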

# Power

The risk of encountering a false negative.

80% power = out of 100 a/b tests with a real winner, 20 winners will not be reported.
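The flip side can be sketched the same way (hypothetical numbers, seeded run): 100 a/b tests where the variation really is better, with a sample size chosen for roughly 80% power. A fifth or so of the real winners go unreported.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(7)
N = 3800                              # visitors per branch, sized for ~80% power
P_CONTROL, P_VARIATION = 0.05, 0.065  # a real 30% relative lift (assumed)

def conversions(n, p):
    # simulate n visitors, each converting with probability p
    return sum(random.random() < p for _ in range(n))

missed_winners = 0
for _ in range(100):
    a = conversions(N, P_CONTROL)
    b = conversions(N, P_VARIATION)
    pooled = (a + b) / (2 * N)
    se = sqrt(pooled * (1 - pooled) * (2 / N))
    z = abs(b - a) / N / se
    if 2 * (1 - NormalDist().cdf(z)) >= 0.05:
        missed_winners += 1   # real winner, but not reported

print(missed_winners)
```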

# one-tail vs. two-tail

One-tail = is the variation better?

Two-tail = is the variation better OR worse?
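The distinction in code, assuming a z-score from a standard two-proportion test: the same z yields a smaller p-value one-tailed ("is it better?") than two-tailed ("is it different, in either direction?").

```python
from statistics import NormalDist

def p_one_tail(z):
    # probability of seeing a lift this large or larger if there is no real effect
    return 1 - NormalDist().cdf(z)

def p_two_tail(z):
    # probability of a difference this large in EITHER direction
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = 1.96  # example z-score
print(p_one_tail(z))   # ~0.025
print(p_two_tail(z))   # ~0.05
```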

# MDE

Minimum detectable effect

Successful testing strategies are built around the minimum detectable effect (MDE): the smallest change in conversion rate your experiment can reliably detect.

# Sample size

How many subjects
are in your experiment.

Always use a sample size
calculator to calculate sample
size before starting an a/b test.

bit.ly/VUBti8

bit.ly/SWR3YC
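The linked calculators do the work for you. As a rough stand-in (not necessarily their exact method), this is the standard two-proportion sample size formula for a two-tailed test at 95% significance and 80% power:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_branch(baseline, mde, alpha=0.05, power=0.80):
    """Visitors needed per branch to detect a relative lift of `mde`
    over `baseline` conversion rate, at the given significance and power."""
    p1 = baseline
    p2 = baseline * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# e.g. 2.5% baseline, 20% relative MDE -> roughly 17,000 visitors per branch
print(sample_size_per_branch(0.025, 0.20))
```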

# Example

Absolute lowest MDE

# 2,972,435

Visitors per branch

# example

Focus on time

1 month = 170,000 unique visitors

# 83,230

Visitors per branch

# example

Small startup

1 month = 3,000 unique visitors

# 1,567

Visitors per branch

Sample size calculators tell you
how many subjects, but not which
subjects should be in your experiment.

# sampling

• can be really hard
• weekday vs. weekend traffic
• campaign vs. organic traffic
• returning vs. new visitors

# example

E-commerce website

# homepage assumptions

1. Lots of traffic
2. Relatively few conversions

Let's estimate:

• 2.5% conversion rate
• 100,000 monthly unique visitors

# Checkout page assumptions

1. Lower traffic
2. Relatively high conversion rate

Let's assume:

• 50% conversion rate
• 10,000 monthly unique visitors

# MDE: 5%

## just cut your MDE in half!

Start by focusing a/b tests on the
last step in your conversion funnel.
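Why the last step pays off, using the assumed numbers above (2.5% homepage vs. 50% checkout conversion, same 5% relative MDE) in the standard two-proportion sample size formula: the higher baseline asks for far fewer visitors.

```python
from math import ceil
from statistics import NormalDist

def n_per_branch(baseline, relative_mde, alpha=0.05, power=0.80):
    # visitors per branch for a two-tailed two-proportion test
    p2 = baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (baseline - p2) ** 2)

homepage = n_per_branch(0.025, 0.05)  # ~250,000 visitors per branch
checkout = n_per_branch(0.50, 0.05)   # ~6,300 visitors per branch
print(homepage, checkout)
```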

# what should i test?

Depends on MDE and time.

bit.ly/1wkpgye

bit.ly/1kU4sZ0

Terse vs. verbose

# goals

• Measure as many goals as possible
  • Micro: form field errors, time on page
  • Macro: purchase, revenue
• Choose a primary goal
• Don't forget about down-the-funnel goals
  • Repeat purchase
  • Save payment information

# example

Success vs. submit

# 25% difference

Measure as many goals
as possible for every experiment.

# standards document

For each page/funnel record:

• Three-month monthly average of unique visitors
• Stopping conditions (sample size)
• Goals
• Baseline conversion rate
• MDE
• Visitors per branch
• Baseline conversion rate over time

bit.ly/1oVRf6i

# Quality assurance

• No bugs in the variation
• No bugs in the control
• Tracking works correctly

# ELIMINATING bias

Double blind experiments

# Experiment brief

• Hypothesis
• Audience description
• Goals tracked
• Stopping conditions
• Screenshots
• QA summary

bit.ly/1nvNJjx

# statistical tie

Not enough data to conclude that there is a difference.

The overwhelming majority of
experiment results are a statistical tie.

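What a "tie" means mechanically can be sketched with a two-proportion z-test (the counts below are hypothetical): if the p-value stays at or above the significance threshold, you cannot conclude the branches differ.

```python
from math import sqrt
from statistics import NormalDist

def is_statistical_tie(conv_a, n_a, conv_b, n_b, alpha=0.05):
    # two-proportion z-test; True means "not enough data to call a difference"
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(z))
    return p_value >= alpha

print(is_statistical_tie(500, 10000, 520, 10000))  # True: tiny lift, tie
print(is_statistical_tie(500, 10000, 700, 10000))  # False: clear difference
```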

# Example

Retesting

## first test

+36% revenue

## second TEST

statistical tie

# Share your results

Always, always record detailed experiment results in an archive.

# EXPERIMENT ARCHIVE

• experiment date
• audience/url
• screenshots
• hypothesis
• results
• link to experiment
• link to result csv

# Experiment archive template

bit.ly/1q9tRWI

## @KYLERUSH

Head of Optimization, Optimizely


# Architecting great experiments

Unbounce Roadshow. New York, NY. June 1, 2015
