Power Analysis & Sample Size Estimation

Power analysis and sample size estimation are an important part of experimental design. Without these calculations, the sample size may be too large or too small. If the sample size is too small, the experiment will lack the precision to provide reliable answers to the questions it is investigating. In this case, it would be wise to alter or abandon the experiment. If the sample size is too large, time and resources will be wasted, often for minimal gain.

HOW

For each test, we will gather the following four data points before any code is written or resources are committed.

  1. What is the key performance indicator (KPI)?
    • By how much do we believe our hypothesis will affect that KPI (effect size)?
      • Our standard can be 1%, 2%, or 3%.
  2. What is the minimum acceptable confidence level: 90%, 95%, or 99%? (The significance level, alpha, is 1 minus the confidence level.)
  3. What is the minimum acceptable power level: 70%, 80%, or 90%?
    • Commonly set between 0.80 and 0.90.
    • Think of “Power” as the strength of the experiment. Statistical power is the probability that the test will detect an effect that actually exists.
  4. What is the current traffic size on the page being tested? 

WHY

Power analysis links four quantities: effect size, sample size, significance level, and power. Enter any three of them and the fourth is calculated. The basic idea is to leave out the argument that you want to calculate: to calculate power, leave the power argument out of the equation; to calculate sample size, leave ‘n’ out. Whatever parameter you want to calculate is determined from the others. The current traffic on the page then tells us how long the test must run to collect the required sample size.
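The "leave one argument out" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily the calculation our tooling performs: it assumes a one-sample, two-sided z-test, a standardized effect size, and the normal approximation. The function name `solve_power` and its parameters are hypothetical, and solving for the significance level is left out.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def solve_power(effect_size=None, n=None, alpha=None, power=None):
    """Return whichever of effect_size, n, or power was left as None.

    Assumption: one-sample, two-sided z-test with a standardized effect size.
    """
    given = {"effect_size": effect_size, "n": n, "alpha": alpha, "power": power}
    missing = [name for name, value in given.items() if value is None]
    if len(missing) != 1:
        raise ValueError("leave exactly one argument out")
    if missing[0] == "alpha":
        raise NotImplementedError("solving for alpha is not covered in this sketch")

    # Critical value of the two-sided test at significance level alpha.
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)

    if missing[0] == "power":
        shift = effect_size * sqrt(n)
        # Chance of landing beyond either critical value when the effect is real.
        return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)

    z_power = _STD_NORMAL.inv_cdf(power)
    if missing[0] == "n":
        # Round up: a fractional observation cannot be collected.
        return ceil(((z_alpha + z_power) / effect_size) ** 2)

    # Remaining case: the smallest effect size detectable with n observations.
    return (z_alpha + z_power) / sqrt(n)


# Leave out n to get the sample size, then leave out power to check it.
n_needed = solve_power(effect_size=0.03, alpha=0.05, power=0.80)
achieved = solve_power(effect_size=0.03, n=n_needed, alpha=0.05)
print(n_needed, round(achieved, 3))
```

Feeding the computed sample size back in should return a power at or just above the 0.80 target, which is a quick sanity check that the two directions agree.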

WHAT 

EXAMPLE: Power Analysis for Checkout – Guest Checkout

Hypothesis/Success Criteria: If we clearly call out the guest checkout option then we will increase conversion by at least 2%.

What is the optimal sample size for the given hypothesis? 

  • Sample Size (n) = unknown (to be calculated)
  • Effect Size (d) = 2%
  • Power = 80%
  • Sig Level (alpha) = 0.05 (95% confidence level)

Sample Size = 19,625

This tells us that we need ~20k sessions for the test to have an 80% probability of detecting a 2% increase in conversion at 95% confidence, assuming that increase actually exists. If we do not see a 2% increase in conversion at 95% confidence within that sample size, we fail to reject the null hypothesis.

If we do see the 2% increase in conversion rate at 95% confidence within ~20k sessions, we reject the null hypothesis.
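As a rough check on the 19,625 figure above, the arithmetic can be written out. This assumes the 2% is entered as a standardized effect size d = 0.02 for a one-sample, two-sided test and uses the normal approximation. The source does not say which test or tool produced the number, so treat this as a plausibility check rather than the exact method. Here 1 − β is the power (0.80) and α is the significance level (0.05):

```latex
n \approx \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{d} \right)^{2}
  = \left( \frac{1.960 + 0.842}{0.02} \right)^{2}
  \approx 19{,}600
```

The result is in line with the reported sample size. A small gap is expected from rounding the z-values and from tools that use an exact t-based calculation.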

Power Analysis for Checkout – Guest Checkout – Current Results

What is the Power of our current test results?

  • Sample Size (n) = 30,946
  • Effect Size (d) = 1.8%
  • Power = unknown (to be calculated)
  • Sig Level (alpha) = 0.12 (88% confidence level)

Power = ~90%

This tells us that, at the current sample size, the test has a ~90% probability of detecting a 1.8% change if one actually exists.

However, we have only 88% confidence in that change, which is short of the 95% we set out to reach.

What do we do? 

  • We could accept 88% as “good enough”.
  • We could re-run our power analysis with a smaller effect size. This will increase the sample size needed, so we would continue running the test until that sample size is reached.
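To get a feel for what the second option costs: with significance level and power held fixed, the required sample size under the normal approximation grows with 1/d², where d is the effect size. The sketch below uses a hypothetical helper, `rescale_sample_size`, and takes the 19,625 sessions at a 2% effect size from the example above as its reference point. The outputs are rough estimates, not a substitute for re-running the full power analysis.

```python
from math import ceil


def rescale_sample_size(n_ref, d_ref, d_new):
    """Scale a known sample size to a new effect size.

    Assumes alpha and power stay the same, so n is proportional to 1 / d**2.
    """
    return ceil(n_ref * (d_ref / d_new) ** 2)


# Reference point from the guest checkout example: 19,625 sessions at 2%.
for d_new in (0.02, 0.018, 0.01):
    sessions = rescale_sample_size(19_625, 0.02, d_new)
    print(f"effect size {d_new:.1%}: ~{sessions:,} sessions")
```

Halving the effect size roughly quadruples the sessions needed, which is worth weighing against the page's current traffic before committing to a longer test.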