Sequential testing snout and spin

1/3/2024

Monitoring an AB-Test in Optimizely (Image by Author)

Why we want to peek

Collecting enough samples can take weeks or even months, and there are good reasons to check our results earlier (also known as peeking). We want to peek to minimize the harm of bad test cells and maximize the benefits of good ones.

Let’s assume we are testing a new design on our website’s checkout page. If the design harms the conversion rate, we would like to stop the experiment as early as possible to prevent further losses in revenue. If, on the other hand, the design compels more users to finish their purchase, we would also like to stop the test as early as possible so that all users can be exposed to the new design. In both cases, ending the experiment early would have a positive effect on our revenue.

However, not checking the results before the minimum required sample size is obtained is one of the fundamental principles of AB-testing. This is due to the nature of fixed sample size tests and how they are conducted:

1. Set parameters such as Significance Level, Power Level and Minimum Detectable Effect.
2. Calculate the Minimum Required Sample Size n.
3. Run the test and collect n samples per test cell.
4. Calculate the p-value and reject the null hypothesis if the p-value is below the chosen significance threshold.

One might be tempted to monitor the p-value continuously and reject the null hypothesis as soon as it falls below our significance threshold. The p-value fluctuates over time as we get more samples. Let’s assume we peek 8 times before we obtain the required sample size n, depicted by the black and red dots. We would reject the null hypothesis every time we peek and the p-value hits the significance threshold (marked by the red dots):

Monitoring the p-Value over time (Image by Author)

Unfortunately, continuously monitoring our test statistics (or peeking) increases the probability of rejecting the null hypothesis although there is no real effect. When setting up an experiment, we set our desired false-positive rate (or Type I error rate) using the significance level. The significance level determines the probability that we wrongly conclude that there is a real effect in our experiment. That guarantee only holds if we test once, at the planned sample size.
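To make the fixed sample size procedure and the cost of peeking more concrete, here is a minimal Python sketch. It is not taken from the original post or from any particular testing tool: the function names, the pooled two-proportion z-test, the evenly spaced peeks, and all parameter values (10% baseline conversion rate, 2 percentage point minimum detectable effect, 900 samples per cell in the simulation) are assumptions chosen for illustration. It computes a required sample size, then simulates A/A tests (no real effect) and compares how often we would reject the null hypothesis when we stop at the first significant peek versus when we test once at the end.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def required_sample_size(p_base, mde, alpha=0.05, power=0.8):
    """Samples per test cell for a two-sided two-proportion z-test.

    p_base is the baseline conversion rate, mde the absolute
    minimum detectable effect (both illustrative assumptions).
    """
    p_new = p_base + mde
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


def p_value(conv_a, conv_b, n):
    """Two-sided p-value of a pooled two-proportion z-test (n per cell)."""
    if n == 0 or conv_a + conv_b in (0, 2 * n):
        return 1.0  # no variation observed yet, nothing to test
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rates(n, p, interim_peeks=8, alpha=0.05, runs=1000, seed=0):
    """Simulate A/A tests where both cells convert at rate p.

    Returns (rate when rejecting at any look, rate when testing once at n).
    """
    rng = random.Random(seed)
    looks = interim_peeks + 1  # interim peeks plus the final analysis at n
    checkpoints = [n * k // looks for k in range(1, looks + 1)]
    rejected_any = rejected_final = 0
    for _run in range(runs):
        conv_a = conv_b = seen = 0
        significant = []
        for checkpoint in checkpoints:
            # Collect the samples that arrive between two looks.
            for _ in range(checkpoint - seen):
                conv_a += rng.random() < p
                conv_b += rng.random() < p
            seen = checkpoint
            significant.append(p_value(conv_a, conv_b, seen) < alpha)
        rejected_any += any(significant)
        rejected_final += significant[-1]
    return rejected_any / runs, rejected_final / runs


if __name__ == "__main__":
    print("required samples per cell:", required_sample_size(0.10, 0.02))
    peeking, fixed = false_positive_rates(n=900, p=0.10)
    print("false-positive rate with 8 peeks:", peeking)
    print("false-positive rate with one final test:", fixed)
```

Because both cells share the same conversion rate, every rejection in the simulation is a false positive. In runs like this, the rate with peeking should come out well above the nominal 5%, while the single final test stays close to it. The simulation uses a smaller n than the computed sample size only to keep the runtime short.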
