A/B Test Significance Calculator

Compare two A/B test variants by conversion rate, lift, p-value, and confidence, with leading-variant guidance and distribution charts.

Expected distributions

Significance

Significant

99.98% confidence that conversion rates differ. Variant B is ahead in this sample.

Relative lift: 24.0%
Absolute difference: +1.20 pts
Confidence: 99.98%
p-value: 0.0002
z-score: 3.69

Conversion rate range

A: 5.00%
B: 6.20%

How to use this A/B test significance calculator

  1. Enter visitors and conversions for your control variant.
  2. Enter visitors and conversions for the challenger variant.
  3. Review the lift, p-value, confidence, leading-variant note, and conversion-rate range.
  4. Use the expected-distribution chart to see how much the two estimates overlap.
  5. Treat the result as one input alongside experiment design, tracking quality, and business impact.

A/B Test Significance Calculator features

  • Compare two A/B test variants from visitors and conversions.
  • Calculate conversion rates for the control and challenger.
  • Estimate relative lift and absolute percentage-point difference.
  • Run a two-proportion z-test for conversion-rate experiments.
  • Show p-value, z-score, confidence, and leading-variant guidance in one result panel.
  • Visualize estimated conversion-rate ranges in the result panel.
  • Visualize expected conversion-rate distributions for variants A and B.
  • Flag invalid inputs and small expected-count cases.
  • Calculate everything in your browser without uploading test data.
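
The lift figures in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function name `lift_summary` and the input counts (10,000 visitors per variant, 500 and 620 conversions) are assumptions chosen so that the rates match the 5.00% and 6.20% in the sample result panel.

```python
def lift_summary(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return both conversion rates, the absolute difference in
    percentage points, and the relative lift in percent."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Absolute difference is measured in percentage points.
    abs_diff_pts = (rate_b - rate_a) * 100
    # Relative lift is measured against the control rate (undefined if it is 0).
    rel_lift_pct = (rate_b - rate_a) / rate_a * 100
    return rate_a, rate_b, abs_diff_pts, rel_lift_pct


# Hypothetical inputs, assumed for illustration only.
rate_a, rate_b, abs_diff_pts, rel_lift_pct = lift_summary(10_000, 500, 10_000, 620)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}")
print(f"Absolute difference: {abs_diff_pts:+.2f} pts")
print(f"Relative lift: {rel_lift_pct:.1f}%")
```

Keeping the two measures separate matters: a 1.20-point absolute difference on a 5% baseline is a 24% relative lift, and the two numbers answer different questions.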

What statistical significance means in an A/B test

Statistical significance estimates whether the observed conversion-rate difference is larger than you would expect from random variation alone. It does not prove that a result will hold forever, and it does not measure whether the lift is valuable enough for the business.

Use this calculator after the test has collected enough clean, randomized traffic. Avoid repeatedly checking early results and stopping the test the moment the calculator shows a favorable number.
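
One way to decide what "enough traffic" means before the test starts is a standard sample-size estimate for comparing two proportions. The sketch below is an assumption about how you might plan a test, not something this calculator computes: the function name, the 5% significance level, the 80% power target, and the example rates are all illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, target_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a change from
    baseline_rate to target_rate with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power target
    mean_rate = (baseline_rate + target_rate) / 2
    # Spread of the difference under the null (equal rates) and the alternative.
    null_term = z_alpha * math.sqrt(2 * mean_rate * (1 - mean_rate))
    alt_term = z_beta * math.sqrt(
        baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    )
    return math.ceil((null_term + alt_term) ** 2 / (target_rate - baseline_rate) ** 2)


# Hypothetical plan: detect a move from a 5% to a 6% conversion rate.
n_needed = sample_size_per_variant(0.05, 0.06)
print(f"Visitors needed per variant: {n_needed}")
```

Smaller expected differences need far more traffic, because the required sample grows with the inverse square of the difference you want to detect.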

How A/B test significance is calculated

Conversion rate
p = \frac{x}{n}

x is conversions and n is visitors for a variant.

Pooled conversion rate
\hat{p} = \frac{x_A + x_B}{n_A + n_B}

The pooled rate is used to compute the standard error under the null hypothesis that both variants share the same true conversion rate.

Two-proportion z-score
z = \frac{p_B - p_A}{\sqrt{\hat{p}(1 - \hat{p})(\frac{1}{n_A} + \frac{1}{n_B})}}

A positive z-score means variant B's observed conversion rate is higher than variant A's.
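
The three formulas above translate directly into code. The following is a minimal sketch, with a hypothetical function name and assumed inputs (10,000 visitors per variant, 500 and 620 conversions) that are consistent with the 5.00% and 6.20% rates and the z-score of 3.69 in the sample result panel.

```python
import math


def two_proportion_z(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-proportion z-score using the pooled conversion rate."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate: all conversions over all visitors.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    # Standard error of the difference under the null hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (p_b - p_a) / se


# Hypothetical inputs, assumed for illustration only.
z = two_proportion_z(10_000, 500, 10_000, 620)
print(f"z = {z:.2f}")
```

The function does not guard against a pooled rate of exactly 0 or 1, where the standard error is zero; a real implementation would need to reject those inputs.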

The calculator uses a two-sided two-proportion z-test for conversion counts. The p-value is the probability of observing a difference at least as large as the one in your sample if both variants had the same true conversion rate. The confidence shown here is one minus the p-value, expressed as a percentage, and the note names the variant with the higher observed conversion rate.
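
As a rough sketch of that last step, a two-sided p-value can be derived from a z-score with the complementary error function in Python's standard library. The function name is hypothetical, and the input z-score of 3.69 is taken from the sample result panel.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal z-score."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), where Phi is the
    # standard normal cumulative distribution function.
    return math.erfc(abs(z) / math.sqrt(2))


z = 3.69  # z-score from the sample result panel
p_value = two_sided_p_value(z)
confidence = 1 - p_value  # the calculator's definition of confidence
print(f"p-value: {p_value:.4f}")
print(f"Confidence: {confidence:.2%}")
```

Because the test is two-sided, the sign of z does not change the p-value; only its magnitude does.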

The result-panel range bars show an approximate 95% range around each observed conversion rate. The distribution chart uses each variant's standard error to show how much the estimates overlap. The z-test itself is appropriate for simple binary outcomes such as signup, purchase, lead, or click conversion.
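
The page does not say exactly how the 95% range is computed. One common choice, assumed here purely for illustration, is the normal-approximation interval: the observed rate plus or minus 1.96 standard errors, using each variant's own (unpooled) standard error. The function name and the input counts are hypothetical.

```python
import math


def approx_95_range(visitors, conversions, z_crit=1.96):
    """Normal-approximation 95% range for one variant's conversion rate."""
    rate = conversions / visitors
    # Unpooled standard error for a single variant.
    se = math.sqrt(rate * (1 - rate) / visitors)
    # Clamp to [0, 1] so the range stays a valid proportion.
    low = max(0.0, rate - z_crit * se)
    high = min(1.0, rate + z_crit * se)
    return low, high


# Hypothetical inputs, assumed for illustration only.
low_a, high_a = approx_95_range(10_000, 500)
low_b, high_b = approx_95_range(10_000, 620)
print(f"A: {low_a:.2%} to {high_a:.2%}")
print(f"B: {low_b:.2%} to {high_b:.2%}")
```

Ranges that barely overlap or do not overlap at all usually go with a small p-value, but overlapping ranges do not by themselves mean the difference is insignificant.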

The calculator is not suited to revenue-per-user metrics, and it does not replace a sequential testing plan or correct for non-random traffic assignment.

A/B test significance calculator FAQ

What p-value is statistically significant?
A common threshold is p < 0.05, which corresponds to a confidence above 95% in this calculator. Your team may choose a stricter threshold before launching the test.

Can I stop my A/B test as soon as it is significant?
Not safely. Stopping at the first significant reading after repeatedly checking results inflates the false-positive rate. Decide your sample size, minimum runtime, and decision rule before the test starts.

Should I use a one-sided or two-sided test?
This calculator uses a two-sided test because it checks for a difference in either direction. Use a one-sided plan only when that choice is made before the experiment starts.

Why does the calculator warn about small samples?
The z-test approximation works best when each variant has enough expected conversions and non-conversions. Very small counts may call for an exact test, such as Fisher's exact test, instead.
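
A small-sample check of this kind might look like the sketch below. The threshold is an assumption: a minimum of 5 expected conversions and 5 expected non-conversions per variant is a common rule of thumb, some teams use 10, and the page does not state which cutoff this calculator applies. The function name is hypothetical.

```python
def small_sample_warnings(visitors_a, conversions_a, visitors_b, conversions_b,
                          min_expected=5):
    """Return a warning for each variant whose expected counts under the
    pooled rate fall below min_expected (an assumed rule-of-thumb cutoff)."""
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    warnings = []
    for name, visitors in (("A", visitors_a), ("B", visitors_b)):
        expected_conversions = visitors * pooled
        expected_non_conversions = visitors * (1 - pooled)
        if min(expected_conversions, expected_non_conversions) < min_expected:
            warnings.append(
                f"Variant {name}: expected counts too small for the z-test approximation"
            )
    return warnings


# Hypothetical inputs: a large test passes, a tiny test triggers warnings.
print(small_sample_warnings(10_000, 500, 10_000, 620))
print(small_sample_warnings(40, 1, 40, 3))
```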