5. Sampling Distributions

Sampling Distributions for Differences in Sample Proportions

Students, imagine two schools are choosing between two lunch menus 🍕🥗. You survey a sample from each school and compare the proportion who prefer pizza. The difference between those sample proportions can help you decide whether the schools really differ or whether the gap is just due to random chance. In this lesson, you will learn how AP Statistics describes the sampling distribution of the difference in sample proportions and how to use it in real situations.

What you will learn

By the end of this lesson, you should be able to:

  • explain the meaning of a sampling distribution for $\hat{p}_1-\hat{p}_2$
  • describe its center, shape, and spread
  • use conditions that make the normal model appropriate
  • apply formulas for comparing two proportions
  • interpret results in context with AP Statistics reasoning

This topic fits into the larger idea of sampling distributions: when we repeatedly take random samples, statistics like $\hat{p}$ and $\hat{p}_1-\hat{p}_2$ vary from sample to sample. That variation is not a flawβ€”it is the key to estimating how reliable our sample results are.

What is a difference in sample proportions?

Suppose group 1 has sample proportion $\hat{p}_1$ and group 2 has sample proportion $\hat{p}_2$. The statistic of interest is

$$\hat{p}_1-\hat{p}_2$$

This statistic measures how far apart the two sample proportions are. It is commonly used when comparing two populations, such as:

  • the proportion of students at two schools who prefer online classes
  • the proportion of voters in two regions supporting a candidate
  • the proportion of customers in two stores who prefer a brand

A sample proportion is a point estimate of a population proportion. So $\hat{p}_1-\hat{p}_2$ is a point estimate of the true difference in population proportions:

$$p_1-p_2$$

Here, $p_1$ and $p_2$ are the true population proportions. In AP Statistics, we often use the sample difference to estimate and test claims about the population difference.

The sampling distribution of $\hat{p}_1-\hat{p}_2$

A sampling distribution describes how a statistic behaves over many random samples of the same size from the same populations. For the difference in sample proportions, imagine repeatedly taking random samples from two populations and computing $\hat{p}_1-\hat{p}_2$ each time. Those values would form a distribution.

This distribution has three important features:

1. Center

If the samples are random and independent, the mean of the sampling distribution is

$$\mu_{\hat{p}_1-\hat{p}_2}=p_1-p_2$$

That means the statistic is an unbiased estimator of the true difference in proportions.
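The unbiasedness claim can be checked with a quick simulation: repeatedly draw two samples, compute $\hat{p}_1-\hat{p}_2$, and average the results. The population proportions, sample sizes, and trial count below are illustrative choices, not values from the lesson:

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical population proportions and sample sizes
p1, p2 = 0.60, 0.45
n1, n2 = 100, 120
trials = 20_000

diffs = []
for _ in range(trials):
    # Simulate one random sample from each population
    phat1 = sum(random.random() < p1 for _ in range(n1)) / n1
    phat2 = sum(random.random() < p2 for _ in range(n2)) / n2
    diffs.append(phat1 - phat2)

mean_diff = sum(diffs) / trials
print(round(mean_diff, 3))  # lands very close to p1 - p2 = 0.15
```

The average of the simulated differences stays close to the true difference $0.15$, which is exactly what "unbiased" means.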

2. Spread

The standard deviation of the sampling distribution is

$$\sigma_{\hat{p}_1-\hat{p}_2}=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}$$

where $n_1$ and $n_2$ are the sample sizes from the two groups.

This formula shows an important idea: larger sample sizes make the statistic less variable. In other words, bigger samples give more stable estimates. ✅

3. Shape

The sampling distribution is approximately normal when the sample sizes are large enough for both groups. AP Statistics usually checks this with the success-failure condition:

$$n_1p_1\ge 10, \quad n_1(1-p_1)\ge 10, \quad n_2p_2\ge 10, \quad n_2(1-p_2)\ge 10$$

If this condition holds and the samples are random and independent, the normal model works well.
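The large counts check is easy to automate. This sketch uses hypothetical inputs to show one pass and one failure:

```python
def large_counts_ok(n1, p1, n2, p2):
    """True when all four expected success/failure counts are at least 10."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(c >= 10 for c in counts)

print(large_counts_ok(100, 0.62, 120, 0.50))  # True: all counts are 38 or more
print(large_counts_ok(30, 0.10, 40, 0.15))    # False: 30 * 0.10 = 3 < 10
```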

Why sample size matters

Sample size affects the spread of the sampling distribution. Since the standard deviation includes $n_1$ and $n_2$ in the denominator, larger samples reduce variability.

Think about flipping coins. If you flip a coin only 10 times, the sample proportion of heads could be very far from $0.5$. If you flip it 1000 times, the proportion usually stays much closer to $0.5$. The same idea applies to comparing proportions from two groups.
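The coin-flip intuition can be simulated directly; the trial count here is an arbitrary choice for illustration:

```python
import random
import statistics

random.seed(1)  # reproducible illustration

def spread_of_phat(n, trials=5000):
    """Empirical standard deviation of the sample proportion of heads in n flips."""
    phats = [sum(random.random() < 0.5 for _ in range(n)) / n
             for _ in range(trials)]
    return statistics.stdev(phats)

# Theory predicts sqrt(0.25 / n): about 0.158 for n=10, about 0.016 for n=1000
print(round(spread_of_phat(10), 3))
print(round(spread_of_phat(1000), 3))
```

The proportion from 10 flips wanders roughly ten times as far from $0.5$ as the proportion from 1000 flips.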

In practice, larger samples help us make better decisions because the observed difference is less likely to be caused by random sample fluctuation alone.

Conditions for using the normal model

Before using formulas or probability statements about $\hat{p}_1-\hat{p}_2$, AP Statistics expects you to check conditions.

Random

Each sample should be random, such as a simple random sample or a randomized experiment.

Independent

The two samples must be independent of each other. Also, if sampling without replacement from a population, the sample size should be less than $10\%$ of the population size:

$$n_1\le 0.1N_1 \quad \text{and} \quad n_2\le 0.1N_2$$

This is the 10% condition.

Large counts

Use the success-failure condition shown earlier. If the condition is not met, the normal approximation may not be reliable.

Students, this matters because AP questions often ask you to justify whether a normal model is appropriate. You should always connect your answer to the situation, not just write formulas.

How to standardize a difference in sample proportions

When we want to compare an observed difference to what we would expect from a sampling distribution, we use a z-score.

For a known population difference $p_1-p_2$, the standardized value is

$$z=\frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}}$$

This tells us how many standard deviations the observed difference is from the mean of the sampling distribution.
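The standardization formula translates directly into code. The numbers below are hypothetical, with both true proportions assumed equal to $0.55$ so the true difference is zero:

```python
import math

def z_diff(phat1, phat2, p1, p2, n1, n2):
    """z-score of an observed difference against a known p1 - p2."""
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((phat1 - phat2) - (p1 - p2)) / sd

# An observed difference of 0.12 when both true proportions are 0.55
print(round(z_diff(0.62, 0.50, 0.55, 0.55, 100, 120), 2))  # 1.78
```

An observed difference about $1.78$ standard deviations above zero would be somewhat unusual, but not extreme, under these assumptions.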

In hypothesis testing, the null hypothesis often says the population proportions are equal:

$$H_0: p_1-p_2=0$$

Then the formula becomes

$$z=\frac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}}$$

When doing a test, AP Statistics often uses a pooled estimate under the null hypothesis because the null says the proportions are equal. That pooled proportion is

$$\hat{p}=\frac{x_1+x_2}{n_1+n_2}$$

where $x_1$ and $x_2$ are the numbers of successes in each sample. Then the standard error for the hypothesis test is

$$SE=\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$
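Putting the pooled pieces together for a test of $H_0: p_1-p_2=0$; the success counts below are hypothetical:

```python
import math

x1, n1 = 62, 100   # hypothetical successes and sample size, group 1
x2, n2 = 60, 120   # group 2
phat1, phat2 = x1 / n1, x2 / n2

phat = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
se = math.sqrt(phat * (1 - phat) * (1 / n1 + 1 / n2))
z = (phat1 - phat2) / se

print(round(phat, 3))  # 0.555
print(round(z, 2))     # 1.78
```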

Example: school survey about preferred clubs

Suppose School A surveys $n_1=100$ students and finds $\hat{p}_1=0.62$ prefer the art club. School B surveys $n_2=120$ students and finds $\hat{p}_2=0.50$ prefer the art club.

The observed difference is

$$\hat{p}_1-\hat{p}_2=0.62-0.50=0.12$$

This means the sample from School A has a proportion of students who prefer the art club that is $12$ percentage points higher.

If we were creating a confidence interval for $p_1-p_2$, we would use the standard error based on the sample proportions:

$$SE=\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

This is because confidence intervals estimate the actual difference without assuming the null hypothesis is true.

If we were testing $H_0:p_1-p_2=0$, we would use the pooled estimate instead. That difference between confidence intervals and significance tests is a very common AP Statistics idea.
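For the school survey numbers above, a 95% confidence interval can be sketched like this; the $z^*=1.96$ critical value is the standard choice for 95% confidence:

```python
import math

phat1, n1 = 0.62, 100  # School A, from the example
phat2, n2 = 0.50, 120  # School B
se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)

z_star = 1.96  # critical value for 95% confidence
diff = phat1 - phat2
lower, upper = diff - z_star * se, diff + z_star * se
print(round(lower, 3), round(upper, 3))  # -0.011 0.251
```

Note that this particular interval includes $0$, so with these numbers the data would not rule out equal population proportions.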

Interpreting results in context

A math answer is not complete unless it is written in context. If a confidence interval for $p_1-p_2$ is $(0.04,0.20)$, you could say:

We are $95\%$ confident that the true proportion of students at School A who prefer the art club is between $4$ and $20$ percentage points higher than the proportion at School B.

Notice how the interpretation includes:

  • the parameter $p_1-p_2$
  • the direction of the difference
  • the context of the groups
  • the confidence level

If a hypothesis test gives a very small $p$-value, you might say the sample difference would be unlikely if there were no true difference between the populations. But you should still explain what that means in the real-world context.

How this fits into the bigger AP Statistics picture

Sampling distributions for differences in sample proportions connect several major AP Statistics ideas:

  • sampling distributions explain why statistics vary from sample to sample
  • point estimates use statistics like $\hat{p}_1-\hat{p}_2$ to estimate parameters like $p_1-p_2$
  • the Central Limit Theorem helps justify normal models for large samples
  • inference uses sample data to draw conclusions about populations

This lesson is part of the broader goal of understanding how sample size, randomness, and variability affect conclusions. The central message is simple: a sample difference by itself is not enough. You must ask whether it is bigger than what random sampling would usually produce.

Conclusion

Sampling distributions for differences in sample proportions help us compare two groups in a structured, statistical way. The statistic $\hat{p}_1-\hat{p}_2$ estimates $p_1-p_2$, and its sampling distribution tells us what differences are expected just from random sampling. When conditions are met, the distribution is approximately normal, centered at $p_1-p_2$, with spread given by a standard deviation formula that depends on both sample sizes. Students, mastering this lesson will help you understand confidence intervals, hypothesis tests, and many AP Statistics comparison questions. 📊

Study Notes

  • $\hat{p}_1-\hat{p}_2$ is the statistic used to compare two sample proportions.
  • It estimates the true difference in population proportions $p_1-p_2$.
  • The sampling distribution is centered at $p_1-p_2$.
  • The standard deviation is $\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}$.
  • Larger sample sizes reduce variability in $\hat{p}_1-\hat{p}_2$.
  • Use the 10% condition for independence when sampling without replacement.
  • Use the success-failure condition to check whether a normal model is reasonable.
  • For tests about $p_1-p_2$, AP Statistics often uses a pooled proportion under $H_0$.
  • For confidence intervals, use the sample proportions in the standard error.
  • Always interpret results in context and connect them to the populations being studied.

Practice Quiz

5 questions to test your understanding