Sampling Distributions for Sample Proportions
Welcome, students! 🎯 In this lesson, you will learn how sample proportions behave when we repeatedly take random samples from a population. This idea is a core part of AP Statistics because it helps you understand why sample results are not always exactly the same as the true population value, and how to judge whether a sample proportion is trustworthy.
Objectives:
- Explain the meaning of a sampling distribution for a sample proportion.
- Identify the center, spread, and shape of the sampling distribution of $\hat{p}$.
- Use the conditions needed to apply the normal model.
- Connect sample size, variability, and the Central Limit Theorem to real situations.
- Interpret results from sample proportions in AP Statistics problems.
Imagine a school wants to know what fraction of students support a new lunch menu 🍎. If you ask one random sample of students, the proportion who support it will probably not match the proportion for the whole school exactly. If you took many random samples of the same size, the sample proportions would vary from sample to sample. The pattern formed by those repeated sample proportions is the sampling distribution.
What a Sample Proportion Really Means
A sample proportion is written as $\hat{p}$ and is calculated by
$$\hat{p}=\frac{x}{n}$$
where $x$ is the number of “successes” in the sample and $n$ is the sample size.
In AP Statistics, a “success” does not mean something good or bad. It simply means the outcome of interest. For example:
- a student supports the new lunch menu,
- a voter prefers Candidate A,
- a customer buys a product,
- a light bulb is defective.
If 42 out of 100 students support the lunch menu, then
$$\hat{p}=\frac{42}{100}=0.42$$
This is a point estimate for the true population proportion $p$. A point estimate is a single value used to estimate an unknown population parameter. Here, $\hat{p}$ estimates $p$.
It is important to remember that $\hat{p}$ changes from sample to sample. That change is not a mistake; it is a natural result of random sampling.
What Is a Sampling Distribution?
A sampling distribution is the distribution of the values a statistic takes across all possible random samples of the same size from the same population. Taking many repeated random samples gives a good picture of it.
For sample proportions, the sampling distribution describes how $\hat{p}$ behaves across all possible random samples of size $n$.
Think of it like this: if a teacher repeatedly took random samples of 50 students from the same school and recorded the proportion who preferred online homework, the values of $\hat{p}$ would bounce around a little. Some samples might give $0.48$, others $0.52$, others $0.40$. The collection of those values makes the sampling distribution 📊.
This distribution matters because it tells us how much sampling variation to expect. It helps us decide whether a sample result is ordinary or unusually far from what we expect.
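The repeated-sampling idea above can be sketched with a short simulation. This is only an illustration under stated assumptions: the true proportion `p_true = 0.50`, the 1,000 repetitions, the seed, and the helper name `sample_proportion` are all hypothetical choices, and the school is treated as large enough that each sampled student is an independent draw.

```python
import random

def sample_proportion(p, n, rng):
    """Draw one random sample of size n and return its sample proportion."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(1)   # fixed seed so the run is repeatable
p_true = 0.50            # assumed population proportion (hypothetical)
n = 50                   # sample size from the teacher example

# Take 1,000 repeated samples and record each sample proportion.
p_hats = [sample_proportion(p_true, n, rng) for _ in range(1000)]

print(p_hats[:5])                 # the first few values bounce around p_true
print(min(p_hats), max(p_hats))   # the smallest and largest values observed
```

The list `p_hats` is a simulated stand-in for the sampling distribution: a histogram of its values would show most sample proportions clustered near `p_true`, with a few farther away.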
Center, Spread, and Shape of $\hat{p}$
The sampling distribution of $\hat{p}$ has three main features: center, spread, and shape.
Center
The mean of the sampling distribution of $\hat{p}$ is the population proportion:
$$\mu_{\hat{p}}=p$$
This means $\hat{p}$ is an unbiased estimator of $p$. On average, over many random samples, the sample proportion centers at the true proportion.
Spread
The standard deviation of the sampling distribution of $\hat{p}$ is
$$\sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}$$
This formula shows two important ideas:
- larger samples make the spread smaller,
- proportions near $0.50$ have more variability than proportions near $0$ or $1$.
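The formula assumes the observations in the sample are independent. When sampling without replacement, this is reasonable as long as the sample is no more than $10\%$ of the population.
For example, if $p=0.60$ and $n=100$,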
$$\sigma_{\hat{p}}=\sqrt{\frac{0.60(0.40)}{100}}=\sqrt{0.0024}\approx 0.049$$
So sample proportions typically fall about $0.05$ away from the true proportion $p=0.60$. A sample proportion like $0.55$ or $0.65$, each about one standard deviation from $p$, would not be surprising.
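The center and spread formulas can be checked numerically. The sketch below computes $\sqrt{\frac{p(1-p)}{n}}$ for $p=0.60$ and $n=100$, then compares both formulas with a simulation. The function name `sd_of_p_hat`, the seed, and the 5,000 repetitions are assumptions made for illustration, and the simulation treats each observation as independent.

```python
import math
import random
import statistics

def sd_of_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.60, 100
print(round(sd_of_p_hat(p, n), 3))   # 0.049

# Simulate 5,000 samples of size n and record each sample proportion.
rng = random.Random(7)               # fixed seed so the run is repeatable
p_hats = []
for _ in range(5000):
    successes = sum(1 for _ in range(n) if rng.random() < p)
    p_hats.append(successes / n)

sim_mean = statistics.mean(p_hats)   # should land close to p
sim_sd = statistics.pstdev(p_hats)   # should land close to sd_of_p_hat(p, n)
print(round(sim_mean, 3), round(sim_sd, 3))
```

The simulated mean and standard deviation will not match the formulas exactly, but they should land close to $0.60$ and $0.049$.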
Shape
The shape depends on sample size and the success-failure condition. If the sample size is large enough, the sampling distribution of $\hat{p}$ is approximately normal. This is a version of the Central Limit Theorem for proportions.
The condition is:
$$np\ge 10 \quad \text{and} \quad n(1-p)\ge 10$$
If both are true, the sampling distribution of $\hat{p}$ is approximately normal. If not, the distribution may be skewed, especially when $p$ is close to $0$ or $1$.
For example, if a rare disease affects only $2\%$ of people, then $p=0.02$. A sample of size $50$ gives
$$np=50(0.02)=1$$
which is far below $10$, so the normal approximation is not appropriate (both parts of the condition must hold). But a sample of size $1000$ gives
$$np=1000(0.02)=20$$
and
$$n(1-p)=1000(0.98)=980$$
so the normal model is much more reasonable.
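The success-failure check is easy to automate. Here is a minimal sketch; the function name `normal_model_ok` and its `threshold` parameter are hypothetical, chosen only for this example.

```python
def normal_model_ok(p, n, threshold=10):
    """Return True when both np and n(1-p) are at least the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(normal_model_ok(0.02, 50))     # False: np = 1 is below 10
print(normal_model_ok(0.02, 1000))   # True: np = 20 and n(1-p) = 980
```

Both counts have to pass. In the $n=50$ case, $n(1-p)=49$ is large enough, but $np=1$ is not, so the check fails.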
Why Sample Size Matters
Sample size has a big impact on how much sample proportions vary. As $n$ increases, the standard deviation
$$\sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}$$
gets smaller because $n$ is in the denominator. This means larger samples produce more stable estimates.
Real-world example: suppose a streaming service wants to estimate the proportion of users who like a new show. A sample of $20$ users may give a very noisy result. A sample of $500$ users is likely to be much more reliable because random chance has less effect.
This is why large surveys are often more precise than small ones, assuming the samples are random and well designed. Bigger sample size does not automatically remove bias, though. If the sampling method is flawed, a large sample can still give a misleading result.
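To see the effect of sample size in numbers, the sketch below compares the two sample sizes from the streaming example. The lesson does not give a true proportion, so `p = 0.50` is an assumed value used only for illustration.

```python
import math

p = 0.50   # assumed true proportion (hypothetical; not given in the example)

# Standard deviation of p-hat for each sample size.
sds = {n: math.sqrt(p * (1 - p) / n) for n in (20, 500)}

for n, sd in sds.items():
    print(n, round(sd, 3))
# 20 0.112
# 500 0.022
```

The sample that is $25$ times larger has a standard deviation that is $5$ times smaller, because $n$ sits under a square root. Cutting the spread in half takes four times as many observations.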
Using the Normal Model for Probabilities
When the conditions are met, you can treat $\hat{p}$ as approximately normal with mean $p$ and standard deviation $\sqrt{\frac{p(1-p)}{n}}$.
Then you can calculate z-scores using
$$z=\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}$$
This lets you find probabilities or judge how unusual a sample proportion is.
Example
Suppose the true proportion of students who walk to school is $p=0.30$, and a random sample of $n=200$ students is taken. Find the probability that the sample proportion is at least $0.36$.
First, check conditions:
$$np=200(0.30)=60$$
$$n(1-p)=200(0.70)=140$$
Both are at least $10$, so the normal model works.
Now find the standard deviation:
$$\sigma_{\hat{p}}=\sqrt{\frac{0.30(0.70)}{200}}\approx 0.0324$$
Compute the z-score for $\hat{p}=0.36$:
$$z=\frac{0.36-0.30}{0.0324}\approx 1.85$$
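Using the standard normal distribution, $P(Z\ge 1.85)\approx 0.032$. So if the true proportion is $0.30$, only about $3\%$ of random samples of $200$ students would give a sample proportion of $0.36$ or more. A result that large is fairly unusual.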
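The steps of this example can be reproduced with Python's standard library. This is a sketch of the normal-model calculation only; `NormalDist` comes from the built-in `statistics` module (Python 3.8 or later), and the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

p, n = 0.30, 200
p_hat = 0.36

sd = math.sqrt(p * (1 - p) / n)    # standard deviation of p-hat
z = (p_hat - p) / sd               # z-score for the observed proportion
prob = 1 - NormalDist().cdf(z)     # upper-tail area: P(p-hat >= 0.36)

print(round(sd, 4), round(z, 2), round(prob, 3))   # 0.0324 1.85 0.032
```

On the AP exam you would get the same tail area from a z-table or calculator. The code keeps the unrounded z-score, so its answer can differ slightly in the later decimal places from a table lookup at $z=1.85$.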
This kind of reasoning appears often on AP Statistics exams. You are not just calculating; you are interpreting whether a result is surprising given the sampling distribution.
Sampling Distributions and AP Statistics Thinking
Sampling distributions connect directly to the bigger AP Statistics idea that sample data are used to make inferences about populations. The sample proportion $\hat{p}$ is a statistic, and statistics vary from sample to sample.
This variation is why AP Statistics emphasizes random sampling. Random sampling makes it possible to describe what would happen across many possible samples. If the sample is random, the sampling distribution helps us understand the likely behavior of the statistic.
Here are some common AP Statistics ideas connected to $\hat{p}$:
- Population proportion: $p$
- Sample proportion: $\hat{p}$
- Point estimate: $\hat{p}$ estimates $p$
- Bias: whether the estimator centers at the true value
- Variability: how much $\hat{p}$ changes from sample to sample
- Normal approximation: used when $np\ge 10$ and $n(1-p)\ge 10$
A strong understanding of sampling distributions helps with confidence intervals and hypothesis tests later in the course because both rely on the idea that sample statistics have predictable long-run behavior.
Conclusion
Sampling distributions for sample proportions explain how $\hat{p}$ behaves over many random samples. The key facts are simple but powerful: the center is $p$, the spread is $\sqrt{\frac{p(1-p)}{n}}$, and the distribution is approximately normal when $np\ge 10$ and $n(1-p)\ge 10$. Larger samples create less variability, which makes sample proportions more reliable. This topic is a foundation for the rest of inference in AP Statistics because it shows how to measure uncertainty in sample results. ✅
Study Notes
- A sample proportion is $\hat{p}=\frac{x}{n}$, where $x$ is the number of successes and $n$ is the sample size.
- The sampling distribution of $\hat{p}$ is the distribution of $\hat{p}$ from many random samples of the same size.
- The mean of the sampling distribution is $\mu_{\hat{p}}=p$.
- The standard deviation of the sampling distribution is $\sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}$.
- The normal approximation is valid when $np\ge 10$ and $n(1-p)\ge 10$.
- Larger sample sizes make the sampling distribution less spread out.
- $\hat{p}$ is a point estimate of the population proportion $p$.
- Sampling distributions are essential for understanding confidence intervals and hypothesis tests.
- Random sampling is necessary for the sampling distribution to be meaningful.
- In AP Statistics, always interpret results in context using the population, sample, and the question being asked.
