Confidence Intervals 📊
Welcome, students. In statistics, we often want to learn about a whole population, but measuring everyone is usually impossible. Instead, we collect a sample and use it to estimate an unknown population value. A confidence interval is one of the most important tools for doing this. It gives a range of reasonable values for a population parameter, based on sample data.
In this lesson, you will learn how confidence intervals work, what the main terms mean, how they connect to probability and inference, and how they help people make real-world decisions. By the end, you should be able to explain why a confidence interval is more informative than a single estimate and interpret it correctly in context.
What a confidence interval means
A confidence interval is an interval estimate for a population parameter such as a mean $\mu$ or a proportion $p$. Instead of saying “the population mean is exactly this number,” we say “the population mean is likely to be within this range.” This is useful because sample data naturally vary from sample to sample.
For example, suppose a school wants to estimate the average number of hours students sleep on a school night. If a sample of students gives a sample mean of $7.2$ hours, that number is only an estimate. Another sample might give $7.0$ hours or $7.4$ hours. A confidence interval gives a range around the estimate that reflects this uncertainty.
The most common form is:
$$\text{estimate} \pm \text{margin of error}$$
For a population mean, a confidence interval often looks like
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
when the population standard deviation $\sigma$ is known, or an approximate version using $s$ when $\sigma$ is unknown and the sample is large. Here, $\bar{x}$ is the sample mean, $n$ is the sample size, and $z^*$ is the critical value from the standard normal distribution.
For a population proportion, the interval often looks like
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $\hat{p}$ is the sample proportion.
The key idea is simple: more data usually gives a narrower interval, and a higher confidence level gives a wider interval. 📈
Important terminology and ideas
To use confidence intervals well, students, you need to know the language around them.
A population parameter is a number that describes a whole population, such as $\mu$ or $p$. A sample statistic is a number from the sample, such as $\bar{x}$ or $\hat{p}$. We use the statistic to estimate the parameter.
The confidence level tells us how much confidence we want in the method. Common levels are $90\%$, $95\%$, and $99\%$. A $95\%$ confidence interval does not mean there is a $95\%$ chance that the true parameter is inside this one interval. Instead, it means that if we repeated the sampling process many times and built intervals the same way, about $95\%$ of those intervals would contain the true parameter.
This is a very important distinction. The parameter is fixed; the interval changes from sample to sample.
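The long-run meaning of a $95\%$ confidence level can be checked with a small simulation. The sketch below is illustrative only: the population values ($\mu=50$, $\sigma=10$), the sample size, the number of repetitions, and the function name `coverage_rate` are all assumptions chosen for the demonstration, and $\sigma$ is treated as known.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=40, level=0.95, trials=2000, seed=1):
    """Return the fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / n ** 0.5  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample from the hypothetical normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        # The parameter mu never changes; only the interval moves.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to $0.95$. Each individual interval either contains $\mu$ or misses it; the $95\%$ describes how often the method succeeds across many samples.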
The margin of error is the amount added and subtracted from the estimate. It depends on the confidence level, the variability in the data, and the sample size. Larger confidence levels give larger margins of error.
The critical value $z^*$ is the value taken from a probability distribution that matches the chosen confidence level. For a $95\%$ confidence interval, the middle $95\%$ of the standard normal curve is used, leaving $2.5\%$ in each tail.
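As a quick check of where $z^*$ comes from, the sketch below finds the critical value by leaving half of the remaining probability in each tail of the standard normal curve. It uses `NormalDist` from Python's standard `statistics` module; the helper name `critical_value` is just an illustrative choice.

```python
from statistics import NormalDist

def critical_value(level):
    """Return z* for a two-sided interval at the given confidence level."""
    tail = (1 - level) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

For $95\%$ confidence this gives $z^*\approx 1.960$, which matches the value $1.96$ commonly read from tables or a calculator.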
The standard error measures how much a statistic typically varies from sample to sample. For the sample mean, it is often
$$\frac{\sigma}{\sqrt{n}}$$
or estimated by
$$\frac{s}{\sqrt{n}}$$
when the population standard deviation is unknown.
Confidence intervals for a population mean
A confidence interval for a population mean is used when the variable is quantitative, such as height, time, mass, or test scores. Suppose, students, that a researcher wants to estimate the average commute time for students in a city.
Imagine a random sample of $64$ students gives a sample mean of $18.5$ minutes and a sample standard deviation of $6.4$ minutes. If we use a $95\%$ confidence interval and a large-sample approximation, we first find the standard error:
$$\frac{s}{\sqrt{n}}=\frac{6.4}{\sqrt{64}}=0.8$$
Using $z^*\approx 1.96$ for $95\%$ confidence, the margin of error is
$$1.96(0.8)=1.568$$
So the confidence interval is
$$18.5\pm 1.568$$
which gives approximately
$$\left(16.932,\,20.068\right)$$
In context, we would say: we are $95\%$ confident that the true mean commute time for all students in the city is between about $16.9$ and $20.1$ minutes.
This is useful because it gives both an estimate and a sense of uncertainty. If the interval is too wide, the estimate may not be very precise. If it is narrower, the estimate is more precise.
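The commute-time calculation can be reproduced in a few lines. This is a minimal sketch of the large-sample approximation used above, with $z^*=1.96$ as the default; the function name `mean_interval` is an illustrative assumption, not a standard library routine.

```python
from math import sqrt

def mean_interval(x_bar, s, n, z_star=1.96):
    """Large-sample confidence interval for a mean: x_bar +/- z* * s / sqrt(n)."""
    standard_error = s / sqrt(n)
    margin = z_star * standard_error
    return x_bar - margin, x_bar + margin

# Values from the commute-time example: n = 64, mean 18.5, s = 6.4.
low, high = mean_interval(18.5, 6.4, 64)
print(f"({low:.3f}, {high:.3f})")
```

The endpoints agree with the hand calculation, about $16.9$ and $20.1$ minutes.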
Confidence intervals for a population proportion
A confidence interval for a proportion is used when the data are categorical, such as yes/no, pass/fail, or support/opposition. For example, a polling company may want to estimate the proportion of voters who support a policy.
Suppose a random sample of $400$ voters shows that $228$ support the policy. Then
$$\hat{p}=\frac{228}{400}=0.57$$
For a $95\%$ confidence interval, the standard error is
$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.57(0.43)}{400}}$$
This is approximately $0.0248$. The margin of error is
$$1.96(0.0248)\approx 0.049$$
So the confidence interval is
$$0.57\pm 0.049$$
which gives approximately
$$\left(0.521,\,0.619\right)$$
This means we are $95\%$ confident that the true proportion of all voters who support the policy is between about $52.1\%$ and $61.9\%$.
This type of interval is especially useful in surveys and elections. 📋
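The same steps for the voter survey can be sketched in code. The function name `proportion_interval` is an illustrative assumption, and $z^*=1.96$ is used as in the worked example.

```python
from math import sqrt

def proportion_interval(successes, n, z_star=1.96):
    """Confidence interval for a proportion: p_hat +/- z* * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Values from the survey example: 228 supporters out of 400 voters.
low, high = proportion_interval(228, 400)
print(f"({low:.3f}, {high:.3f})")
```

Rounded to three decimal places, the endpoints match the interval found by hand, roughly $52.1\%$ to $61.9\%$.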
How confidence intervals connect to probability and decision-making
Confidence intervals are part of inferential statistics, which means using sample data to make conclusions about a population. They connect directly to probability because the method for building the interval relies on a probability distribution.
In IB Mathematics: Applications and Interpretation HL, you are expected to interpret statistical results in context and make decisions based on evidence. A confidence interval can help answer questions such as:
- Is a new product’s average rating above a target value?
- Does a school’s average exam score seem to differ from a national average?
- Is a proportion large enough to justify a business change?
Suppose a company claims its new battery lasts $10$ hours on average. If a $95\%$ confidence interval for the mean battery life is $(9.4, 10.2)$, the claim seems plausible because $10$ lies inside the interval. But if the interval is $(8.7, 9.5)$, the claim is not supported by the data, since every plausible value for the mean is below $10$.
Confidence intervals are often paired with hypothesis tests. A two-sided test at significance level $\alpha=0.05$ is closely related to a $95\%$ confidence interval. If the null value is outside the interval, that provides evidence against the null hypothesis. This connection helps students see that intervals are not just calculation tools; they are decision tools too. ✅
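The battery example shows the decision rule in its simplest form: check whether the claimed value falls inside the interval. The helper below is a hypothetical illustration of that check, not a full hypothesis test.

```python
def claim_is_plausible(claimed_value, interval):
    """Return True when the claimed parameter value lies inside the interval."""
    low, high = interval
    return low <= claimed_value <= high

# Battery-life intervals from the example, with a claimed mean of 10 hours.
print(claim_is_plausible(10, (9.4, 10.2)))  # True: 10 is inside the interval
print(claim_is_plausible(10, (8.7, 9.5)))   # False: 10 is above the interval
```

A value outside a $95\%$ interval corresponds to evidence against that value in a two-sided test at $\alpha=0.05$, which is why the two tools usually lead to the same decision.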
How to interpret a confidence interval correctly
Correct interpretation is essential, students. A good interpretation must mention the context, the confidence level, and the population parameter.
A correct interpretation might be:
“We are $95\%$ confident that the true mean number of hours of sleep for all students at the school is between $7.0$ and $7.4$ hours.”
A common mistake is saying:
“There is a $95\%$ chance that the true mean is in the interval.”
This is not the correct frequentist interpretation. The interval either contains the true parameter or it does not; probability describes the long-run behavior of the method, not the chance that one particular interval contains the fixed parameter.
Another mistake is forgetting the population. A confidence interval from a sample describes the population, not just the sample itself.
When interpreting a confidence interval, always include:
- the parameter being estimated,
- the confidence level,
- the interval values,
- the real-world context.
Why sample size and confidence level matter
Apart from the variability in the data itself, two main factors affect the width of a confidence interval: sample size and confidence level.
If the sample size $n$ increases, the standard error usually decreases because of the $\sqrt{n}$ in the denominator. This makes the interval narrower. Larger samples give more precise estimates.
If the confidence level increases from $95\%$ to $99\%$, the critical value $z^*$ increases. This makes the interval wider because we want more confidence that the interval includes the true parameter.
This creates a trade-off:
- higher confidence level means wider interval,
- lower confidence level means narrower interval.
For the same data, a wider interval is safer but less precise, and a narrower interval is more precise but comes with a lower confidence level. Real-world statisticians must choose a balance depending on the decision being made.
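Both effects can be seen by computing the full width of a mean interval, $2z^*\frac{s}{\sqrt{n}}$, for a few combinations. The sketch below reuses $s=6.4$ from the commute-time example; the sample sizes and the function name `interval_width` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(s, n, level):
    """Full width of a large-sample interval for a mean: 2 * z* * s / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z_star * s / sqrt(n)

for n in (64, 256):
    for level in (0.95, 0.99):
        width = interval_width(6.4, n, level)
        print(f"n = {n:3d}, level = {level:.0%}: width = {width:.2f}")
```

Because of the $\sqrt{n}$ in the denominator, multiplying the sample size by $4$ only halves the width, so large gains in precision require much larger samples.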
Conclusion
Confidence intervals are one of the most useful ideas in statistics because they turn sample data into meaningful conclusions about a population. They help describe uncertainty in a clear and practical way. In IB Mathematics: Applications and Interpretation HL, you should understand what a confidence interval means, how to calculate one for a mean or proportion, and how to interpret it in context.
When you see a confidence interval, remember that it is not a guaranteed answer. It is a carefully built estimate based on probability and sampling variation. Used correctly, it helps people make informed decisions in science, health, business, education, and public policy. 🎯
Study Notes
- A confidence interval gives a plausible range for a population parameter such as $\mu$ or $p$.
- The general idea is $\text{estimate} \pm \text{margin of error}$.
- For a mean, a common form is $\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}$ or an approximation using $s$.
- For a proportion, a common form is $\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
- The confidence level, such as $95\%$, describes the long-run success rate of the method.
- A larger sample size usually makes the interval narrower.
- A higher confidence level makes the interval wider.
- The parameter is fixed; the interval varies from sample to sample.
- Correct interpretation must mention the population, the parameter, the confidence level, and the context.
- Confidence intervals are an important part of inferential reasoning and real-world decision-making.
