6. Inference for Categorical Data: Proportions

Setting Up a Test for a Population Proportion

Imagine a school claims that more than half of its students prefer online homework over paper homework 📚💻. How could you check whether that claim is believable using data from a sample? That is exactly what a test for a population proportion helps you do. In AP Statistics, this lesson is about setting up the test correctly: naming the parameter, writing hypotheses, checking conditions, and identifying the test statistic and the idea behind the $p$-value. If the setup is wrong, the conclusion will not be valid.

What a population proportion test is trying to answer

A population proportion is the true proportion of all individuals in a population with a certain characteristic. We write it as $p$. For example, $p$ could represent the proportion of all students at a school who prefer online homework.

A significance test for a population proportion uses sample data to examine a claim about $p$. The goal is not to prove a statement with certainty. Instead, the goal is to see whether the sample provides enough evidence against a specific claim called the null hypothesis. That claim often says that the proportion equals a certain value, like $p=0.50$.

Here is the big idea: if the sample result would be very unlikely when the null hypothesis is true, then we have evidence that the null hypothesis may not be correct. This is a core idea in AP Statistics inference for categorical data 🧠.

Before doing calculations, you must clearly identify the question being asked. Is the claim about:

  • one population proportion $p$?
  • a difference between two proportions $p_1-p_2$?
  • a confidence interval or a significance test?

This lesson focuses on the first case, and specifically on a significance test for one proportion.

The parameter, hypotheses, and direction of the test

The first step in setting up any hypothesis test is identifying the parameter. For a population proportion test, the parameter is:

$$p = \text{the true proportion of the population with the trait of interest}$$

Next, write the hypotheses.

The null hypothesis is usually a statement of equality:

$$H_0: p = p_0$$

Here, $p_0$ is the claimed value from the problem. It might come from a previous study, a policy goal, or a historical value.

The alternative hypothesis reflects what you are looking for evidence of:

$$H_a: p > p_0$$

$$H_a: p < p_0$$

$$H_a: p \ne p_0$$

The choice depends on the wording of the claim.

Example

Suppose a cafeteria manager claims that $60\%$ of students choose the vegetarian lunch at least once a week. You sample students and want to test that claim.

  • Parameter: $p = $ the true proportion of all students who choose the vegetarian lunch at least once a week.
  • Null hypothesis: $H_0: p = 0.60$
  • Alternative hypothesis depends on the question:
      • If you want to know whether the true proportion is different, $H_a: p \ne 0.60$
      • If you want to know whether it is less, $H_a: p < 0.60$
      • If you want to know whether it is greater, $H_a: p > 0.60$

The wording matters. A test is not just about crunching numbers; it is about matching the statistical method to the real question.

How to choose the correct hypotheses from context

A common AP Statistics skill is translating context into mathematical notation. This is where many students lose easy points, so slow down and connect the words to the symbols.

Look for clue words:

  • “different from” or “changed” usually means $H_a: p \ne p_0$
  • “greater than,” “more than,” or “increased” usually means $H_a: p > p_0$
  • “less than” or “decreased” usually means $H_a: p < p_0$

The null hypothesis always includes equality because the test is built around assuming no change, no difference, or no effect at first.

Real-world example

A streaming service says that $75\%$ of users watch at least one show per week. A researcher suspects the true proportion is lower.

The hypotheses are:

$$H_0: p = 0.75$$

$$H_a: p < 0.75$$

This is a left-tailed test because the alternative uses “less than.”

If the researcher instead suspected the proportion is not exactly $0.75$, then the hypotheses would be:

$$H_0: p = 0.75$$

$$H_a: p \ne 0.75$$

That would be a two-sided test.
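
The translation from wording to hypotheses can be sketched in a few lines of Python. This is only an illustration: the function name `write_hypotheses` and the direction labels `"less"`, `"greater"`, and `"different"` are made up for this example and are not standard AP Statistics notation.

```python
def write_hypotheses(p0, direction):
    """Return (H0, Ha, tail) as strings for a one-proportion test.

    `direction` is a hypothetical label chosen for this sketch:
    "less", "greater", or "different".
    """
    symbols = {
        "less": ("<", "left-tailed"),
        "greater": (">", "right-tailed"),
        "different": ("!=", "two-sided"),
    }
    if direction not in symbols:
        raise ValueError("direction must be 'less', 'greater', or 'different'")
    symbol, tail = symbols[direction]
    h0 = f"H0: p = {p0}"          # the null hypothesis always uses equality
    ha = f"Ha: p {symbol} {p0}"   # the alternative matches the wording
    return h0, ha, tail


# Streaming-service example: the researcher suspects the proportion is lower.
print(write_hypotheses(0.75, "less"))
```

Calling the function with `"different"` instead gives the two-sided version of the same test.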

Checking conditions before using the test

Setting up the test also means checking whether the sample design and data meet the conditions needed for inference. AP Statistics often expects you to explain these clearly.

The usual conditions for a one-proportion $z$ test are:

1. Random condition

The sample should be random or come from a randomized experiment. Randomness helps make the sample representative of the population.

2. Independence condition

Individual observations should be independent. If sampling without replacement, a common rule is that the sample size should be no more than $10\%$ of the population:

$$n \le 0.10N$$

where $n$ is the sample size and $N$ is the population size.

3. Large counts condition

For a hypothesis test about one proportion, use the null value $p_0$ to check the expected number of successes and failures:

$$np_0 \ge 10$$

$$n(1-p_0) \ge 10$$

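The large counts condition matters because the test uses a normal model for the sample proportion, and the normal approximation is reasonable only when the expected counts are large enough. The random and independence conditions matter because they make the sample representative and justify the standard deviation formula used in the test.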

Example of checking conditions

Suppose a sample of $100$ students is taken to test whether the proportion who bring lunch from home is $0.40$.

Check large counts using $p_0=0.40$:

$$100(0.40)=40$$

$$100(1-0.40)=60$$

Both are at least $10$, so the large counts condition is satisfied.

If the sample was random and $100$ is no more than $10\%$ of the student population (that is, the school has at least $1000$ students), then all three conditions are met.
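
The same checks can be sketched in Python. The function name `check_conditions` and the population size of $1500$ are assumptions made for this illustration; the lesson's example does not state how many students attend the school. Whether the sample was random cannot be computed from the numbers, so it is passed in as a yes/no input.

```python
def check_conditions(n, p0, population_size=None, is_random=True):
    """Check the conditions for a one-proportion z test.

    Large counts uses the null value p0, as the test requires.
    The 10% condition is only checked when a population size is supplied.
    """
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    large_counts = expected_successes >= 10 and expected_failures >= 10

    if population_size is None:
        ten_percent = None  # cannot be checked without N
    else:
        ten_percent = n <= 0.10 * population_size

    return {
        "random": is_random,
        "ten_percent": ten_percent,
        "large_counts": large_counts,
    }


# Lunch-from-home example: n = 100, p0 = 0.40, hypothetical N = 1500.
result = check_conditions(100, 0.40, population_size=1500)
print(result)
```

A value of `None` for the 10% condition is a reminder that you still need to argue independence in words when the population size is not given.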

The test statistic and why the null value matters

After the setup, the next step in the full test is the test statistic, usually a $z$-score. For a one-proportion significance test, it is:

$$z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$

Here:

  • $\hat{p}$ is the sample proportion
  • $p_0$ is the hypothesized population proportion
  • $n$ is the sample size

Notice that the standard deviation formula uses $p_0$, not $\hat{p}$. That is because the null hypothesis assumes the population proportion equals $p_0$, and the test asks how unusual the sample result would be if that were true.

Example

Suppose $n=200$ students are surveyed, and $118$ say they prefer a certain school event. Then:

$$\hat{p}=\frac{118}{200}=0.59$$

If the school claims $p_0=0.50$, then the test statistic would be based on the difference between $0.59$ and $0.50$. The farther $\hat{p}$ is from $p_0$, the stronger the evidence may be against the null hypothesis.

But remember, the actual conclusion depends on the $p$-value, not just on whether the sample proportion looks different from $p_0$.
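
To see the formula in action, here is a minimal Python sketch that computes the test statistic for the example with $n=200$, $118$ successes, and $p_0=0.50$. The function name `one_proportion_z` is an illustrative choice.

```python
import math


def one_proportion_z(successes, n, p0):
    """Return the z test statistic for a one-proportion significance test."""
    p_hat = successes / n
    # The standard deviation uses p0 because the test assumes H0 is true.
    standard_deviation = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_deviation


z = one_proportion_z(118, 200, 0.50)
print(round(z, 2))  # about 2.55
```

A value near $2.55$ says the sample proportion is about two and a half standard deviations above the hypothesized value.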

How the $p$-value connects to the setup

The $p$-value is the probability, assuming $H_0$ is true, of getting a sample result at least as extreme as the one observed.

Two parts of this definition are important:

  • “assuming $H_0$ is true” means the calculation starts from the null hypothesis
  • “at least as extreme” depends on whether the test is left-tailed, right-tailed, or two-sided

For example, if the alternative hypothesis is $H_a: p < 0.75$, then “at least as extreme” means sample proportions at or below the observed $\hat{p}$, that is, results as far below $0.75$ as the one observed or even farther.

A small $p$-value means a result like the one observed would be unlikely if $H_0$ were true. That gives evidence against $H_0$ and in favor of $H_a$.

A large $p$-value means the sample result is not surprising under $H_0$, so there is not enough evidence to reject it.
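
The link between the tail of the test and the $p$-value can be sketched with Python's standard library. The function name `p_value`, the tail labels, and the input $z=2.0$ are hypothetical choices for this illustration, not values from a problem in this lesson.

```python
from statistics import NormalDist


def p_value(z, tail):
    """Return the p-value for a z statistic under the standard normal model.

    `tail` is "left", "right", or "two-sided" (labels chosen for this sketch).
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return standard_normal.cdf(z)
    if tail == "right":
        return 1 - standard_normal.cdf(z)
    if tail == "two-sided":
        # Count results at least this far from 0 in either direction.
        return 2 * (1 - standard_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two-sided'")


for tail in ("left", "right", "two-sided"):
    print(tail, round(p_value(2.0, tail), 4))
```

The same $z$ gives three different $p$-values, which is why the direction of $H_a$ has to be settled during setup and not after looking at the data.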

Why setting up the test carefully matters in AP Statistics

In AP Statistics, students do more than find answers. They must show statistical reasoning. A well-written setup usually includes:

  • the parameter in context
  • the null and alternative hypotheses
  • the direction of the test
  • conditions for inference
  • a brief link to the appropriate procedure

This is important because the same data can be interpreted incorrectly if the setup is unclear. For example, a statement like “the sample proportion is $0.62$” is not enough. You must explain what population proportion is being tested and what the claim is.

Setting up a test for one proportion is also a foundation for later topics:

  • confidence intervals for a population proportion
  • tests for differences between two proportions
  • understanding Type I and Type II errors

If the null hypothesis is rejected when it is actually true, that is a Type I error. If the null hypothesis is not rejected when the alternative is actually true, that is a Type II error. You cannot know during setup whether either error will occur, but the setup determines what kinds of mistakes are possible and how the test will be interpreted.
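
A small simulation can make the idea of a Type I error concrete. The sketch below rests on several assumptions that are not part of this lesson: a right-tailed test, a cutoff of $z \ge 1.645$ (which corresponds to a $0.05$ significance level), $n=200$, $p_0=0.50$, and $2000$ simulated samples. The function name `type_i_error_rate` is also illustrative.

```python
import math
import random


def type_i_error_rate(n, p0, z_cutoff, trials, seed=0):
    """Estimate how often a right-tailed test rejects H0 when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_deviation = math.sqrt(p0 * (1 - p0) / n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where p really equals p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / standard_deviation
        if z >= z_cutoff:
            rejections += 1  # H0 is true here, so this rejection is a Type I error
    return rejections / trials


rate = type_i_error_rate(n=200, p0=0.50, z_cutoff=1.645, trials=2000)
print(rate)
```

Because every simulated sample comes from a population where $H_0$ is true, each rejection is a mistake, and the printed rate should land in the neighborhood of $0.05$.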

Conclusion

Setting up a test for a population proportion is the first and most important step in making a valid inference. You identify the parameter $p$, write hypotheses that match the context, choose the correct tail, and check the conditions so the one-proportion $z$ test is appropriate. Once that foundation is solid, the rest of the test (calculating the test statistic, finding the $p$-value, and making a conclusion) becomes meaningful and accurate. This skill is central to AP Statistics because it connects data, chance, and real-world decision-making 📊.

Study Notes

  • The parameter for a one-proportion test is $p$, the true population proportion.
  • The null hypothesis usually has equality: $H_0: p=p_0$.
  • The alternative hypothesis depends on the wording: $H_a: p>p_0$, $H_a: p<p_0$, or $H_a: p\ne p_0$.
  • “More than” means right-tailed, “less than” means left-tailed, and “different from” means two-sided.
  • Check conditions before using a one-proportion $z$ test:
      • random sample or random assignment
      • independence, often using the $10\%$ condition
      • large counts: $np_0\ge 10$ and $n(1-p_0)\ge 10$
  • The test statistic is $z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$.
  • The $p$-value is the probability, assuming $H_0$ is true, of getting a sample result at least as extreme as the one observed.
  • A small $p$-value gives evidence against $H_0$.
  • This lesson is the setup foundation for inference with categorical data, including confidence intervals and tests for two proportions.
