Setting Up a Chi-Square Goodness-of-Fit Test 📊

Introduction: What are we checking?

In AP Statistics, a chi-square goodness-of-fit test helps us check whether one categorical variable follows a claimed distribution. In simple terms, we compare the counts we observe to the counts we would expect if the claim were true. This lesson focuses on the setup: deciding when to use the test, stating the hypotheses, finding expected counts, and checking the conditions that make the conclusion valid. 😊

Learning objectives:

Explain the main ideas and terms used in a chi-square goodness-of-fit test.
Set up the correct hypotheses for a categorical distribution claim.
Find expected counts and check whether the test conditions are met.
Connect this test to the larger unit on categorical inference.
Use real-world data situations to choose and justify this procedure.

A common example is a candy company claiming its package colors follow a certain proportion, such as $20\%$ red, $20\%$ blue, $20\%$ green, $20\%$ yellow, and $20\%$ orange. If a sample of candies does not match those proportions closely, we can use a chi-square goodness-of-fit test to see whether the difference is likely due to random chance or evidence against the claim.

What a goodness-of-fit test does

The word goodness-of-fit means “how well the data fit a claimed distribution.” The claim is usually a list of category proportions or percentages. For example, a teacher may claim that survey responses should be evenly split among four choices, or a game spinner should land on each section with a certain probability.

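This test is used when there is one categorical variable, measured on a single sample, with two or more categories. With exactly two categories it leads to the same conclusion as a two-sided one-proportion $z$ test, so in practice it is most useful with three or more. We are not comparing two groups here. We are checking whether one set of observed counts matches a set of expected counts.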

The test statistic is based on the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed count and $E$ is the expected count for each category. A large value of $\chi^2$ means the observed counts are far from the expected counts.
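As a rough illustration of the formula above, the following Python sketch adds up $\frac{(O - E)^2}{E}$ across categories. The function name `chi_square_statistic` and the observed counts in the usage lines are invented for illustration only; they are not data from this lesson.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one count per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, made up purely to show the arithmetic.
hypothetical_observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

statistic = chi_square_statistic(hypothetical_observed, expected)
print(statistic)  # 25/25 + 25/25 + 0 + 0 = 2.0
```

Categories whose observed count equals the expected count contribute $0$, so only the mismatched categories push the statistic upward.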

But before calculating the test statistic, we must set up the test correctly. Good setup includes: identifying the variable, stating the parameter, writing hypotheses, finding expected counts, and checking conditions. If the setup is wrong, the rest of the test will be unreliable.

Identifying the situation and the parameter

The first step is to describe the categorical variable and the claim. Ask: what are the categories, and what distribution is being tested?

For example, suppose a school wants to know whether students prefer one of five lunch options in equal proportions. The variable is lunch choice, and the categories are the five menu items.

The parameter for a goodness-of-fit test is not a mean or a difference in proportions. Instead, it is the true distribution of category proportions in the population. In AP Statistics, we usually write the hypotheses using the claimed proportions.

If the claim is that the proportions are $p_1, p_2, \dots, p_k$, then the null hypothesis states that the population follows those proportions. The alternative says that at least one proportion is different.

A general setup looks like this:

$$H_0: p_1 = \text{claimed value}_1,\ p_2 = \text{claimed value}_2,\ \dots,\ p_k = \text{claimed value}_k$$

$$H_a: \text{At least one of the category proportions differs from the claim}$$

This wording is important. The alternative does not say all proportions are different. It only takes one category being off for the distribution to fail the fit.

Writing correct hypotheses

Let’s build the hypotheses carefully.

Suppose a die is claimed to be fair, so each face should appear with probability $\frac{1}{6}$. The hypotheses are:

$$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = \frac{1}{6}$$

$$H_a: \text{At least one } p_i \text{ is different from } \frac{1}{6}$$

Notice that the null hypothesis includes all categories and their claimed proportions. If the categories are not equally likely, use the given probabilities exactly as stated.

For example, if a cereal company says its box has $30\%$ oat pieces, $25\%$ corn pieces, $20\%$ wheat pieces, and $25\%$ rice pieces, then the hypotheses are:

$$H_0: p_1 = 0.30,\ p_2 = 0.25,\ p_3 = 0.20,\ p_4 = 0.25$$

$$H_a: \text{At least one of the proportions is different from the claim}$$

The hypotheses should always match the context. Do not write hypotheses about means, medians, or two populations. This is a categorical distribution test, so the parameter is a set of category proportions.
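One quick sanity check when writing $H_0$ is that the claimed proportions form a complete distribution: each is between $0$ and $1$, and together they add to $1$. A minimal Python sketch of that check follows; the helper name `check_null_proportions` is an assumption made for this example, not standard AP notation.

```python
import math


def check_null_proportions(proportions):
    """Return True if the claimed proportions form a valid distribution."""
    all_in_range = all(0 < p < 1 for p in proportions)
    # isclose allows for tiny floating-point rounding in the sum.
    sums_to_one = math.isclose(sum(proportions), 1.0)
    return all_in_range and sums_to_one


cereal_claim = [0.30, 0.25, 0.20, 0.25]  # oat, corn, wheat, rice
fair_die_claim = [1 / 6] * 6             # six equally likely faces

print(check_null_proportions(cereal_claim))
print(check_null_proportions(fair_die_claim))
```

If the check fails, a category was probably left out of the null hypothesis or a proportion was copied incorrectly.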

Finding expected counts

Expected counts are what we would expect to see in each category if the null hypothesis were true. They are found by multiplying the total sample size by each claimed proportion.

The formula is

$$E_i = n p_i$$

where $E_i$ is the expected count for category $i$, $n$ is the sample size, and $p_i$ is the claimed proportion for that category.

For example, suppose a sample of $100$ students is asked to choose a favorite school club among four options, and the claim is that each option is equally popular. Then each expected count is

$$E = 100 \left( \frac{1}{4} \right) = 25$$

for each category.

If the claim has unequal proportions, use each one separately. For instance, if a spinner is supposed to land on three colors with probabilities $0.50$, $0.30$, and $0.20$, and the sample size is $200$, then the expected counts are

$$E_1 = 200(0.50) = 100$$

$$E_2 = 200(0.30) = 60$$

$$E_3 = 200(0.20) = 40$$

Expected counts are crucial because they are used both in the test statistic and in checking conditions.
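The rule $E_i = n p_i$ can be sketched in a few lines of Python. The function name `expected_counts` is an assumption for this example; the sample sizes and proportions are the ones from the club and spinner examples above.

```python
def expected_counts(n, claimed_proportions):
    """Multiply the sample size by each claimed proportion."""
    return [n * p for p in claimed_proportions]


# Four equally popular clubs, sample of 100 students.
club_expected = expected_counts(100, [1 / 4] * 4)

# Spinner with claimed probabilities 0.50, 0.30, 0.20 and 200 spins.
spinner_expected = expected_counts(200, [0.50, 0.30, 0.20])

print(club_expected)
print(spinner_expected)
```

Expected counts do not have to be whole numbers, so there is no need to round them before using them in the test statistic.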

Checking the conditions for the test

Before using a chi-square goodness-of-fit test, AP Statistics expects you to verify conditions. These conditions help make sure the chi-square model is appropriate.

1. Random

The data should come from a random sample or a randomized experiment. Randomness helps make the results generalizable and supports valid inference.

2. Independent

Observations should be independent. If sampling without replacement from a population, the sample size should be no more than $10\%$ of the population. This is the $10\%$ condition.

3. Large expected counts

Each expected count should be at least $5$.

$$E_i \ge 5 \text{ for all categories}$$

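If any expected count is below $5$, the chi-square approximation may not be accurate, and the test should not be used as is. Common remedies are to collect a larger sample or to combine small categories when that makes sense in context.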

These conditions are often written in a full sentence response. For example:

“Because the data were collected using a random sample, the observations are independent, and all expected counts are at least $5$, the conditions for a chi-square goodness-of-fit test are met.”

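That kind of statement, written in the context of the problem and supported by the expected counts you actually computed, is what an AP Statistics response should include.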
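Two of the three conditions are numeric and can be checked mechanically, as in the Python sketch below. The helper names, the population size of 5000, and the small-sample example are hypothetical values chosen for illustration. The random condition cannot be verified by code; it depends on how the data were collected.

```python
def large_counts_ok(expected, minimum=5):
    """True if every expected count is at least the minimum (5 by default)."""
    return all(e >= minimum for e in expected)


def ten_percent_ok(sample_size, population_size):
    """True if the sample is no more than 10% of the population."""
    return sample_size <= 0.10 * population_size


# Spinner example from earlier: expected counts 100, 60, 40.
print(large_counts_ok([100, 60, 40]))

# Hypothetical small sample: n = 30 with claimed proportions 0.5, 0.4, 0.1
# gives expected counts 15, 12, 3, so the last category is too small.
print(large_counts_ok([15, 12, 3]))

# Hypothetical population of 5000 with a sample of 200.
print(ten_percent_ok(200, 5000))
```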

A full setup example

Imagine a snack company claims that a bag of trail mix contains four ingredients in the following proportions: peanuts $40\%$, raisins $30\%$, pretzels $20\%$, and chocolate pieces $10\%$. A student buys a bag with $200$ pieces and counts the ingredients.

First, identify the variable: ingredient type, which is categorical with four categories.

Next, write the hypotheses:

$$H_0: p_1 = 0.40,\ p_2 = 0.30,\ p_3 = 0.20,\ p_4 = 0.10$$

$$H_a: \text{At least one of the true proportions differs from the claim}$$

Then compute expected counts:

$$E_1 = 200(0.40) = 80$$

$$E_2 = 200(0.30) = 60$$

$$E_3 = 200(0.20) = 40$$

$$E_4 = 200(0.10) = 20$$

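Now check the conditions. If the bag was selected at random from a large production lot, we can treat its $200$ pieces as a random sample of the company's trail mix, although pieces packed together in one bag may vary less than a true random sample of pieces would. If those $200$ pieces are less than $10\%$ of all pieces produced, independence is reasonable. All expected counts ($80$, $60$, $40$, and $20$) are at least $5$, so the large counts condition is met.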

This is the full setup stage. Only after this would you calculate the test statistic, degrees of freedom, and $p$-value.
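The numeric parts of this setup can be collected in one short Python sketch. It uses only the trail mix claim and sample size given above; no observed counts are invented. The variable names are assumptions for this example, and the degrees of freedom use the standard goodness-of-fit rule of number of categories minus $1$, which is needed for the next stage of the test.

```python
# Claimed proportions from the trail mix example.
claimed = {
    "peanuts": 0.40,
    "raisins": 0.30,
    "pretzels": 0.20,
    "chocolate": 0.10,
}
n = 200  # pieces in the bag

# Expected count for each category: E_i = n * p_i.
expected = {name: n * p for name, p in claimed.items()}

# Large counts condition: every expected count at least 5.
large_counts_met = all(e >= 5 for e in expected.values())

# Degrees of freedom for goodness of fit: categories minus 1.
degrees_of_freedom = len(claimed) - 1

for name, e in expected.items():
    print(f"{name}: expected {e:.0f}")
print("large counts condition met:", large_counts_met)
print("degrees of freedom:", degrees_of_freedom)
```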

How this fits into the chi-square unit

Setting up a goodness-of-fit test is one part of the larger chi-square topic in AP Statistics. The unit also includes:

Chi-square tests for homogeneity, which compare distributions across multiple groups.
Chi-square tests for independence, which examine whether two categorical variables are related.
Expected counts, which appear in all chi-square procedures.
Selecting the correct test based on the research question.

The goodness-of-fit test is the one to use when there is one categorical variable and a claim about its distribution. If there are two categorical variables, think about independence. If there are multiple groups and one categorical variable, think about homogeneity.

Knowing the difference saves time and prevents errors on the AP exam. The key question is: what is being compared?
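The decision rule in this section can be written as a small lookup, sketched below in Python. The function name and its two inputs are assumptions made for illustration; on a real problem, the choice comes from reading how the data were collected, not from a formula.

```python
def choose_chi_square_test(categorical_variables, groups_sampled):
    """Suggest a chi-square procedure from the study design."""
    if categorical_variables == 1 and groups_sampled == 1:
        # One variable, one sample, compared to a claimed distribution.
        return "goodness-of-fit"
    if categorical_variables == 1 and groups_sampled > 1:
        # One variable compared across several groups or populations.
        return "homogeneity"
    if categorical_variables == 2 and groups_sampled == 1:
        # Two variables recorded on the same single sample.
        return "independence"
    return "not covered by this sketch"


print(choose_chi_square_test(1, 1))
print(choose_chi_square_test(1, 3))
print(choose_chi_square_test(2, 1))
```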

Conclusion

Setting up a chi-square goodness-of-fit test means more than just plugging numbers into a formula. It means understanding the claim, defining the categorical variable, writing correct hypotheses, calculating expected counts, and checking conditions. Once the setup is correct, the rest of the test becomes much easier to complete accurately.

This procedure is a foundation for inference about categorical data because it connects real-world counts to probability models. When observed counts differ a lot from expected counts, the data may provide evidence that the claimed distribution is not true. That is the main idea behind the chi-square goodness-of-fit test. 📘

Study Notes

A chi-square goodness-of-fit test is used for one categorical variable, measured on a single sample, with two or more categories.
The null hypothesis says the population follows the claimed distribution of proportions.
The alternative hypothesis says at least one category proportion differs from the claim.
Expected counts are found with $E_i = n p_i$.
The chi-square test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$.
Conditions: random, independent, and all expected counts at least $5$.
Use this test when checking whether observed counts fit a claimed distribution.
If there are two categorical variables, consider a chi-square test of independence instead.
If there are multiple groups and one categorical variable, consider a chi-square test of homogeneity instead.
Good setup is essential before calculating a $p$-value or making a conclusion.