Carrying Out a Chi-Square Test for Goodness of Fit
Students, imagine you run a school cafeteria and want to know whether the flavors of chips students choose match the company’s advertising claim. Maybe the company says sales should be about 50% barbecue, 30% sour cream, and 20% classic. You collect data from the vending machine and ask: does the real pattern match the claim, or is something different happening? 🍟 That is the kind of question a chi-square goodness-of-fit test answers.
In this lesson, you will learn how to carry out a chi-square goodness-of-fit test step by step. You will also see how this test fits into the larger AP Statistics topic of categorical inference. By the end, you should be able to:
- identify when a goodness-of-fit test is the right procedure
- calculate expected counts and the chi-square test statistic
- check conditions for inference
- interpret the $p$-value in context
- connect the procedure to AP Statistics language and reasoning
What a Chi-Square Goodness-of-Fit Test Does
A chi-square goodness-of-fit test is used when there is one categorical variable and you want to compare the observed counts to a set of claimed or expected proportions. The word “goodness-of-fit” means we are checking how well the data fit a specific distribution.
For example, suppose a bag of candy is claimed to contain 40% red, 30% blue, 20% green, and 10% yellow candies. If you count the colors in a sample, the test asks whether the sample counts are consistent with those percentages or whether the differences are too large to be explained by random chance alone.
The null hypothesis always states that the population distribution matches the claimed distribution:
$$H_0: p_1 = p_{1,0},\; p_2 = p_{2,0},\; \dots,\; p_k = p_{k,0}$$
The alternative says that the distribution is different in at least one category:
$$H_a: \text{at least one } p_i \text{ differs from the claimed value}$$
Here, $k$ is the number of categories, and $p_i$ represents the true proportion in category $i$.
A key idea: the chi-square test compares observed counts to expected counts. If the observed counts are very far from expected counts, the test statistic becomes large, which can lead to a small $p$-value.
Step 1: State the Hypotheses Clearly
Students, AP Statistics expects you to write hypotheses in context. That means you should name the categories and the claimed distribution.
Suppose a company says its cereal boxes contain one of four toys in these proportions: 25%, 25%, 25%, and 25%. If you want to test that claim, your hypotheses would be:
$$H_0: p_1 = 0.25,\; p_2 = 0.25,\; p_3 = 0.25,\; p_4 = 0.25$$
$$H_a: \text{the distribution of toys is not } (0.25, 0.25, 0.25, 0.25)$$
Notice that the alternative is not about one specific category being high or low. It simply says the overall distribution is not the same as claimed.
This is different from a one-proportion test, where the focus is only on one category. In goodness-of-fit, there are several categories, and the test checks the full pattern.
Step 2: Find the Expected Counts
Expected counts are the counts we would expect to see if the null hypothesis were true. They are found by multiplying the total sample size by each claimed proportion.
The formula is:
$$E_i = n p_{i,0}$$
where $E_i$ is the expected count for category $i$, $n$ is the sample size, and $p_{i,0}$ is the hypothesized proportion.
For example, if you sample $n = 100$ candies and the claimed proportions are 40%, 30%, 20%, and 10%, then the expected counts are:
$$E_1 = 100(0.40) = 40$$
$$E_2 = 100(0.30) = 30$$
$$E_3 = 100(0.20) = 20$$
$$E_4 = 100(0.10) = 10$$
Expected counts matter because the chi-square test measures how far the observed counts are from these expected values. A small difference is usually not surprising. A large difference may be evidence against the null hypothesis.
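As a quick sketch of this arithmetic, the expected counts for the candy example above can be computed in a couple of lines of Python:

```python
# Expected counts under H0: E_i = n * p_i0
# Candy example from the text: n = 100, claimed proportions 40/30/20/10.
n = 100
claimed = [0.40, 0.30, 0.20, 0.10]

expected = [n * p for p in claimed]
print(expected)
```

Multiplying the whole list of claimed proportions by the sample size at once mirrors the formula $E_i = n p_{i,0}$ applied to every category.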
Step 3: Check the Conditions
Before using the chi-square test, you must check the conditions. This is an important AP Statistics habit because inference should be based on appropriate data.
1. Random
The data should come from a random sample or a randomized experiment. Randomness helps make the sample representative of the population.
2. Independence
Individual observations should be independent. If sampling without replacement, a common guideline is the 10% condition: the sample size should be no more than 10% of the population.
3. Large Expected Counts
All expected counts should be at least $5$.
If any expected count is too small, the chi-square approximation may not work well. In that case, the test may not be appropriate.
These conditions are often summarized in AP Statistics as Random, Independent, and Large Counts.
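The Large Counts condition is easy to check by hand, but a tiny helper (a hypothetical function written for this lesson, not a standard library routine) makes the guideline explicit:

```python
def large_counts_ok(expected, minimum=5):
    """Return True if every expected count meets the minimum guideline (at least 5)."""
    return all(e >= minimum for e in expected)

# Candy example from the text: expected counts 40, 30, 20, 10 all pass.
print(large_counts_ok([40, 30, 20, 10]))  # True
# If one expected count were only 3, the condition would fail.
print(large_counts_ok([40, 30, 20, 3]))   # False
```

When the check fails, the chi-square approximation may be unreliable, and categories are sometimes combined so that every expected count reaches the guideline.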
Step 4: Calculate the Chi-Square Test Statistic
The chi-square test statistic compares observed counts to expected counts using this formula:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed count in category $i$ and $E_i$ is the expected count in category $i$.
This formula adds up the standardized squared differences across all categories. Squaring makes every difference positive, so categories where observed counts are above expected and categories where they are below expected both contribute to the statistic.
Example: suppose a spinner is claimed to land on four colors with equal probability. You spin it $80$ times and get:
- red: $18$
- blue: $22$
- green: $20$
- yellow: $20$
If the spinner is fair, the expected count for each color is:
$$E_i = 80(0.25) = 20$$
Now compute the chi-square statistic:
$$\chi^2 = \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(20-20)^2}{20}$$
$$\chi^2 = \frac{4}{20} + \frac{4}{20} + 0 + 0 = 0.4$$
A value like $0.4$ is small, which suggests the observed counts are very close to what the null hypothesis predicts.
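The computation above can be reproduced directly in Python, using only the standard library, for the spinner example:

```python
# Spinner example: 80 spins, four colors claimed equally likely.
observed = [18, 22, 20, 20]
expected = [80 * 0.25] * 4  # expected count of 20 for each color

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # 0.4
```

Each term in the sum matches one fraction in the worked calculation: $4/20$, $4/20$, $0$, and $0$.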
Step 5: Find the Degrees of Freedom and the $p$-Value
For a goodness-of-fit test with $k$ categories, the degrees of freedom are:
$$df = k - 1$$
If there are $4$ categories, then:
$$df = 4 - 1 = 3$$
The chi-square distribution is right-skewed, and larger values of $\chi^2$ give smaller $p$-values. The $p$-value is the probability, assuming $H_0$ is true, of getting a chi-square statistic at least as large as the one observed.
In symbols:
$$P(\chi^2 \ge \chi^2_{\text{obs}})$$
If the $p$-value is smaller than the significance level $\alpha$, you reject $H_0$. If it is larger, you fail to reject $H_0$.
For example, with $df = 3$ and $\chi^2 = 0.4$, the $p$-value would be very large, so there would be no evidence against the claimed distribution.
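On the AP exam you would find this p-value with a table or calculator. As an illustration, though, it can be evaluated with only the Python standard library, using the closed form of the chi-square survival function for three degrees of freedom (an identity worth verifying against a table):

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_df3(x):
    """P(chi-square with df = 3 exceeds x).

    Closed form valid only for df = 3 (stated as an assumption here):
        P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Spinner example: chi-square statistic 0.4 with df = 3
p_value = chi2_sf_df3(0.4)
print(p_value)  # about 0.94
```

A p-value near $0.94$ is far above any common significance level, which confirms the conclusion in the text: there is no evidence against the claimed distribution.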
Step 6: Write the Conclusion in Context
AP Statistics rewards interpretation in context, not just computation. Your conclusion should connect the statistical result to the real situation.
A strong conclusion includes three parts:
- State whether you reject or fail to reject $H_0$
- Refer to the $p$-value or significance level
- Explain the result in the situation
Example conclusion:
“Because the $p$-value is greater than $0.05$, we fail to reject $H_0$. There is not convincing evidence that the distribution of toy types in the cereal boxes differs from the company’s claimed distribution.”
That wording is careful. It does not say the null hypothesis is proven true. It only says there is not enough evidence to reject it.
How This Fits Into Chi-Square Inference
Goodness-of-fit is one of three major chi-square procedures in AP Statistics:
- goodness-of-fit tests: one categorical variable, compare to a claimed distribution
- tests for homogeneity: compare distributions across several populations or groups
- tests for independence: study whether two categorical variables are associated within one population
These tests all use the chi-square statistic and the same general logic: compare observed counts to expected counts. The difference is what kind of question is being asked.
If you are given one variable and one claimed distribution, use goodness-of-fit.
If you are comparing distributions across groups, use homogeneity.
If you are checking association between two variables, use independence.
Knowing how to choose the right test is just as important as doing the calculations.
Conclusion
Students, a chi-square goodness-of-fit test is a powerful way to check whether observed categorical data match a claimed distribution. The test begins with hypotheses about population proportions, uses expected counts based on the null model, and measures discrepancy with the chi-square statistic $\chi^2$. You then use the chi-square distribution with $df = k - 1$ to find a $p$-value and make a conclusion in context.
This procedure is a central part of inference for categorical data because it turns a pattern in counts into evidence. When you can recognize the right situation, check the conditions, and interpret the result clearly, you are using AP Statistics thinking the way it is meant to be used. 📊
Study Notes
- A chi-square goodness-of-fit test checks whether one categorical variable follows a claimed distribution.
- Null hypothesis: the population proportions equal the claimed proportions.
- Alternative hypothesis: at least one population proportion differs.
- Expected count formula: $E_i = n p_{i,0}$.
- Test statistic: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$.
- Degrees of freedom for goodness-of-fit: $df = k - 1$.
- Conditions: random data, independent observations, and all expected counts at least $5$.
- Large $\chi^2$ values suggest observed counts are far from expected counts.
- The $p$-value is the probability, assuming $H_0$ is true, of getting a chi-square statistic at least as large as the observed one.
- Reject $H_0$ if the $p$-value is less than $\alpha$; otherwise, fail to reject $H_0$.
- Goodness-of-fit is one of the three main chi-square procedures, along with homogeneity and independence tests.
- Always write conclusions in context using the wording of the problem.
