8. Inference for Categorical Data: Chi-Square

Setting Up a Chi-Square Test for Homogeneity or Independence

Students, imagine you are comparing several groups and asking a simple question: do they follow the same pattern, or is there a relationship between two categorical variables? That is the big idea behind chi-square inference for categorical data 📊. In this lesson, you will learn how to set up a chi-square test for homogeneity or a chi-square test for independence, how to tell which test to use, and how to state the hypotheses correctly.

What these chi-square tests are trying to answer

A chi-square test uses counts in categories, not measurements like height or temperature. The data must be categorical, such as favorite pizza topping, type of music, or whether a student passes or fails. The test compares observed counts to expected counts. If the observed counts are very different from what we would expect under a certain claim, that gives evidence against the claim.

There are two closely related chi-square tests in AP Statistics:

  • Chi-square test for homogeneity: compares the distribution of a categorical variable across two or more populations or groups.
  • Chi-square test for independence: checks whether two categorical variables are associated within one population.

These tests use the same calculation, but the wording of the hypotheses and the study design are different. That is why setting up the test correctly matters so much.

For both tests, the null hypothesis says there is no difference between the distributions or no association between the variables. The alternative says there is a difference or an association. The test statistic is

$$\chi^2=\sum \frac{(O-E)^2}{E}$$

where $O$ is the observed count and $E$ is the expected count in a cell, and the sum is taken over every cell of the table.
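As a minimal sketch of this formula in Python, the function below adds up $(O-E)^2/E$ over the cells of a table. The function name `chi_square_statistic` and the counts are made up for illustration; they do not come from a real study.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over matching cells of two tables (lists of rows)."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

# Hypothetical 2x2 table of observed counts and the matching expected counts.
observed = [[30, 20], [20, 30]]
expected = [[25.0, 25.0], [25.0, 25.0]]

print(chi_square_statistic(observed, expected))  # 4.0
```

Each of the four cells contributes $(5)^2/25 = 1$, so the statistic is $4$. A larger value means the observed counts sit farther from what the null hypothesis predicts.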

How to tell whether it is a homogeneity test or an independence test

Students, the fastest way to choose the correct test is to look at the question being asked and the study design.

Use a test for homogeneity when:

  • You have one categorical variable measured across multiple groups.
  • You want to compare the distribution of that variable among the groups.
  • The data come from separate random samples of different populations, or from groups formed by random assignment to treatments.

Example: A researcher compares the favorite drink choices of students from three different grade levels: freshmen, sophomores, and juniors. The question is whether the distribution of drink preferences is the same across the three grades. That is a homogeneity test.

Use a test for independence when:

  • You have two categorical variables measured on the same group of individuals.
  • You want to know whether the variables are associated.
  • The data come from one random sample.

Example: A school surveys students and records both lunch preference and club membership. The question is whether lunch preference and club membership are related. That is an independence test.

A useful memory tip is this:

  • Homogeneity compares groups.
  • Independence checks for a relationship between variables in one group.

Even though the formulas are the same, the setup changes depending on which situation you are in ✅.

Writing the hypotheses the AP Statistics way

When you set up a chi-square test, your hypotheses must be written in words, not in terms of means or proportions for a single category. Since this test is about distributions, the null and alternative hypotheses talk about whether the distributions are the same or whether the variables are independent.

For a chi-square test for homogeneity:

  • Null hypothesis: The distribution of the categorical variable is the same for all populations or groups.
  • Alternative hypothesis: At least one population or group has a different distribution.

Example wording:

$H_0$: The distribution of preferred study method is the same for students in each grade level.

$H_a$: The distribution of preferred study method is not the same for all grade levels.
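To see what "the same distribution" means in practice, here is a small Python sketch that turns each group's counts into proportions. The helper `row_distributions`, the study-method categories, and the counts are all hypothetical, chosen so the null hypothesis holds exactly.

```python
def row_distributions(table):
    """Convert each row of counts into proportions that sum to 1."""
    return [[count / sum(row) for count in row] for row in table]

# Hypothetical counts: rows are grade levels, columns are study methods
# (flashcards, practice problems, rereading notes).
counts = [
    [20, 20, 10],   # 9th grade, 50 students
    [40, 40, 20],   # 10th grade, 100 students
    [10, 10, 5],    # 11th grade, 25 students
]

for grade, dist in zip(["9th", "10th", "11th"], row_distributions(counts)):
    print(grade, dist)  # every grade prints [0.4, 0.4, 0.2]
```

The three groups have different sizes but identical proportions, which is exactly what the null hypothesis describes. Real samples almost never match this perfectly; the test asks whether the differences are larger than chance alone would explain.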

For a chi-square test for independence:

  • Null hypothesis: The two categorical variables are independent.
  • Alternative hypothesis: The two categorical variables are associated.

Example wording:

$H_0$: Music preference and sport participation are independent.

$H_a$: Music preference and sport participation are associated.

Notice that the alternative for independence is worded as “associated” (or “not independent”) rather than “dependent.” On the AP exam, “associated” is the clearest and most standard wording.

Building the table and finding expected counts

The data for these chi-square tests are organized in a two-way table. Each cell holds the observed count for one combination of a row category and a column category. Before calculating the test statistic, you need the expected count for each cell. Expected counts are the counts we would expect if the null hypothesis were true.

For a cell in a two-way table, the expected count is found by

$$E=\frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

This formula works for both homogeneity and independence tests.
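The expected count formula can be sketched in a few lines of Python. The function name `expected_counts` and the table below are hypothetical, used only to show the arithmetic.

```python
def expected_counts(table):
    """Return expected counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts (2 rows, 3 columns).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

for row in expected_counts(observed):
    print(row)  # both rows print [15.0, 20.0, 25.0]
```

Here both row totals are $60$, the column totals are $30$, $40$, and $50$, and the grand total is $120$, so the first cell is $(60)(30)/120 = 15$. Expected counts do not have to be whole numbers.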

Example of setup

Suppose you want to compare opinions about a new school policy across three grade levels: 9th, 10th, and 11th grade. Students are classified as Support, Neutral, or Oppose.

The table has:

  • rows: grade level
  • columns: opinion categories

To set up the test, you would:

  1. State the type of test: chi-square test for homogeneity.
  2. Define the populations: the three grade levels.
  3. State the variable: opinion about the policy.
  4. Write hypotheses about whether the distribution is the same across grades.
  5. Compute the expected counts from the row totals and column totals so the conditions can be checked.

If instead a single random sample of students were surveyed, with both grade level and opinion recorded for each student, it would be a chi-square test for independence.
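Either design ends with the same kind of two-way table. As a rough sketch, assuming the raw data arrive as one (grade, opinion) pair per student, the counts can be tallied with Python's `collections.Counter`. The responses below are invented and far too few for a real test; they only show the layout.

```python
from collections import Counter

# Hypothetical raw responses: one (grade, opinion) pair per student.
responses = [
    ("9th", "Support"), ("9th", "Oppose"), ("9th", "Support"),
    ("10th", "Neutral"), ("10th", "Support"),
    ("11th", "Oppose"), ("11th", "Oppose"), ("11th", "Neutral"),
]

grades = ["9th", "10th", "11th"]              # rows
opinions = ["Support", "Neutral", "Oppose"]   # columns

pair_counts = Counter(responses)
# Counter returns 0 for a (grade, opinion) pair that never occurs.
table = [[pair_counts[(g, o)] for o in opinions] for g in grades]

for grade, row in zip(grades, table):
    print(grade, row)
```

The table itself does not tell you which test to use. That depends on how the students were selected: separate samples from each grade point to homogeneity, while one sample with both variables recorded points to independence.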

Conditions you should always check before starting

Before using a chi-square test, AP Statistics expects you to check conditions. These help make the inference valid.

1. Random condition

The data should come from a random sample, a randomized experiment, or some other design that supports good inference.

  • For a homogeneity test, the data should come from independent random samples of each population, or from random assignment to groups in an experiment.
  • For an independence test, the sample should be random from the population.

2. Large expected counts condition

All expected counts should be at least $5$. This condition helps ensure the chi-square distribution is a good approximation.

If any expected count is too small, the chi-square test may not be appropriate.
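A quick way to check this condition is sketched below in Python. The function name `large_counts_ok` and both tables are hypothetical. Note that the check is applied to the expected counts, not the observed counts.

```python
def large_counts_ok(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)

# Hypothetical expected counts for two different tables.
fine_table = [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
sparse_table = [[12.5, 3.5], [37.5, 10.5]]

print(large_counts_ok(fine_table))    # True
print(large_counts_ok(sparse_table))  # False, because 3.5 is below 5
```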

3. Independence of observations

The observations should be independent. When sampling without replacement, each sample should be no more than $10\%$ of the population it was drawn from.
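The $10\%$ check is simple arithmetic. A minimal Python sketch, with a made-up function name `ten_percent_ok` and made-up sizes, might look like this:

```python
def ten_percent_ok(sample_size, population_size):
    """Return True when the sample is at most 10% of the population."""
    # Integer comparison avoids floating-point rounding at the boundary.
    return 10 * sample_size <= population_size

print(ten_percent_ok(150, 2000))  # True: 150 is no more than 200
print(ten_percent_ok(150, 1200))  # False: 150 is more than 120
```

For a homogeneity test with several samples, the same check would be repeated for each sample and its own population.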

These conditions are not just busywork; they justify using the chi-square procedure.

How to interpret the setup in context

Students, the AP exam wants you to stay in context. That means every hypothesis and conclusion should mention the real situation, not just symbols.

Suppose a county takes one random sample of residents and wants to know whether commute type is related to age group. The two categorical variables are age group and commute type, such as driving alone, carpooling, or taking public transit.

A correct setup would say:

  • The test is a chi-square test for independence.
  • The null hypothesis is that age group and commute type are independent.
  • The alternative is that they are associated.
  • The expected counts are based on the assumption of independence.
  • If the result is statistically significant, the data provide convincing evidence that commute type and age group are associated.
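To see why the expected counts rest on the assumption of independence, recall that if two variables are independent, the probability of landing in a particular cell is the product of the row probability and the column probability. Estimating those probabilities from the sample totals, and writing $n$ for the grand total (a symbol introduced here only for convenience), gives

$$E = n\cdot\frac{\text{row total}}{n}\cdot\frac{\text{column total}}{n}=\frac{(\text{row total})(\text{column total})}{n}$$

which matches the expected count formula $E=\frac{(\text{row total})(\text{column total})}{\text{grand total}}$.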

For a homogeneity example, imagine comparing movie genre preferences across three schools. The setup would say:

  • The test is a chi-square test for homogeneity.
  • The null hypothesis is that the distribution of movie genre preference is the same across the schools.
  • The alternative is that at least one school has a different distribution.

That wording matters because it matches the structure of the study.

Common mistakes students make

Here are some frequent errors to avoid:

  • Mixing up homogeneity and independence.
  • Writing a hypothesis about averages or percentages instead of distributions.
  • Saying the alternative is that the variables are “dependent” instead of associated.
  • Forgetting to mention the real variables and groups in context.
  • Using a chi-square test when any expected count is less than $5$.
  • Treating categories as numbers and trying to find a mean.

A good habit is to ask yourself two questions before proceeding:

  1. Am I comparing distributions across groups?
  2. Or am I checking whether two categorical variables are related in one sample?

Your answer tells you which test to set up.

Why this lesson matters in the bigger chi-square unit

This lesson is the foundation for the rest of chi-square inference. If you cannot set up the test correctly, then the later steps—finding expected counts, computing the test statistic, determining the $p$-value, and writing the conclusion—can all go wrong.

Chi-square tests are part of the larger AP Statistics unit on inference for categorical data. Along with goodness-of-fit tests, they help answer questions about categorical patterns in the real world. Schools use them to study student preferences, companies use them to compare customer choices, and researchers use them to see whether variables are related.

Learning how to set up the test properly helps you communicate clearly and choose the right method for the data. That is one of the most important skills in statistical reasoning.

Conclusion

Setting up a chi-square test for homogeneity or independence is mostly about understanding the story behind the data. Students, if you remember the difference between comparing groups and checking for association, you can choose the right test with confidence. Then write the hypotheses in context, organize the data in a two-way table, and check the conditions before moving forward. Once the setup is correct, the rest of the chi-square procedure becomes much easier to follow ✅.

Study Notes

  • Chi-square tests use categorical data and compare observed counts with expected counts.
  • The chi-square test statistic is $\chi^2=\sum \frac{(O-E)^2}{E}$.
  • Use a test for homogeneity when comparing one categorical variable across multiple groups.
  • Use a test for independence when examining the relationship between two categorical variables in one sample.
  • For homogeneity, the null hypothesis says the distributions are the same across groups.
  • For independence, the null hypothesis says the variables are independent.
  • The alternative for independence is that the variables are associated.
  • Expected counts are found by $E=\frac{(\text{row total})(\text{column total})}{\text{grand total}}$.
  • Check the random condition, independence condition, and large expected counts condition.
  • All expected counts should be at least $5$.
  • Always write hypotheses and conclusions in the context of the problem.
