8. Inference for Categorical Data: Chi-Square

Expected Counts in Two-Way Tables

In AP Statistics, expected counts are one of the most important ideas behind chi-square tests for categorical data 📊. If you understand expected counts, you can better understand when a chi-square test is appropriate, what the test is comparing, and why the test works. In a two-way table, expected counts help us answer a simple but powerful question: if there were no relationship between the two categorical variables, what counts would we expect to see in each cell?

Learning goals

By the end of this lesson, students will be able to:

  • explain what expected counts mean in a two-way table,
  • calculate expected counts using row and column totals,
  • connect expected counts to chi-square tests of independence and homogeneity,
  • check whether the expected counts condition is met,
  • use expected counts to decide whether chi-square inference is appropriate.

What Is a Two-Way Table?

A two-way table is a table that summarizes counts for two categorical variables. One variable is listed across the rows, and the other is listed across the columns. For example, a school might record whether students prefer coffee or tea and whether they are freshmen, sophomores, juniors, or seniors.

A two-way table lets us compare groups and look for patterns. Here is the key idea: the table gives observed counts, which are the actual counts collected from the sample or population. Expected counts are different. They are the counts we would expect if the variables were independent.

That phrase matters a lot. In AP Statistics, expected counts are usually used in chi-square tests where the null hypothesis says there is no association between the variables. In other words, knowing one variable does not help predict the other. 😊

What Does “Expected” Mean?

The word “expected” does not mean “guessed” or “most likely based on intuition.” It means a count that would be predicted under the null model.

For a chi-square test of independence, the null hypothesis is:

$$H_0: \text{The two categorical variables are independent.}$$

If the variables are independent, then the distribution of one variable should look the same across the categories of the other variable. The expected counts show what the table would look like if that were true.

Think of it like this: imagine a school survey about favorite lunch choice and grade level. If lunch choice and grade level are independent, then the proportion of students who choose pizza should be about the same in each grade. Expected counts translate that idea into actual numbers for each cell.

How to Calculate Expected Counts

The expected count for a cell in a two-way table is found using row totals and column totals. The formula is:

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Here, $E$ represents the expected count for one cell.

This formula works because it combines the marginal totals of the table. The row total tells how many cases are in that row category. The column total tells how many cases are in that column category. The grand total is the total number of cases in the table.
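
One way to see where the formula comes from, assuming the variables are independent and writing $n$ for the grand total (a symbol introduced here for convenience): the estimated probability of landing in a given row is the row total divided by $n$, and likewise for a column. Under independence, the probability of landing in both is the product of the two, so the expected count is $n$ times that product:

$$E = n \cdot \frac{\text{row total}}{n} \cdot \frac{\text{column total}}{n} = \frac{(\text{row total})(\text{column total})}{n}$$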

Example 1: A simple school survey

Suppose a school surveys students about whether they prefer online learning or in-person learning, and the results are summarized by grade level. Imagine there are $200$ students total.

  • $80$ are freshmen,
  • $120$ are sophomores,
  • $90$ prefer online learning,
  • $110$ prefer in-person learning.

To find the expected count for freshmen who prefer online learning, use:

$$E = \frac{(80)(90)}{200} = 36$$

So if grade level and learning preference were independent, we would expect $36$ freshmen to prefer online learning.

For freshmen who prefer in-person learning:

$$E = \frac{(80)(110)}{200} = 44$$

Notice that the expected counts in the freshman row add to the row total:

$$36 + 44 = 80$$

That will always happen. The expected counts in each row and each column are built so the totals match the margins of the table.
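
The arithmetic in Example 1 can be sketched in a few lines of Python. This is only an illustration: the dictionary names are chosen here, and the margins are the ones given in the example.

```python
# Margins from Example 1.
row_totals = {"freshmen": 80, "sophomores": 120}
col_totals = {"online": 90, "in-person": 110}
grand_total = 200

# Expected count for each cell: (row total)(column total) / (grand total).
expected = {
    (row, col): r * c / grand_total
    for row, r in row_totals.items()
    for col, c in col_totals.items()
}

for cell, value in expected.items():
    print(cell, value)

# The expected counts in each row add back up to that row's total.
for row, r in row_totals.items():
    assert sum(expected[(row, col)] for col in col_totals) == r
```

The four expected counts come out to $36$, $44$, $54$, and $66$, and each row and column of expected counts adds up to the matching margin.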

Why Expected Counts Matter in Chi-Square Tests

Expected counts are the foundation of chi-square calculations. The chi-square test compares the observed counts and expected counts to measure how far the data are from what we would expect under the null hypothesis.

The chi-square statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed count and $E$ is the expected count.

If observed counts are very close to expected counts, then $\chi^2$ will be small. If they are far apart, then $\chi^2$ will be large. A large chi-square statistic suggests that the data are not well explained by the null hypothesis of independence.
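
As a rough sketch of how the statistic is computed, the Python below sums $(O - E)^2 / E$ over every cell. The observed counts are hypothetical (the lesson gives only margins for Example 1); they were chosen so that they match those margins. The function name is also just an illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

# Hypothetical observed counts (not from the lesson).
# Rows: freshmen, sophomores. Columns: online, in-person.
observed = [[30, 50], [60, 60]]

# Expected counts from the Example 1 margins (80, 120 and 90, 110; total 200).
expected = [[36.0, 44.0], [54.0, 66.0]]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 3.03
```

Every cell in this made-up table is off by $6$ from its expected count, but the cells with smaller expected counts contribute more to $\chi^2$ because each squared difference is divided by $E$.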

This is why expected counts are so important. Without them, there is no standard comparison point for judging whether the observed table is unusual. 📚

The Expected Counts Condition

In AP Statistics, not every two-way table is suitable for chi-square inference. We need to check the expected counts condition.

A common rule is that all expected counts should be at least $5$:

$$E \ge 5$$

for every cell in the table.

This condition matters because the chi-square approximation works well only when expected counts are not too small. If expected counts are too low, the chi-square model may not be a reliable approximation to the sampling distribution.

Example 2: Checking the condition

Suppose a table has a cell with expected count $3.2$. Since

$$3.2 < 5$$

the condition is not met. In that case, the chi-square test may not be appropriate.

If a table has all expected counts at least $5$, then the condition is satisfied, and the chi-square test can be used if the other conditions are also met.
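
A minimal sketch of this check in Python, assuming the expected counts are stored as a list of rows. The function name and both tables are hypothetical; the second table was built to contain the $3.2$ from Example 2.

```python
def expected_counts_condition(expected, minimum=5):
    """Return True only if every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)

# Hypothetical tables of expected counts.
ok_table = [[36.0, 44.0], [54.0, 66.0]]
small_table = [[12.8, 3.2], [27.2, 6.8]]  # one cell is below 5

print(expected_counts_condition(ok_table))     # True
print(expected_counts_condition(small_table))  # False
```

The check looks at every cell, so a single small expected count is enough to fail the condition.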

Expected Counts in Independence and Homogeneity

Expected counts appear in both chi-square tests of independence and homogeneity.

Chi-square test of independence

This test asks whether two categorical variables are associated in a single population, using one sample in which both variables are recorded for each individual.

For example, are music preference and grade level independent among students at one school?

The null hypothesis is:

$$H_0: \text{The variables are independent}$$

Expected counts are computed from the row and column totals of the two-way table.

Chi-square test of homogeneity

This test compares the distribution of one categorical variable across several populations or groups.

For example, does the distribution of favorite study method differ across classes taught by different teachers?

Even though the wording is different, the expected counts are computed in the same way:

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

In both tests, expected counts represent what we would expect if the null hypothesis were true.
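
Because the formula is the same for both tests, one small function can cover either setting. The sketch below starts from a full table of observed counts instead of from the margins. The data are hypothetical: three classes of $30$ students each and three study methods, invented for illustration.

```python
def expected_counts(observed):
    """Expected counts for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # (row total)(column total) / (grand total) for every cell.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts (not from the lesson).
# Rows: three teachers' classes.
# Columns: flashcards, practice problems, rereading.
observed = [
    [10, 15, 5],
    [12, 10, 8],
    [8, 15, 7],
]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Since the three classes are the same size in this made-up table, every row of expected counts is identical. That matches the null hypothesis of homogeneity, which says the distribution of study method is the same in every class.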

Interpreting Expected Counts in Real Life

It helps to think about expected counts as a fair-share model. Imagine a sports club with members from different grade levels and a two-way table of whether students play a fall sport. If grade level and sport participation are independent, each grade should receive a fair share of the “play sport” counts based on its size in the club.

For example, if juniors make up $25\%$ of the club and $40$ students play a fall sport, then independence suggests juniors should account for about $0.25 \times 40 = 10$ of those students. That is the same idea as the expected count formula.
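
To connect this to the formula, let $n$ be the number of club members (a symbol introduced here; the example does not give the club size). The junior total is then $0.25n$ and the “play sport” total is $40$, so:

$$E = \frac{(0.25n)(40)}{n} = 0.25 \times 40 = 10$$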

Expected counts do not tell us what actually happened. They tell us what would happen if the null hypothesis were true. That makes them a baseline for comparison.

Common Mistakes to Avoid

A few mistakes come up often:

  • Mixing up observed counts and expected counts.
  • Forgetting that expected counts come from the null hypothesis.
  • Using percentages instead of counts in the chi-square formula.
  • Checking only one cell instead of every cell for the expected counts condition.
  • Thinking that a large expected count means the result is significant. Expected counts do not measure significance by themselves; they are part of the comparison.

A good habit is to label the table carefully and show the formula each time you calculate an expected count. This makes your work easier to follow and helps avoid errors.

How Expected Counts Fit into the Bigger Chi-Square Picture

Expected counts are the bridge between the data and the chi-square test. They tell us what the table would look like if the null model were true. Then the chi-square statistic measures how far the observed table is from that model.

In the bigger AP Statistics topic of inference for categorical data, expected counts connect several ideas:

  • two-way tables organize categorical data,
  • null hypotheses describe no association or no difference in distributions,
  • expected counts represent the null model,
  • chi-square statistics compare observed and expected counts,
  • p-values help us decide whether the observed differences are unusual.

So expected counts are not just a small step in the process. They are a central part of understanding why chi-square inference works. ✅

Conclusion

Expected counts in two-way tables are the counts we expect in each cell if the categorical variables are independent or if the null hypothesis of homogeneity is true. They are calculated using row totals, column totals, and the grand total with

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

These counts are used to check the expected counts condition and to compute the chi-square statistic. Understanding expected counts helps students choose the right chi-square procedure and interpret results in a logical way. In AP Statistics, this is a key skill for making sound conclusions from categorical data.

Study Notes

  • A two-way table shows counts for two categorical variables.
  • Observed counts are the actual data values in the table.
  • Expected counts are the counts predicted by the null hypothesis.
  • For chi-square tests, expected counts are found by

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

  • Expected counts assume the variables are independent in a test of independence.
  • In a test of homogeneity, expected counts still come from the same formula.
  • The chi-square statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

  • The expected counts condition is usually that every expected count is at least $5$.
  • Small expected counts can make the chi-square approximation unreliable.
  • Expected counts are a key step in deciding whether chi-square inference is appropriate.
