Introducing Statistics: Are My Results Unexpected? 📊
Imagine, students, that you flip a coin 20 times and get 17 heads. Is that just random luck, or is something unusual going on? In statistics, questions like this are the starting point for chi-square inference. We compare what we observed in the data with what we would expect if nothing unusual were happening. If the differences are large enough, we may conclude the results are unexpected.
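One way to check whether 17 heads out of 20 is surprising is to simulate a fair coin many times and see how often chance alone produces a result that extreme. Here is a minimal sketch; the helper name `simulate_extreme_flips` is just an illustration, not part of the lesson:

```python
import random

def simulate_extreme_flips(trials=100_000, flips=20, threshold=17):
    """Estimate how often a fair coin gives `threshold` or more heads
    in `flips` tosses, by repeating the experiment `trials` times."""
    extreme = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(flips))
        if heads >= threshold:
            extreme += 1
    return extreme / trials

prop = simulate_extreme_flips()
```

For a fair coin, `prop` comes out to roughly one tenth of one percent, which is why 17 heads out of 20 feels unexpected: chance alone is an unlikely explanation.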
Objectives for this lesson:
- Explain the main ideas and vocabulary behind chi-square inference.
- Decide when results look unusual by comparing observed and expected counts.
- Connect this lesson to chi-square goodness-of-fit tests, tests of homogeneity, and tests of independence.
- Use statistical reasoning to describe whether data are unexpected.
This lesson is the doorway into categorical data inference. The key idea is simple: when outcomes are in categories, we ask whether the counts we see match a pattern we expected. If not, we investigate further.
What does “unexpected” mean in statistics? 🤔
In everyday life, “unexpected” can mean surprising or strange. In statistics, it has a more precise meaning. A result is unexpected when the difference between the observed counts and the expected counts is large enough that chance alone seems like an unlikely explanation.
For example, suppose a school says that about $50\%$ of students prefer sports over clubs. If a survey of $100$ students shows $52$ students prefer sports, that is not very surprising. But if only $20$ students prefer sports, that difference is much bigger and may be unexpected.
The word “unexpected” does not automatically mean impossible or wrong. It means the data do not fit the expected pattern very well. Statisticians use this idea to test whether a claim about a population is reasonable.
The three main chi-square procedures all begin with the same thinking:
- Start with a claim or expectation about categorical data.
- Find what counts should look like if that claim is true.
- Compare expected counts to observed counts.
- Decide whether the differences are small enough to be due to random variation.
Observed counts and expected counts
The first important vocabulary term is the observed count. This is the actual number of individuals or items in each category of the sample data.
The second is expected count. This is the number we would expect in each category if the null hypothesis were true.
A very common formula is:
$$E = np$$
Here, $E$ is the expected count for a category, $n$ is the sample size, and $p$ is the claimed proportion for that category.
For example, suppose a cereal company claims its boxes contain red, blue, green, and yellow candies in proportions $0.25$, $0.25$, $0.25$, and $0.25$. If you buy a bag with $40$ candies, the expected count for each color is:
$$E = 40(0.25) = 10$$
So if you actually count $16$ red candies, that category is higher than expected. If you count only $4$ green candies, that category is lower than expected. The question is not just whether the counts differ, but whether the differences are large enough to be statistically meaningful.
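The formula $E = np$ is easy to apply category by category. A minimal sketch of the candy example, with each color's claimed proportion stored in a dictionary:

```python
n = 40  # candies in the bag
claimed = {"red": 0.25, "blue": 0.25, "green": 0.25, "yellow": 0.25}

# Expected count for each color: E = n * p
expected = {color: n * p for color, p in claimed.items()}
# Every color has expected count 40 * 0.25 = 10.
```

Comparing the observed counts ($16$ red, $4$ green) to these expected counts of $10$ is exactly the comparison the chi-square statistic will formalize.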
A useful way to think about this is a weather forecast ☁️. If a forecast predicts $70\%$ chance of rain, and it does not rain, that is not shocking. But if the forecast says $1\%$ chance of rain and it pours all day, that is much more unexpected. Statistics works the same way: we compare what should happen under the model with what actually happened.
The chi-square idea: measuring overall difference
Chi-square methods give a way to measure the overall difference between observed and expected counts. The basic chi-square statistic is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
In this formula, $O$ is the observed count and $E$ is the expected count.
This statistic becomes large when observed counts are far from expected counts. It also uses squared differences, so positive and negative differences do not cancel each other out. That matters because one category being too high and another being too low can both be signs that the data do not fit the expectation well.
Notice what chi-square is doing:
- If $O$ is close to $E$, then $O - E$ is small.
- Squaring makes all differences positive.
- Dividing by $E$ helps scale the difference appropriately.
- Adding all categories together gives one overall measure of mismatch.
A larger $\chi^2$ value means the data are more unusual under the null hypothesis.
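The steps above translate directly into a few lines of code. This sketch computes $\chi^2 = \sum (O - E)^2 / E$ for the candy example, where red was high ($16$) and green was low ($4$):

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E across all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Candy counts in order: red, blue, green, yellow
observed = [16, 10, 4, 10]
expected = [10, 10, 10, 10]

stat = chi_square_statistic(observed, expected)
# (36 + 0 + 36 + 0) / 10 = 7.2
```

Notice how the squaring keeps the high red count and the low green count from canceling: both contribute $3.6$ to the total.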
Example: Are the results unexpected?
Suppose a student council claims that students at their school choose after-school activities in these proportions:
- Sports: $40\%$
- Arts: $30\%$
- Clubs: $20\%$
- Jobs or volunteering: $10\%$
A survey of $100$ students gives these observed counts:
- Sports: $50$
- Arts: $25$
- Clubs: $15$
- Jobs or volunteering: $10$
The expected counts are:
- Sports: $100(0.40) = 40$
- Arts: $100(0.30) = 30$
- Clubs: $100(0.20) = 20$
- Jobs or volunteering: $100(0.10) = 10$
Now compare observed and expected counts:
- Sports is $10$ above expected.
- Arts is $5$ below expected.
- Clubs is $5$ below expected.
- Jobs or volunteering matches expected exactly.
These differences are not huge, but they are not zero either. A chi-square goodness-of-fit test would help decide whether this pattern is just random variation or evidence that the student council's claim is not accurate.
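Putting the survey numbers through the chi-square formula makes the comparison concrete. A minimal sketch (the critical-value comparison at the end uses the standard $\chi^2$ table value for $3$ degrees of freedom at the $0.05$ level, which is about $7.815$):

```python
# Order: sports, arts, clubs, jobs/volunteering
observed = [50, 25, 15, 10]
expected = [40, 30, 20, 10]  # 100 * claimed proportions

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# 100/40 + 25/30 + 25/20 + 0/10 = 2.5 + 0.833... + 1.25 + 0 ≈ 4.58

# With 4 categories, df = 4 - 1 = 3, and the 0.05 critical value
# is about 7.815 -- so 4.58 would not count as unexpected.
```

A full goodness-of-fit test would state hypotheses and check conditions first, but the arithmetic of the statistic is exactly this.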
This is the key idea behind the question “Are my results unexpected?” We are not trying to prove that chance cannot explain the data. Instead, we ask whether the data are unusual enough that chance alone seems unlikely.
Where this lesson fits in chi-square inference
This lesson is the big-picture introduction to categorical inference using chi-square methods. The topic includes three major procedures:
1. Chi-square goodness-of-fit test
Use this when you have one categorical variable and want to see whether the distribution of categories matches a claimed distribution.
Example: Is a six-sided die fair? The expected counts should be about equal across all six faces.
2. Chi-square test of homogeneity
Use this when you want to compare the distribution of one categorical variable across multiple populations or groups.
Example: Do students in different grade levels choose the same favorite school lunch at the same rates?
3. Chi-square test of independence
Use this when you want to see whether two categorical variables are related within one population.
Example: Is music preference related to grade level?
All three procedures use the same overall idea: compare observed counts to expected counts and decide whether the differences are surprising.
Selecting the right procedure 🎯
Students, one of the most important skills in AP Statistics is choosing the correct test.
Ask these questions:
- Am I comparing one categorical variable to a claimed distribution? Then use a goodness-of-fit test.
- Am I comparing one categorical variable across different groups or populations? Then use a test of homogeneity.
- Am I studying whether two categorical variables are associated in one population? Then use a test of independence.
The phrases in the problem help you decide:
- “Does this match the expected distribution?” points to goodness-of-fit.
- “Do several groups have the same distribution?” points to homogeneity.
- “Are these two variables related?” points to independence.
At this introduction stage, the main goal is not to memorize formulas alone. It is to understand what question the data are answering.
Expected counts and why they matter
Expected counts are important because chi-square methods rely on comparing what was observed with what should happen if the null hypothesis were true. If expected counts are too small, the chi-square approximation may not work well.
A common AP Statistics guideline is that expected counts should generally be at least $5$ in each category for chi-square procedures to be appropriate.
This matters because chi-square methods are based on an approximate probability model. If expected counts are tiny, the approximation can be inaccurate, and the test results may not be trustworthy.
So before using a chi-square procedure, statisticians check whether the expected counts condition is met.
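The expected-counts check is simple enough to sketch directly. The helper name `expected_counts_ok` is just an illustration of the guideline, not a standard function:

```python
def expected_counts_ok(expected, minimum=5):
    """Check the common guideline that every expected count
    is at least 5 before using a chi-square procedure."""
    return all(e >= minimum for e in expected)

expected_counts_ok([40, 30, 20, 10])  # all categories large enough
expected_counts_ok([40, 30, 20, 3])   # one tiny expected count: approximation may be poor
```

If the check fails, a common remedy is to combine small categories so that every expected count meets the guideline.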
Big picture connection to AP Statistics
This lesson helps build the reasoning needed for the rest of categorical inference. Later, students will use chi-square tests more formally by following a structure:
- State the hypotheses.
- Check conditions.
- Compute expected counts.
- Find the chi-square statistic.
- Use the result to judge whether the observed data are unexpected.
- Make a conclusion in context.
That structure appears again and again in AP Statistics. The lesson you are learning now focuses on the first important question: “Do the data look unusual compared with what we expected?” Once that question is clear, the rest of the chi-square topic makes much more sense.
Conclusion
The main idea of this lesson is that statistical inference for categorical data begins by comparing observed counts with expected counts. When the differences are small, the data may fit the expected pattern well. When the differences are large, the results may be unexpected and worth investigating further.
Chi-square methods help answer that question for one categorical variable, for comparisons across groups, and for relationships between two categorical variables. As you continue in AP Statistics, keep asking: What am I expecting? What did I observe? And are the differences large enough to matter? 🌟
Study Notes
- Observed counts are the actual counts in the data.
- Expected counts are the counts predicted by the null hypothesis.
- A common expected count formula is $E = np$.
- The chi-square statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$.
- Large values of $\chi^2$ suggest the observed data are far from expected.
- Chi-square goodness-of-fit tests compare one categorical variable to a claimed distribution.
- Chi-square tests of homogeneity compare distributions across multiple groups.
- Chi-square tests of independence examine whether two categorical variables are associated.
- Expected counts should generally be at least $5$ for chi-square procedures.
- The central question is whether the results are unexpected enough to suggest the null hypothesis may not be true.
