2. Exploring Two-Variable Data

Representing Two Categorical Variables

Imagine you want to know whether students at your school prefer basketball or soccer, and whether they are in 9th, 10th, 11th, or 12th grade. 🏀⚽ Those are two categorical variables because each one sorts individuals into groups rather than measuring a numerical quantity. In AP Statistics, representing two categorical variables helps us organize data, compare groups, and look for possible relationships between categories.

In this lesson, you will learn how to represent two categorical variables using tables and graphs, how to describe what those displays show, and how to think carefully about association without confusing it with causation. By the end, you should be able to explain the main vocabulary, interpret a two-way table, and connect this topic to the bigger unit of exploring two-variable data.

What Counts as Two Categorical Variables?

A categorical variable puts individuals into labels or categories. Examples include eye color, type of phone, grade level, favorite sport, or whether someone owns a pet. When a data set includes two categorical variables for each individual, we want to see how the categories of one variable relate to the categories of the other.

For example, suppose a school surveys students about their grade level and whether they have a part-time job. Grade level has categories like $9$th, $10$th, $11$th, and $12$th. Job status has categories like yes or no. These are both categorical, so a good way to organize the data is with a two-way table.

A two-way table is a table that displays counts or percentages for two categorical variables at the same time. It helps us answer questions like these:

  • Do older students work more often than younger students?
  • Is there a difference in preference between boys and girls?
  • Are certain categories more common together than others?

This kind of question is important because statistics is not only about collecting data, but also about making sense of patterns in data. 📊

The Two-Way Table: The Main Tool

A two-way table organizes data by rows and columns. One variable labels the rows, and the other variable labels the columns. Each cell in the table shows how many individuals fall into that combination of categories.

Suppose a survey of $100$ students gives the following counts for grade level and part-time job status:

$$
\begin{array}{c|cc|c}
 & \text{Has Job} & \text{No Job} & \text{Total} \\
\hline
\text{9th Grade} & 8 & 17 & 25 \\
\text{10th Grade} & 10 & 15 & 25 \\
\text{11th Grade} & 18 & 7 & 25 \\
\text{12th Grade} & 22 & 3 & 25 \\
\hline
\text{Total} & 58 & 42 & 100
\end{array}
$$

This table gives a lot of information quickly. For example, you can see that $22$ twelfth graders have jobs, and that the total number of students with jobs is $58$.

Two important ideas are row totals and column totals. Row totals show how many individuals are in each row category. Column totals show how many are in each column category. The grand total is the total number of individuals in the study.
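The totals described above can be recomputed from the cell counts. The following is a minimal sketch in Python, assuming the survey counts are stored in a nested dictionary; the variable names (`counts`, `row_totals`, `column_totals`, `grand_total`) are illustrative choices, not part of the lesson.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}

# Row totals: how many students are in each grade.
row_totals = {grade: sum(cells.values()) for grade, cells in counts.items()}

# Column totals: how many students are in each job-status category.
column_totals = {}
for cells in counts.values():
    for status, n in cells.items():
        column_totals[status] = column_totals.get(status, 0) + n

# Grand total: every student in the survey.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

A quick check is that the row totals and the column totals each add up to the same grand total.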

Reading the Table Correctly

When you interpret a two-way table, always pay attention to what is being counted. A count in a cell is a frequency, not a proportion. For example, in the table above, the value $18$ means there are $18$ students who are in $11$th grade and have a job.

A common mistake is to mix up joint frequency, marginal frequency, and conditional relative frequency.

  • Joint frequency: the count in a cell for a combination of categories
  • Marginal frequency: the count in the row total or column total
  • Conditional relative frequency: the proportion or percent of individuals in one category, computed within a single row or column (that is, given a category of the other variable)

For example, the joint frequency for $11$th graders with jobs is $18$. The marginal frequency for all students with jobs is $58$. The conditional relative frequency of having a job among $11$th graders is $\frac{18}{25}$, which equals $0.72$, or $72\%$.
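To keep the three quantities apart, it can help to compute each one separately. This sketch assumes the same survey counts stored in a dictionary; the variable names are hypothetical and chosen only for illustration.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}

# Joint frequency: one cell (11th grade AND has a job).
joint_11th_job = counts["11th Grade"]["Has Job"]

# Marginal frequency: a column total (all students with a job).
marginal_job = sum(cells["Has Job"] for cells in counts.values())

# Conditional proportion: the cell divided by its row total
# (students with a job, out of 11th graders only).
total_11th = sum(counts["11th Grade"].values())
job_given_11th = joint_11th_job / total_11th

print(joint_11th_job, marginal_job, job_given_11th)
```

The only difference between the three values is the group being counted: one cell, one whole column, or one cell relative to its row.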

Relative Frequency Tables and Why They Matter

Counts are useful, but percentages often make comparisons easier. A relative frequency table replaces counts with proportions or percents. This is especially helpful when groups have different sizes.

To compare job status across grade levels, we can find the row percentages for each grade. Since each row total is $25$, the proportions of students with a job in $9$th, $10$th, $11$th, and $12$th grade are:

$$
\frac{8}{25}=0.32,\quad \frac{10}{25}=0.40,\quad \frac{18}{25}=0.72,\quad \frac{22}{25}=0.88
$$

These percentages show a clear pattern: job participation increases with grade level. That pattern suggests an association between grade level and having a part-time job.
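The same row percentages can be produced for every grade at once. The sketch below assumes the survey counts are stored in a dictionary; `row_percentages` is a hypothetical helper written for this example, not a library function.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}


def row_percentages(table):
    """Return, for each row, the percent of that row in each column."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {
            col_label: 100 * n / row_total for col_label, n in cells.items()
        }
    return result


job_rates = row_percentages(counts)
for grade, percents in job_rates.items():
    print(f"{grade}: {percents['Has Job']:.0f}% have a job")
```

Each row of the result adds to $100\%$, which is a useful check that the percentages were computed within rows rather than out of the grand total.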

A conditional relative frequency table is a table of percentages computed within rows or within columns. In AP Statistics, these tables are powerful because they let you compare distributions across groups.

For example, if you want to know whether the distribution of job status changes across grade levels, you might compute the conditional percentages within each grade. If you want to know whether the distribution of grade levels changes depending on job status, you could compute percentages within each job category instead.

Remember this key idea: the direction of the condition matters. A percent is always “out of” something. If you do not know the base group, the percentage can be misleading.
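To see why the base group matters, compare two percentages built from the same cell of the survey table. This is an illustrative sketch; the variable names are assumptions made for the example.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}

joint = counts["12th Grade"]["Has Job"]                         # 22 students
twelfth_total = sum(counts["12th Grade"].values())              # base: 12th graders
job_total = sum(cells["Has Job"] for cells in counts.values())  # base: job holders

# Out of 12th graders, what proportion has a job?
job_given_twelfth = joint / twelfth_total

# Out of students with a job, what proportion is in 12th grade?
twelfth_given_job = joint / job_total

print(f"Has a job, among 12th graders: {job_given_twelfth:.1%}")
print(f"In 12th grade, among job holders: {twelfth_given_job:.1%}")
```

Both values use the same $22$ students in the numerator, but the first divides by $25$ and the second by $58$, so they answer different questions.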

Graphs for Two Categorical Variables

Two-way tables are not the only way to represent two categorical variables. Graphs can make patterns easier to see. The most common graphs are segmented bar graphs, side-by-side bar graphs, and mosaic plots.

Segmented Bar Graphs

A segmented bar graph shows each category of one variable as a bar divided into segments for the categories of the other variable. When the bars show relative frequencies, each bar has the same height because it represents $100\%$ of its group. This makes it easy to compare conditional percentages.

For the job example, each grade level could have a bar that shows what percent has a job and what percent does not. The bars would help you see that the percentage with jobs increases from $9$th grade to $12$th grade.
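As a rough illustration, the segments can be drawn with text characters instead of a graphing library. In this sketch every bar is the same length because it stands for $100\%$ of a grade; the bar width of 50 characters and the helper name `segmented_bar` are arbitrary choices for the example.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}

BAR_WIDTH = 50  # characters per bar; each bar represents 100% of one grade


def segmented_bar(cells, width=BAR_WIDTH):
    """Build one bar: '#' for the share with a job, '.' for the share without."""
    total = sum(cells.values())
    filled = round(width * cells["Has Job"] / total)
    return "#" * filled + "." * (width - filled)


bars = {grade: segmented_bar(cells) for grade, cells in counts.items()}
for grade, bar in bars.items():
    print(f"{grade:<10} |{bar}|")
```

Reading down the printed bars, the `#` segment grows from 9th to 12th grade, which is the same pattern the percentages show.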

Side-by-Side Bar Graphs

A side-by-side bar graph places bars for one variable next to each other for each category of the other variable. This is useful for comparing counts or proportions across groups, although it can be less direct than a segmented bar graph for percent comparisons.

Mosaic Plots

A mosaic plot uses rectangles whose areas represent frequencies. It is especially useful for showing the relationship between two categorical variables because the size of each block reflects how common that combination is.
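The geometry behind a mosaic plot can be sketched numerically. This example assumes the common layout in which each grade gets a bar whose width is its share of all students and whose segments have heights equal to the conditional proportions; the `tiles` dictionary is a hypothetical structure used only here.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}

grand_total = sum(sum(cells.values()) for cells in counts.values())

tiles = {}
for grade, cells in counts.items():
    row_total = sum(cells.values())
    width = row_total / grand_total   # this grade's share of all students
    for status, n in cells.items():
        height = n / row_total        # share of this grade with that status
        tiles[(grade, status)] = {
            "width": width,
            "height": height,
            "area": width * height,   # equals n / grand_total
        }

for (grade, status), tile in tiles.items():
    print(f"{grade}, {status}: area = {tile['area']:.2f}")
```

Because width times height reduces to the cell count divided by the grand total, the areas of all tiles add up to $1$.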

All of these displays are ways to help you see whether categories seem independent or associated.

Association, Independence, and Interpretation

Two categorical variables are associated if the distribution of one variable changes across the categories of the other variable. They are independent if knowing the category of one variable does not change the distribution of the other.

Let’s use the job example. If job status were independent of grade level, then the percent of students with jobs would be about the same in every grade. But the percentages $32\%$, $40\%$, $72\%$, and $88\%$ are clearly different. That means grade level and job status are associated.
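One way to make this comparison concrete is to set each grade's job percentage against the overall percentage for all students. Under independence, every grade would sit close to the overall figure. The sketch below uses the survey counts; the variable names are illustrative.

```python
# Cell counts from the survey table above: grade -> job status -> count.
counts = {
    "9th Grade": {"Has Job": 8, "No Job": 17},
    "10th Grade": {"Has Job": 10, "No Job": 15},
    "11th Grade": {"Has Job": 18, "No Job": 7},
    "12th Grade": {"Has Job": 22, "No Job": 3},
}

grand_total = sum(sum(cells.values()) for cells in counts.values())
job_total = sum(cells["Has Job"] for cells in counts.values())

# Overall percent with a job, ignoring grade level.
overall_percent = 100 * job_total / grand_total

# Gap between each grade's percent with a job and the overall percent.
differences = {}
for grade, cells in counts.items():
    grade_percent = 100 * cells["Has Job"] / sum(cells.values())
    differences[grade] = grade_percent - overall_percent

print(f"Overall: {overall_percent:.0f}% have a job")
for grade, gap in differences.items():
    print(f"{grade}: {gap:+.0f} percentage points from overall")
```

The gaps are large and change direction from the lower grades to the upper grades, which is evidence of association rather than independence. This is a descriptive comparison only, not a formal test.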

When you describe association, you should write in plain language and use numbers as evidence. For example:

“Students in higher grades are more likely to have a part-time job. The percent with jobs rises from $32\%$ in $9$th grade to $88\%$ in $12$th grade.”

That is a strong AP Statistics-style conclusion because it is specific, data-based, and easy to understand.

Do not say that one variable causes the other unless a randomized experiment supports that claim. In a survey, association does not prove causation. For example, grade level may be related to job status, but grade level does not necessarily cause someone to get a job. Other factors may be involved, such as age, transportation, family expectations, or local job availability.

Why This Topic Matters in Exploring Two-Variable Data

Exploring Two-Variable Data includes both categorical and quantitative relationships. In this lesson, the focus is on two categorical variables. Later, you will study scatterplots, correlation, regression lines, residuals, and departures from linearity for quantitative data.

The big connection is that AP Statistics always asks the same kind of question: how does one variable relate to another? When the variables are categorical, you use two-way tables and categorical graphs. When the variables are quantitative, you use scatterplots and numerical summaries such as correlation.

So this lesson is a foundation. It teaches you how to organize and interpret data before moving on to numerical relationships. Understanding two-way tables also helps in other AP Statistics topics, such as probability and inference, because many probability questions use tables of categories.

How to Think Like an AP Statistics Student

When you see two categorical variables, follow a careful process:

  1. Identify both variables and confirm that both are categorical.
  2. Choose a display, usually a two-way table.
  3. Look at counts first, then percentages.
  4. Compare conditional distributions, not just totals.
  5. Describe the association using evidence.
  6. Avoid causal language unless the study design supports it.
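The steps above can be sketched as a short script. To avoid inventing data, this example rebuilds one record per student from the job survey counts and then tallies them again, as you would with raw survey responses; the names `records`, `tally`, and `summary` are hypothetical.

```python
from collections import Counter

# Step 1: both variables (grade level, job status) are categorical.
# Counts taken from the job survey table; expanded to one record per student.
table_counts = {
    ("9th Grade", "Has Job"): 8, ("9th Grade", "No Job"): 17,
    ("10th Grade", "Has Job"): 10, ("10th Grade", "No Job"): 15,
    ("11th Grade", "Has Job"): 18, ("11th Grade", "No Job"): 7,
    ("12th Grade", "Has Job"): 22, ("12th Grade", "No Job"): 3,
}
records = [pair for pair, n in table_counts.items() for _ in range(n)]

# Step 2: build the two-way table by tallying (grade, status) pairs.
tally = Counter(records)

# Steps 3 and 4: look at counts, then compare conditional percentages by grade.
grades = ["9th Grade", "10th Grade", "11th Grade", "12th Grade"]
job_percent = {}
for grade in grades:
    has_job = tally[(grade, "Has Job")]
    total = has_job + tally[(grade, "No Job")]
    job_percent[grade] = 100 * has_job / total

# Step 5: describe the association with numbers as evidence.
lowest = min(job_percent, key=job_percent.get)
highest = max(job_percent, key=job_percent.get)
summary = (
    f"The percent with jobs ranges from {job_percent[lowest]:.0f}% in {lowest} "
    f"to {job_percent[highest]:.0f}% in {highest}."
)
print(summary)

# Step 6: the code describes a pattern only; it cannot show that grade level
# causes students to get jobs.
```

The script reports evidence for an association; whether causal language is appropriate still depends on how the data were collected.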

For example, suppose a survey asked whether students prefer studying alone or with friends, and whether they are morning people or night people. You could build a two-way table, compute row or column percentages, and ask whether preferences differ by sleep pattern. That is the kind of reasoning AP Statistics expects.

Conclusion

Representing two categorical variables is a core skill in AP Statistics. It helps you organize data, compare groups, and describe relationships using evidence. Two-way tables, relative frequency tables, segmented bar graphs, side-by-side bar graphs, and mosaic plots all help reveal whether categories are associated.

The most important idea to remember is that differences in conditional percentages show how one categorical variable relates to another. If the distributions change across groups, the variables are associated; if they stay about the same, they may be independent. This topic connects directly to the broader study of two-variable data and prepares you for more advanced statistical thinking. ✅

Study Notes

  • A categorical variable places individuals into groups or labels.
  • Two categorical variables are often represented with a two-way table.
  • A two-way table shows joint frequencies in its cells and marginal frequencies in its totals; conditional relative frequencies are computed from these counts.
  • Conditional relative frequency tables help compare groups fairly.
  • Segmented bar graphs, side-by-side bar graphs, and mosaic plots are useful displays.
  • Association means the distribution of one variable changes across the categories of the other variable.
  • Independence means the distribution of one variable does not change across the categories of the other variable.
  • Correlation is for quantitative variables, not categorical variables.
  • Association does not imply causation.
  • In AP Statistics, always support conclusions with numbers or percentages from the data.
