Topic 2: Organising Data in Tables

Lesson 2.4: Two-way Tables

Syllabus coverage for Lesson 2.4 (Two-way Tables) within Topic 2 (Organising Data in Tables): cross-classifying two categorical variables in a two-way (contingency) table; row totals, column totals and the grand total (the margins of the table).

Introduction

In this lesson, we will explore two-way tables, a tool for organizing and analyzing data that involve two categorical variables. Cross-classifying the data lets us see patterns and relationships between the two variables that are hard to spot in a raw list of observations.

Learning Objectives

By the end of this lesson, you will be able to:

  • Cross-classify two categorical variables in a two-way (contingency) table.
  • Calculate row totals, column totals, and the grand total.
  • Extract counts for various combinations of categories from the table.
  • Compare different groups by examining the table.
  • Create a two-way table from raw data classified by two variables.

Understanding Two-Way Tables

A two-way table (also called a contingency table) is used to summarize the relationship between two categorical variables. Each cell in the table represents a particular combination of categories from the two variables.

Basic Structure

A two-way table looks like this:

| | Category 1 | Category 2 | Total |
| --- | --- | --- | --- |
| Row 1 | Count A1 | Count A2 | Total Row 1 |
| Row 2 | Count B1 | Count B2 | Total Row 2 |
| Total | Total Col 1 | Total Col 2 | Grand Total |

In this structure:

  • Each row corresponds to one category of the first variable.
  • Each column corresponds to one category of the second variable.
  • The cells contain counts of occurrences for each combination of the two categories.
  • Marginal totals (sums) are included on the right and bottom.
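The structure above can be built from raw observations by tallying each (row category, column category) pair. The following is a minimal sketch, not a prescribed method: the `records` list is a small hypothetical dataset invented for illustration, and `two_way_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw records: one (row category, column category) pair per observation.
records = [
    ("Row 1", "Category 1"),
    ("Row 1", "Category 2"),
    ("Row 2", "Category 1"),
    ("Row 1", "Category 1"),
    ("Row 2", "Category 2"),
    ("Row 2", "Category 2"),
]

def two_way_table(pairs, row_labels, col_labels):
    """Tally pairs into a nested dict: table[row][col] -> count."""
    counts = Counter(pairs)
    # A Counter returns 0 for combinations that never occur,
    # so every cell of the table is filled in.
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

table = two_way_table(records, ["Row 1", "Row 2"], ["Category 1", "Category 2"])
for row_label, cells in table.items():
    print(row_label, cells)
```

Passing the row and column labels explicitly fixes the order of the categories and keeps a cell in the table even when its count is 0.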

Example 1: Creating a Two-way Table

Suppose we asked students for their favorite subject and also recorded each student's gender. Here’s how our data might look:

| Gender | Math | Science | English |
| --- | --- | --- | --- |
| Male | 15 | 20 | 10 |
| Female | 20 | 15 | 25 |

Let's summarize this data in a two-way table.

Step 1: List the Categories

  • Rows: Male, Female
  • Columns: Math, Science, English

Step 2: Populate the Table

Using our data, the two-way table will be:

| | Math | Science | English | Total |
| --- | --- | --- | --- | --- |
| Male | 15 | 20 | 10 | 45 |
| Female | 20 | 15 | 25 | 60 |
| Total | 35 | 35 | 35 | 105 |

Step 3: Calculate the Marginal Totals

  • Row Totals: Sum counts for each row (Male: $15 + 20 + 10 = 45$, Female: $20 + 15 + 25 = 60$)
  • Column Totals: Sum counts for each column (Math: $15 + 20 = 35$, Science: $20 + 15 = 35$, English: $10 + 25 = 35$)
  • Grand Total: Sum of the row totals ($45 + 60 = 105$). As a check, the column totals must give the same value ($35 + 35 + 35 = 105$).
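The marginal totals for Example 1 can be reproduced with a few sums. This is a minimal sketch that stores the table as a nested dictionary; the variable names are illustrative choices, not part of the lesson.

```python
# Cell counts from Example 1: table[gender][subject] -> count.
table = {
    "Male":   {"Math": 15, "Science": 20, "English": 10},
    "Female": {"Math": 20, "Science": 15, "English": 25},
}
subjects = ["Math", "Science", "English"]

# Row totals: add across each row.
row_totals = {gender: sum(cells.values()) for gender, cells in table.items()}

# Column totals: add down each column.
col_totals = {s: sum(table[gender][s] for gender in table) for s in subjects}

# Grand total: the row totals and the column totals must agree.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

print(row_totals)   # {'Male': 45, 'Female': 60}
print(col_totals)   # {'Math': 35, 'Science': 35, 'English': 35}
print(grand_total)  # 105
```

Checking that both sets of margins add up to the same grand total is a quick way to catch arithmetic slips when a table is filled in by hand.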

Reading and Interpreting Two-Way Tables

Once the two-way table is created, we can read and interpret the data.

Cell Values

Each cell in the table provides the count of occurrences for each combination of categories. For instance, from our table:

  • The number of male students who prefer Math is 15.
  • The number of female students who prefer English is 25.

Row and Column Comparisons

Two-way tables facilitate comparisons within and between groups:

  • Across Rows: Reading along a row compares the subjects within one gender. For instance, among males, Science (20) is more popular than Math (15) or English (10).
  • Down Columns: Reading down a column compares the genders within one subject. For instance, more females (25) than males (10) chose English, and more females (20) than males (15) chose Math.
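Reading one row or one column of the Example 1 table can be sketched as follows. This assumes the table is stored as a nested dictionary; the variable names are illustrative.

```python
# Cell counts from Example 1: table[gender][subject] -> count.
table = {
    "Male":   {"Math": 15, "Science": 20, "English": 10},
    "Female": {"Math": 20, "Science": 15, "English": 25},
}

# Across a row: one gender, every subject.
male_row = table["Male"]
most_popular_for_males = max(male_row, key=male_row.get)

# Down a column: one subject, every gender.
english_col = {gender: cells["English"] for gender, cells in table.items()}

print(male_row)                # {'Math': 15, 'Science': 20, 'English': 10}
print(most_popular_for_males)  # Science
print(english_col)             # {'Male': 10, 'Female': 25}
```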

Example 2: Interpreting Data

Given our previous table, we can also ask questions such as:

  • What proportion of all students are males who prefer Math?

To calculate this, divide the number of males who prefer Math by the total number of students:

$$\text{Proportion} = \frac{\text{Male Math}}{\text{Grand Total}} = \frac{15}{105} = \frac{1}{7} \approx 0.14$$

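This tells us that roughly 14% of all students are males who prefer Math. Note that this is not the proportion of males who prefer Math: that question uses the Male row total as the denominator, giving $\frac{15}{45} = \frac{1}{3} \approx 0.33$.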
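The choice of denominator decides which question is being answered. The sketch below contrasts dividing by the grand total (105) with dividing by the Male row total (45), using the counts from Example 1; the variable names are illustrative.

```python
from fractions import Fraction

male_math = 15     # males who prefer Math (one cell)
male_total = 45    # all males (row total)
grand_total = 105  # all students

# Share of all students who are males preferring Math.
share_of_all = Fraction(male_math, grand_total)

# Share of males who prefer Math.
share_of_males = Fraction(male_math, male_total)

print(share_of_all, round(float(share_of_all), 2))      # 1/7 0.14
print(share_of_males, round(float(share_of_males), 2))  # 1/3 0.33
```

Using `Fraction` keeps the exact values $\frac{1}{7}$ and $\frac{1}{3}$ visible alongside their decimal approximations.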

Common Misconceptions

Misconception 1: All categories must appear in every row.

This is incorrect. The table has a cell for every combination of a row category and a column category, but a cell may legitimately hold a count of 0 when no observations fall in that combination.

Misconception 2: The totals are unnecessary.

This is incorrect. The marginal totals (row, column, and grand totals) show how each variable is distributed on its own, and they are the denominators needed to calculate proportions.

Conclusion

In this lesson, we have learned how to create and interpret two-way tables, focusing on cross-classifying two categorical variables. Understanding these tables is crucial for analyzing relationships and patterns in data.

Study Notes

  • A two-way table summarizes the relationship between two categorical variables.
  • Rows and columns represent categories of the two variables.
  • Each cell contains a count of occurrences for the respective combination.
  • Marginal totals aid in understanding the overall distribution.
  • Use two-way tables for comparisons and deriving proportions.
  • Recognize and avoid the common misconceptions above to interpret tables accurately.
