Lesson 4.4: Choosing and Comparing Averages

Introduction

In this lesson, we will explore how to choose and compare averages effectively. Averages, including the mean, median, and mode, serve as summary statistics that provide insight into a dataset. Understanding how these averages can yield different results under varying conditions—such as skewness and the presence of outliers—is crucial for interpreting data accurately. By the end of this lesson, students will be able to explain the differences between the mean, median, and mode, choose the most appropriate average for a specific dataset, and recognize potential pitfalls in relying solely on averages.

Learning Objectives

Understand how skew and outliers affect the mean, median, and mode.
Choose the average that best represents a given dataset and purpose.
Recognize how an average alone can mislead without consideration of data spread.
Compare two groups by their typical values.
Justify the choice of mean, median, or mode for a particular dataset.

Understanding Averages

An average is a single value that represents a set of values or data points. The three most common types of averages are the mean, median, and mode.

The Mean

The mean, often referred to as the arithmetic average, is calculated by summing all data points and dividing by the number of points:

$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_i$ represents each data point and $n$ is the number of data points.

Example of Mean Calculation

Consider the following dataset representing the ages of a group of five people: 20, 22, 23, 30, and 25.

Sum of ages: $20 + 22 + 23 + 30 + 25 = 120$.
Number of ages: $5$.
Mean: $\text{Mean} = \frac{120}{5} = 24$.

Thus, the mean age is 24 years.
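
As a quick check of the arithmetic above, the following Python sketch computes the same mean in two ways: directly from the formula and with the standard library's `statistics.mean`. The variable names (`ages`, `manual_mean`, `library_mean`) are chosen here for illustration only.

```python
from statistics import mean

# Ages from the example above.
ages = [20, 22, 23, 30, 25]

# Apply the formula directly: sum of the values divided by their count.
manual_mean = sum(ages) / len(ages)

# The standard library gives the same result.
library_mean = mean(ages)

print(manual_mean)   # 24.0
print(library_mean)  # 24
```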

The Median

The median is the middle value of a dataset when arranged in ascending order. In the case of an even number of observations, the median is the average of the two middle numbers.
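
The same definition can be written as a formula. Here the sorted values are denoted $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, a notation introduced only for this formula:

$$\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd}, \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}. \end{cases}$$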

Example of Median Calculation

Using the same dataset: 20, 22, 23, 30, and 25. First, arrange in ascending order:

20, 22, 23, 25, 30.

Since there are five data points (odd), the median is the third value:

$$\text{Median} = 23.$$

If we had an even number of values, say 20, 22, 23, and 30, the calculation would be:

Order the data: 20, 22, 23, 30.
Average the two middle values: $22$ and $23$.
Median: $$\text{Median} = \frac{22 + 23}{2} = 22.5.$$
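
The odd and even cases above can be sketched in Python as follows. The helper `manual_median` is a hypothetical name written for this lesson; it is compared against the standard library's `statistics.median`, which follows the same rule.

```python
from statistics import median

def manual_median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # the median requires sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the pair

odd_ages = [20, 22, 23, 30, 25]       # five values
even_ages = [20, 22, 23, 30]          # four values

print(manual_median(odd_ages), median(odd_ages))    # 23 23
print(manual_median(even_ages), median(even_ages))  # 22.5 22.5
```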

The Mode

The mode is the value that appears most frequently in a dataset. A dataset may have multiple modes (bimodal or multimodal) or no mode at all if all values are unique.

Example of Mode Calculation

Consider the following dataset of shoe sizes: 7, 8, 8, 9, 10.

The number 8 appears most frequently.
Mode: $$\text{Mode} = 8.$$

Now consider another dataset: 7, 8, 9, 10, and 11. Here, each value occurs only once, so there is no mode.
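
The sketch below shows one way to find the mode in Python while following this lesson's convention that a dataset of all-unique values has no mode. The function name `find_modes` and the third, bimodal dataset are assumptions added for illustration. Note that the standard library's `statistics.multimode` uses a different convention: it returns every value when all values occur once.

```python
from collections import Counter

def find_modes(values):
    """Return the most frequent value(s), or [] when every value occurs once."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values are unique, so there is no mode
    return [value for value, count in counts.items() if count == highest]

print(find_modes([7, 8, 8, 9, 10]))   # [8]
print(find_modes([7, 8, 9, 10, 11]))  # []
print(find_modes([7, 7, 8, 9, 9]))    # [7, 9]  (hypothetical bimodal dataset)
```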

How Skewness Affects Averages

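Skewness refers to the asymmetry in the distribution of data. A dataset can be skewed left (negatively), with a longer tail of low values, or skewed right (positively), with a longer tail of high values. The mean is pulled toward the longer tail, while the median and mode shift less.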

Left Skew (Negative Skew)

In a left-skewed distribution, the mean is typically less than the median, which is typically less than the mode:

$$\text{Mean} < \text{Median} < \text{Mode}$$

Example of Left Skew

Consider the ages: 6, 20, 22, 23, 25. The outlier age of 6 pulls the mean down:

Mean: $\text{Mean} = \frac{6 + 20 + 22 + 23 + 25}{5} = \frac{96}{5} = 19.2$.
Median: 22.
Mode: None (all ages are unique).

In this scenario, the mean age misleadingly suggests that the central tendency is lower than most values: four of the five ages are above 19.2.

Right Skew (Positive Skew)

In a right-skewed distribution, the mean is typically greater than the median, and the median is typically greater than the mode:

$$\text{Mode} < \text{Median} < \text{Mean}$$

Example of Right Skew

Consider the incomes: 30,000, 32,000, 35,000, 40,000, 100,000. The outlier income of 100,000 pulls the mean up:

Mean: $\text{Mean} = \frac{30{,}000 + 32{,}000 + 35{,}000 + 40{,}000 + 100{,}000}{5} = \frac{237{,}000}{5} = 47{,}400$.
Median: 35,000.
Mode: None (all values are unique).

Here, the mean suggests a higher typical income than is actually the case for the majority of the data.
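
The relationship between mean and median can be used as a rough indicator of skew direction. The sketch below applies it to the income data; `skew_hint` is a hypothetical helper, the `scores` dataset is invented for illustration, and the comparison is only a rule of thumb that can fail on unusual datasets.

```python
from statistics import mean, median

def skew_hint(values):
    """Guess the skew direction by comparing mean and median (rule of thumb)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"   # mean pulled toward a tail of high values
    if m < med:
        return "left"    # mean pulled toward a tail of low values
    return "none"

incomes = [30_000, 32_000, 35_000, 40_000, 100_000]
print(mean(incomes), median(incomes))  # 47400 35000
print(skew_hint(incomes))              # right

# Hypothetical dataset with one very low value, for contrast.
scores = [10, 80, 85, 90, 95]
print(skew_hint(scores))               # left
```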

Outliers and Their Impact on Averages

An outlier is an extreme value that significantly differs from the other observations in a dataset. Outliers can skew the mean considerably but have a minimal impact on the median, as it depends only on the middle value(s).

Example of Impact of an Outlier

Consider the heights of a group of six people: 5.5 ft, 5.6 ft, 5.7 ft, 5.8 ft, 6.0 ft, and 8.0 ft (where 8.0 ft is an outlier).

Mean: $\text{Mean} = \frac{5.5 + 5.6 + 5.7 + 5.8 + 6.0 + 8.0}{6} = \frac{36.6}{6} = 6.1 \text{ ft}$.
Median: With values arranged as 5.5, 5.6, 5.7, 5.8, 6.0, 8.0 the median (average of 5.7 and 5.8) is $\text{Median} = \frac{5.7 + 5.8}{2} = 5.75 \text{ ft}$.
Mode: None (all values are unique).

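This example demonstrates how the outlier (8.0 ft) pulls the mean above five of the six heights, while the median barely moves: without the outlier, the median of the remaining five values would be 5.7 ft, compared with 5.75 ft with it. The median therefore gives a more accurate reflection of the typical height in the group.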
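
To see the effect of the outlier directly, the following sketch recomputes the mean and median of the height data with and without the 8.0 ft value. The variable names are assumptions made for this illustration.

```python
from statistics import mean, median

heights = [5.5, 5.6, 5.7, 5.8, 6.0, 8.0]            # feet, includes the outlier
without_outlier = [h for h in heights if h != 8.0]  # drop the 8.0 ft value

for label, data in [("with outlier", heights), ("without outlier", without_outlier)]:
    # Round to two decimals to avoid floating-point noise in the display.
    print(f"{label}: mean = {mean(data):.2f} ft, median = {median(data):.2f} ft")

# How far each average moves when the outlier is added.
mean_shift = mean(heights) - mean(without_outlier)
median_shift = median(heights) - median(without_outlier)
print(f"mean shift = {mean_shift:.2f} ft, median shift = {median_shift:.2f} ft")
```

Running it shows the mean shifting far more than the median when the 8.0 ft value is included.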

Choosing the Right Average

Selecting the appropriate average depends on the dataset’s characteristics and the context of the analysis. Here are a few guidelines:

Use the Mean when the data is symmetrically distributed without outliers.
Use the Median in skewed distributions or when outliers are present, as it better represents the data’s center.
Use the Mode for categorical data or to highlight the most common value.
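
The guidelines above can be sketched as a small decision helper. This is only an illustration under stated assumptions: `suggest_average` and `has_outliers` are hypothetical names, and outliers are detected with the common 1.5 × IQR rule of thumb, which this lesson does not otherwise cover. On very small datasets that rule can flag values that a person would not consider extreme, so the result is a hint, not a verdict.

```python
from statistics import quantiles

def has_outliers(values, k=1.5):
    """Flag any value more than k * IQR beyond the quartiles (assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return any(v < low or v > high for v in values)

def suggest_average(values):
    """Suggest 'mode', 'median', or 'mean' following the guidelines above."""
    if not all(isinstance(v, (int, float)) for v in values):
        return "mode"      # categorical data: only the mode is meaningful
    if has_outliers(values):
        return "median"    # outliers present: the median is more robust
    return "mean"          # no outliers detected: the mean is a reasonable summary

print(suggest_average(["red", "blue", "red"]))                      # mode
print(suggest_average([30_000, 32_000, 35_000, 40_000, 100_000]))  # median
print(suggest_average([10, 12, 14, 16, 18]))                        # mean (hypothetical data)
```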

Comparing Two Groups

When comparing two groups, calculate the same averages for both and check whether they tell the same story. A gap between a group's mean and its median is itself a clue that skew or outliers are present.

Example: Comparing Test Scores

Consider two classes with their respective test scores:

Class A: 70, 72, 73, 75, 80.
Class B: 60, 95, 100, 100, 100.

Calculate the mean:

Class A: $\text{Mean} = \frac{70 + 72 + 73 + 75 + 80}{5} = 74$.
Class B: $\text{Mean} = \frac{60 + 95 + 100 + 100 + 100}{5} = 91$.

Calculate the median:

Class A: Median is 73.
Class B: Median is 100.

Compare the modes:

Class A: No mode (all scores are unique).
Class B: Mode is 100.

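Class B scores higher than Class A on all three averages. Note, however, that Class B's mean (91) sits well below its median and mode (both 100) because the single low score of 60 pulls it down, whereas Class A's mean (74) and median (73) nearly agree because its scores are tightly clustered. The median and mode show that most students in Class B excelled, which the mean alone understates. This comparison demonstrates the importance of looking at more than one average when interpreting data.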
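
A side-by-side summary makes this kind of comparison easier to read. The sketch below reports the mean and median for each class, plus the range (maximum minus minimum) as a simple measure of spread. Using the range is an assumption made here for illustration, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

class_a = [70, 72, 73, 75, 80]
class_b = [60, 95, 100, 100, 100]

def summarize(scores):
    """Return the typical values plus the range as a simple measure of spread."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
    }

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    print(name, summarize(scores))
# Class A {'mean': 74, 'median': 73, 'range': 10}
# Class B {'mean': 91, 'median': 100, 'range': 40}
```

The larger range for Class B reflects the single low score that separates its mean from its median.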

Conclusion

Choosing the right average is critical for accurately representing data and making valid comparisons between datasets. Students have learned how skew, outliers, and the nature of the data can influence the mean, median, and mode. The goal is not only to calculate these averages but also to understand what each one implies. As we move forward, remember to consider both the average and the spread of the data for a complete picture.

Study Notes

Averages include mean, median, and mode, each serving different purposes in data representation.
The mean is sensitive to outliers, while the median remains stable.
Skewness pulls the mean toward the longer tail, so the mean and median can differ noticeably in skewed data.
The appropriateness of an average depends on data characteristics and the analysis context.
Ignoring data spread can lead to misleading conclusions about a distribution.