24. Lesson 4.4: Box plots and comparing distributions

Lesson Focus

This lesson covers the following points from the official syllabus section for Lesson 4.4: Box plots and comparing distributions: the five-number summary and constructing a box-and-whisker plot; marking outliers on a box plot.

Lesson 4.4: Box Plots and Comparing Distributions

Introduction

Welcome to Lesson 4.4 of Foundation Statistics! 🎉 In this lesson, we're going to explore box plots and how they can help us compare different distributions of data. By the end of this lesson, you will be able to:

  • Understand the five-number summary and construct a box-and-whisker plot.
  • Identify and mark outliers on a box plot.
  • Compare two or more groups using parallel box plots.
  • Write clear comparisons of center, spread, and shape for different data sets.
  • Use the terminology associated with box plots effectively.

Imagine you have two different classes, and you want to compare their test scores. How would you determine which class performed better? Box plots let you make that comparison visually! 📊 Let's dive into the world of box plots.

The Five-Number Summary and Constructing Box-and-Whisker Plots

A box-and-whisker plot (or box plot) is a great way to summarize a set of data values. It displays the five-number summary, which consists of:

  1. Minimum value
  2. First quartile (Q1): the median of the lower half of the data
  3. Median (Q2): the middle value of the ordered data (the average of the two middle values when the number of values is even)
  4. Third quartile (Q3): the median of the upper half of the data
  5. Maximum value

Imagine we have the following set of test scores from a class:

| Scores |
|--------|
| 55 |
| 60 |
| 61 |
| 62 |
| 68 |
| 70 |
| 70 |
| 74 |
| 80 |
| 90 |

Let's calculate the five-number summary:

  • Minimum: $55$
  • Q1: $61$ (Median of 55, 60, 61, 62, 68)
  • Median (Q2): $69$ (with ten scores, the average of the two middle values, $68$ and $70$)
  • Q3: $74$ (Median of 70, 70, 74, 80, 90)
  • Maximum: $90$
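The calculation above can be checked with a short Python sketch. It assumes the median-of-halves rule used in this lesson (the lower and upper halves exclude the middle value when the count is odd); other textbooks and software use slightly different quartile rules, so results may differ elsewhere. The function name `five_number_summary` is an illustrative choice, not part of any library.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # lower half of the ordered data
    upper = data[n - half:]    # upper half (skips the middle value when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [55, 60, 61, 62, 68, 70, 70, 74, 80, 90]
print(five_number_summary(scores))  # (55, 61, 69.0, 74, 90)
```

Because there are ten scores, the median is the average of the fifth and sixth values, $68$ and $70$, which gives $69$.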

Now, we can construct the box plot. First, draw a number line that includes the range of your data. Then:

  • Find and mark the minimum and maximum values.
  • Draw a box from Q1 to Q3, and mark the median inside the box.
  • Extend "whiskers" from the box to the minimum and maximum values to complete it.
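As a rough illustration of these drawing steps, the sketch below prints a one-line text version of a box plot. The function `ascii_box_plot` and its symbols (`|` for whisker ends, `[` and `]` for the box edges, `M` for the median) are hypothetical choices made for this example. It assumes the scale from `lo` to `hi` covers all five values.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, lo, hi, width=41):
    """Draw a one-line text box plot on a scale running from lo to hi."""
    if not lo <= minimum <= q1 <= med <= q3 <= maximum <= hi:
        raise ValueError("values must be ordered and lie within [lo, hi]")

    def col(x):
        # Map a data value to a character column on the number line.
        return round((x - lo) / (hi - lo) * (width - 1))

    line = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        line[c] = "-"                  # whiskers from minimum to maximum
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="                  # box from Q1 to Q3
    line[col(minimum)] = "|"           # whisker ends
    line[col(maximum)] = "|"
    line[col(q1)] = "["                # box edges
    line[col(q3)] = "]"
    line[col(med)] = "M"               # median marked inside the box
    return "".join(line)

# Five-number summary of the ten test scores; the median is 69,
# the average of the two middle scores 68 and 70.
plot = ascii_box_plot(55, 61, 69, 74, 90, lo=50, hi=90)
print(plot)
```

With `lo=50`, `hi=90`, and the default width of 41, each character column represents one mark, so the box edges land in columns 11 and 24 and the median in column 19.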

This box plot shows us where most of the scores lie: the box spans the middle 50% of the scores (from $61$ to $74$), and the whiskers show how far the lowest and highest scores extend. 📈

Marking Outliers on a Box Plot

Outliers are data points that are significantly different from other observations. In box plots, we can identify outliers by using the interquartile range (IQR), which is calculated as:

$$ IQR = Q3 - Q1 $$

We define an outlier as any point that is:

  • Lower than $Q1 - 1.5 \cdot IQR$ or
  • Higher than $Q3 + 1.5 \cdot IQR$

Let's continue with our previous example:

  • We found $Q1 = 61$, $Q3 = 74$, and now we calculate $IQR$:

$$ IQR = 74 - 61 = 13 $$

Now, let's figure out the outlier boundaries:

  • Lower Boundary: $61 - 1.5 \cdot 13 = 61 - 19.5 = 41.5$
  • Upper Boundary: $74 + 1.5 \cdot 13 = 74 + 19.5 = 93.5$

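Since all our test scores fall between $55$ and $90$, inside the boundaries of $41.5$ and $93.5$, we have no outliers in this dataset. A score of $95$, by contrast, lies above the upper boundary of $93.5$, so measured against these boundaries it would be an outlier. It would be marked on the box plot as a distinct point, with the whisker stopping at the largest score that is not an outlier. Keep in mind that adding a new score to the data can change the quartiles, so the boundaries should be recalculated first. ⚠️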
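The boundary check can be sketched in Python as follows. The helper names `outlier_fences` and `find_outliers` are made up for this example, and the quartiles $Q1 = 61$ and $Q3 = 74$ are taken from the worked example rather than recomputed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier boundaries for the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """List the values lying strictly outside the outlier boundaries."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

scores = [55, 60, 61, 62, 68, 70, 70, 74, 80, 90]
low, high = outlier_fences(61, 74)
print(low, high)                      # 41.5 93.5
print(find_outliers(scores, 61, 74))  # []
print(95 > high)                      # True: 95 lies above the upper boundary
```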

Comparing Two or More Groups with Parallel Box Plots

When we have multiple groups of data, box plots can help us compare them side by side. Let's look at another example:

  • Class A test scores: $55, 60, 62, 68, 70, 74, 80, 90$
  • Class B test scores: $50, 65, 70, 75, 80, 85, 92, 95$

To compare the two classes:

  1. Calculate the five-number summary for both classes.
  2. Construct box plots for both classes on the same graph.
  3. Analyze the differences in shape, center, and spread.

Here's how we can summarize our findings:

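  • Center: The median of Class A is $69$ (the average of $68$ and $70$), while Class B's is $77.5$ (the average of $75$ and $80$).
  • Spread: Class A's scores range from $55$ to $90$ (range $35$, IQR $77 - 61 = 16$), whereas Class B's scores range from $50$ to $95$ (range $45$, IQR $88.5 - 67.5 = 21$).
  • Shape: Class A's upper whisker ($77$ to $90$) is longer than its lower whisker ($55$ to $61$), suggesting a tail toward higher scores. Class B's lower whisker ($50$ to $67.5$) is longer than its upper whisker ($88.5$ to $95$), suggesting a tail toward lower scores.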

From this comparison, we can say that Class B performed better on a typical score, because its median is higher, and that Class B also had more variability, because both its range and its IQR are larger. 🔍
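The three comparison steps can be sketched in Python for the two classes. This again assumes the median-of-halves quartile rule from this lesson, and the names `five_number_summary` and `describe` are illustrative, not library functions.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[len(data) - half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

def describe(values):
    """Summarize center (median) and spread (range and IQR) of one group."""
    minimum, q1, med, q3, maximum = five_number_summary(values)
    return {"median": med, "range": maximum - minimum, "iqr": q3 - q1}

class_a = [55, 60, 62, 68, 70, 74, 80, 90]
class_b = [50, 65, 70, 75, 80, 85, 92, 95]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    print(name, five_number_summary(scores), describe(scores))
```

For these data, the sketch gives Class A a median of $69$ with an IQR of $16$, and Class B a median of $77.5$ with an IQR of $21$.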

Conclusion

In this lesson, we explored box plots and their importance in understanding distributions. We learned how to construct them from the five-number summary and identify outliers. Plus, we discovered how to compare multiple groups visually! Box plots are a powerful tool that can help you in analyzing data effectively.

Study Notes

  • Box plots summarize data using a five-number summary: minimum, Q1, median, Q3, and maximum.
  • The interquartile range (IQR) is crucial for finding and marking outliers in a dataset.
  • Parallel box plots allow for easy comparison between multiple groups.
  • When comparing distributions, consider center (median), spread (range and IQR), and shape (box and whisker length).
  • Familiarize yourself with the terminology: minimum, quartiles, median, outliers, and box plots.
