Box Plots 📦📊
Introduction
Students, imagine you are comparing the test scores of two classes, or the delivery times of two food apps. You do not just want the average. You want to know which group is more spread out, which one has unusual values, and where the middle of the data really sits. That is exactly where box plots help. Box plots, also called box-and-whisker plots, give a compact visual summary of a data set using five key values: the minimum, lower quartile, median, upper quartile, and maximum.
In this lesson, you will learn how to read and interpret box plots, how to build one from data, and how to use it to compare distributions in real situations. By the end, you should be able to explain what a box plot tells us about center, spread, symmetry, and outliers in a way that fits IB Mathematics: Applications and Interpretation SL. ✅
Understanding the five-number summary
A box plot is built from the five-number summary:
- Minimum value
- Lower quartile, written as $Q_1$
- Median, written as $Q_2$
- Upper quartile, written as $Q_3$
- Maximum value
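These values divide the ordered data into four parts, each containing about a quarter of the values. The median is the middle value when the data are ordered from smallest to largest (or the mean of the two middle values when the number of values is even). The lower quartile $Q_1$ is the median of the lower half of the data, and the upper quartile $Q_3$ is the median of the upper half. When the number of values is odd, the median itself is left out of both halves.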
The distance between $Q_1$ and $Q_3$ is called the interquartile range, written as $\text{IQR}$:
$$\text{IQR} = Q_3 - Q_1$$
This is an important measure of spread because it tells us how wide the middle 50% of the data is. Unlike the range, the interquartile range is less affected by extreme values.
For example, suppose the sorted data are $3, 4, 5, 7, 8, 9, 10, 12, 15$.
- Minimum $= 3$
- Median $= 8$
- Lower half is $3, 4, 5, 7$
- Upper half is $9, 10, 12, 15$
- $Q_1 = \frac{4+5}{2} = 4.5$
- $Q_3 = \frac{10+12}{2} = 11$
- Maximum $= 15$
So the five-number summary is $3, 4.5, 8, 11, 15$.
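The calculation above can be sketched in Python. This is a minimal illustration, assuming the median-of-halves method used in this lesson; the function name `five_number_summary` is made up for this example, and calculators or software may use a slightly different quartile convention and give slightly different values for $Q_1$ and $Q_3$.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: quartiles are found with the median-of-halves method.
    When the number of values is odd, the middle value is left out of
    both halves, as in the worked example.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary([3, 4, 5, 7, 8, 9, 10, 12, 15]))
# (3, 4.5, 8, 11.0, 15)
```

The output agrees with the five-number summary found by hand.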
How a box plot is drawn
A box plot uses a number line. The box goes from $Q_1$ to $Q_3$, and a line inside the box shows the median. The whiskers extend from the box to the minimum and maximum values, unless the data contain outliers and a modified box plot is being used.
The shape of the box plot helps you understand the data quickly:
- A longer box means a larger $\text{IQR}$, so the middle 50% of the data is more spread out.
- A median line closer to one side of the box suggests skewness.
- Long whiskers suggest more spread in the outer parts of the data.
Real-world example: suppose two athletes complete five training runs each, and their times are summarized by box plots. If Athlete A has a smaller $\text{IQR}$ than Athlete B, then Athlete A’s middle performances are more consistent. If Athlete B has a lower median time, then B is generally faster, even if B is less consistent. This shows why box plots are useful for comparing both center and spread. 🏃
Interpreting box plots carefully
When reading a box plot, students, always focus on what it tells you about the data, not just the picture itself. The median shows the center of the data, while the box and whiskers show spread.
Here are important interpretation ideas:
- The median is a measure of the typical value.
- The $\text{IQR}$ measures spread in the middle half of the data.
- The range, given by $\text{range} = \text{maximum} - \text{minimum}$, measures total spread.
- A larger $\text{IQR}$ means more variation in the middle of the data.
- A larger range may be caused by one unusual value.
Suppose two towns have daily temperatures summarized by box plots. Town X has a median of $22^\circ\text{C}$ and a small $\text{IQR}$, while Town Y has the same median but a larger $\text{IQR}$. This means the typical temperature is similar in both towns, but the middle 50% of Town Y's temperatures are more spread out.
A box plot can also show symmetry. If the median is near the center of the box and the whiskers are about the same length, the distribution is roughly symmetric. If the median is closer to one quartile and one whisker is much longer, the distribution is skewed.
Outliers and modified box plots
A standard box plot usually shows the minimum and maximum values directly. However, in many statistical analyses, extreme values are treated as outliers and are plotted separately. This is called a modified box plot.
A common rule for detecting outliers is the $1.5\times\text{IQR}$ rule:
$$\text{Lower fence} = Q_1 - 1.5\times \text{IQR}$$
$$\text{Upper fence} = Q_3 + 1.5\times \text{IQR}$$
Any data value below the lower fence or above the upper fence may be considered an outlier.
Example: if $Q_1 = 10$ and $Q_3 = 18$, then
$$\text{IQR} = 18 - 10 = 8$$
The fences are:
$$10 - 1.5\times 8 = -2$$
$$18 + 1.5\times 8 = 30$$
So, under this rule, any value less than $-2$ or greater than $30$ would be flagged as a possible outlier.
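The fence calculation can be checked with a short Python sketch. The function names `outlier_fences` and `find_outliers` and the sample values are hypothetical, chosen only to illustrate the $1.5\times\text{IQR}$ rule with $Q_1 = 10$ and $Q_3 = 18$.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Return the values that lie strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


print(outlier_fences(10, 18))
# (-2.0, 30.0)

# Hypothetical values: 30 sits exactly on the upper fence, so it is not flagged.
print(find_outliers([-5, 0, 12, 30, 31], 10, 18))
# [-5, 31]
```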
Why does this matter? Because outliers can distort the mean and range. A box plot helps you see them quickly. For example, if a class test score list includes one score of $2$ while all others are between $60$ and $95$, the box plot may show that $2$ as an unusual value. This helps a teacher decide whether the data needs further checking or whether the low score was due to an exceptional situation.
Comparing two or more box plots
In IB Mathematics: Applications and Interpretation SL, you often compare distributions from two groups. The goal is to make a clear statistical statement using evidence.
When comparing box plots, you should comment on:
- Median: which group has the higher typical value?
- $\text{IQR}$: which group is more consistent?
- Range: which group has the greater overall spread?
- Shape: is one distribution skewed?
- Overlap: do the groups look similar or quite different?
Example: consider two mobile phone battery lives.
- Model A: median $= 11$ hours, $\text{IQR} = 2$ hours
- Model B: median $= 12$ hours, $\text{IQR} = 5$ hours
Model B has a higher typical battery life, but Model A is more consistent. If a student wants reliability, Model A may be better. If a student wants the higher typical battery life and can accept more variation, Model B might be preferred.
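One way to practise writing comparison statements is to generate them from the summary values. The sketch below is only an illustration: the class `BoxSummary` and the function `compare` are hypothetical names, and it compares only the median and the $\text{IQR}$, not range, shape, or overlap.

```python
from dataclasses import dataclass


@dataclass
class BoxSummary:
    name: str
    median: float
    iqr: float


def compare(a, b):
    """Return short statements comparing the center and spread of two groups."""
    statements = []
    if a.median != b.median:
        higher = a if a.median > b.median else b
        statements.append(f"{higher.name} has the higher median.")
    else:
        statements.append("Both groups have the same median.")
    if a.iqr != b.iqr:
        tighter = a if a.iqr < b.iqr else b
        statements.append(
            f"{tighter.name} has the smaller IQR, so it is more consistent."
        )
    else:
        statements.append("Both groups have the same IQR.")
    return statements


model_a = BoxSummary("Model A", median=11, iqr=2)
model_b = BoxSummary("Model B", median=12, iqr=5)
for line in compare(model_a, model_b):
    print(line)
```

In an exam answer, each statement should still be written in context, for example by mentioning battery life in hours.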
This type of reasoning is important in real-life decisions. Sports teams, businesses, and scientists all use box plots to compare groups because the graphs are quick to read and easy to interpret. 📈
Box plots in the wider study of statistics and probability
Box plots belong to data analysis and interpretation, one of the major parts of statistics. They help summarize sample data and support comparisons between groups. In IB work, box plots often appear alongside histograms, frequency tables, and measures such as mean and standard deviation.
Box plots are especially useful when:
- You want a summary rather than every individual value
- You need to compare more than one data set
- You want to check for skewness or outliers
- You need a graph that is easy to draw and interpret
They are less detailed than dot plots or histograms, because they do not show every value. However, that simplicity is a strength when you need a clear overview.
Box plots also connect to probability and inference. When analyzing samples, you might use box plots to describe the spread of the sample before making decisions about a larger population. For example, if a company tests two production lines and one line shows a smaller $\text{IQR}$ in product weight, the company may infer that this line is more consistent. Box plots do not give probability directly, but they support evidence-based judgment, which is a key part of inferential reasoning.
How to create a box plot from data
To make a box plot, follow these steps:
- Order the data from least to greatest.
- Find the median.
- Find $Q_1$ and $Q_3$.
- Find the minimum and maximum values.
- Draw a number line.
- Mark the five-number summary.
- Draw the box from $Q_1$ to $Q_3$ and the median line inside it.
- Extend the whiskers to the minimum and maximum, or to the non-outlier values if using a modified box plot.
Example data: $6, 8, 9, 10, 12, 13, 15, 18$
- Median $= \frac{10+12}{2} = 11$
- Lower half: $6, 8, 9, 10$
- Upper half: $12, 13, 15, 18$
- $Q_1 = \frac{8+9}{2} = 8.5$
- $Q_3 = \frac{13+15}{2} = 14$
- Minimum $= 6$
- Maximum $= 18$
The box plot would have a box from $8.5$ to $14$, a median at $11$, and whiskers to $6$ and $18$. Since the upper part of the box ($14 - 11 = 3$) is slightly longer than the lower part ($11 - 8.5 = 2.5$), and the upper whisker ($18 - 14 = 4$) is longer than the lower whisker ($8.5 - 6 = 2.5$), the distribution is slightly right-skewed.
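The skewness reasoning for this example can be checked by computing the length of each part of the box plot. The sketch below is a rough heuristic, not an official test for skewness: the function name `describe_shape` and its decision rule are assumptions made for this illustration.

```python
def describe_shape(minimum, q1, med, q3, maximum):
    """Return the four segment lengths of a box plot and a rough shape label.

    Assumed heuristic: the label is "right-skewed" only when both the upper
    part of the box and the upper whisker are longer than their lower
    counterparts, and "left-skewed" in the mirror-image case.
    """
    lengths = {
        "lower whisker": q1 - minimum,
        "lower box": med - q1,
        "upper box": q3 - med,
        "upper whisker": maximum - q3,
    }
    box_diff = lengths["upper box"] - lengths["lower box"]
    whisker_diff = lengths["upper whisker"] - lengths["lower whisker"]
    if box_diff > 0 and whisker_diff > 0:
        label = "right-skewed"
    elif box_diff < 0 and whisker_diff < 0:
        label = "left-skewed"
    elif box_diff == 0 and whisker_diff == 0:
        label = "symmetric"
    else:
        label = "no clear skew"
    return lengths, label


lengths, label = describe_shape(6, 8.5, 11, 14, 18)
print(lengths)
print(label)
```

For the example data, the upper box and upper whisker are both longer than the lower ones, so the label is right-skewed.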
Conclusion
Box plots are a powerful way to summarize and compare data. They show center through the median, spread through the $\text{IQR}$ and range, and unusual values through outlier rules. In IB Mathematics: Applications and Interpretation SL, box plots help you interpret data in a practical, evidence-based way. Whether you are comparing exam scores, delivery times, sports results, or product measurements, box plots give a clear picture of what the data say. Students, if you can read and explain a box plot well, you are building a strong foundation for statistics and probability. ✅
Study Notes
- A box plot summarizes data using the five-number summary: minimum, $Q_1$, median, $Q_3$, and maximum.
- The interquartile range is $\text{IQR} = Q_3 - Q_1$.
- The box shows the middle 50% of the data.
- The line inside the box shows the median.
- Whiskers usually extend to the minimum and maximum values.
- A modified box plot may show outliers separately.
- Outliers can be identified using $Q_1 - 1.5\times \text{IQR}$ and $Q_3 + 1.5\times \text{IQR}$.
- Box plots help compare center, spread, skewness, and outliers between groups.
- A larger $\text{IQR}$ means more spread in the middle half of the data.
- A box plot is useful for real-world decisions because it gives a quick statistical summary.
