4. Statistics and Probability

Box Plots

Students, in statistics we often need to compare groups of data quickly and clearly. A box plot is one of the best tools for that job. It turns a list of numbers into a simple picture that shows the centre, spread, and possible outliers of the data. In IB Mathematics: Applications and Interpretation HL, box plots are important because they help you interpret real-world data, compare distributions, and make evidence-based decisions.

Learning objectives

By the end of this lesson, students, you should be able to:

  • explain the key ideas and terminology behind box plots;
  • construct and interpret a box plot from data;
  • compare two or more box plots using statistical language;
  • connect box plots to broader ideas in statistics and probability;
  • use box plots as evidence in real-world contexts.

Imagine two classrooms taking the same test. One class has scores packed tightly together, while the other has scores spread out across a wide range. A box plot can show that difference in a very small amount of space. That is why box plots are so useful when data need to be compared quickly and accurately 👀

What a Box Plot Shows

A box plot is built from five important values called the five-number summary:

  • minimum,
  • lower quartile $Q_1$,
  • median $Q_2$,
  • upper quartile $Q_3$,
  • maximum.

These values describe the shape and spread of a data set.

The median is the middle value when the data are ordered. If there is an even number of values, the median is the average of the two middle numbers. The quartiles split the data into four equal parts:

  • $Q_1$ is the median of the lower half,
  • $Q_2$ is the median of the whole data set,
  • $Q_3$ is the median of the upper half.

The interquartile range is written as $\text{IQR} = Q_3 - Q_1$. It measures the spread of the middle $50\%$ of the data. This is useful because it is less affected by extreme values than the full range.

A box plot usually has:

  • a box from $Q_1$ to $Q_3$,
  • a line inside the box at the median $Q_2$,
  • whiskers extending to the minimum and maximum, or to the furthest non-outlier values depending on the convention used.

For IB, you should always check whether the box plot uses the full minimum and maximum or whether it marks outliers separately. The context matters.

How to Construct a Box Plot

To make a box plot from raw data, follow a clear process:

  1. Order the data from smallest to largest.
  2. Find the median $Q_2$.
  3. Find the lower half and upper half of the data.
  4. Find $Q_1$ and $Q_3$.
  5. Calculate the five-number summary.
  6. Draw a number line that covers the data values.
  7. Plot the minimum, $Q_1$, median, $Q_3$, and maximum.
  8. Draw the box and whiskers.

Let’s use an example. Suppose the scores are:

$4,\ 5,\ 7,\ 7,\ 8,\ 9,\ 10,\ 12,\ 13$

These are already ordered. There are $9$ values, so the median is the fifth value: $Q_2 = 8$. Leaving out the median, the lower half is $4, 5, 7, 7$, so $Q_1 = \frac{5+7}{2} = 6$. The upper half is $9, 10, 12, 13$, so $Q_3 = \frac{10+12}{2} = 11$. The minimum is $4$ and the maximum is $13$.

So the five-number summary is:

$\left(4, 6, 8, 11, 13\right)$

The interquartile range is:

$$\text{IQR} = Q_3 - Q_1 = 11 - 6 = 5$$

This means the middle half of the scores spans $5$ points.

Now picture the box plot: the box goes from $6$ to $11$, a line is drawn at $8$, and whiskers extend to $4$ and $13$. Even without reading every score, a viewer can immediately understand the shape of the data.
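The worked example above can be checked with a short Python sketch. The function name `five_number_summary` is made up for this illustration, and it assumes the median-of-halves rule used in this lesson (the middle value is left out of both halves when the number of values is odd) and at least two data values. A calculator or other software may use a different quartile rule.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. Illustrative helper, not a library function.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values before the median position
    upper = data[half + n % 2:]   # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


scores = [4, 5, 7, 7, 8, 9, 10, 12, 13]
minimum, q1, q2, q3, maximum = five_number_summary(scores)
iqr = q3 - q1

print(minimum, q1, q2, q3, maximum)  # 4 6.0 8 11.0 13
print("IQR =", iqr)                  # IQR = 5.0
```

The output matches the hand calculation: the box runs from $6$ to $11$, the median line sits at $8$, and the whiskers reach $4$ and $13$.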

How to Interpret a Box Plot

A box plot is not just a drawing. It is evidence. When interpreting one, students, focus on these features:

1. Centre

The median shows the typical value. If one group has a higher median than another, its centre is higher.

2. Spread

The range is $\text{max} - \text{min}$, and the interquartile range is $Q_3 - Q_1$. A larger range or IQR means more spread.

3. Skewness

The shape of the box and whiskers can suggest skewness:

  • If the median is near the centre of the box and whiskers are similar, the distribution is roughly symmetric.
  • If the right whisker is longer and the median is closer to $Q_1$, the data may be right-skewed.
  • If the left whisker is longer and the median is closer to $Q_3$, the data may be left-skewed.

For example, if exam scores have a long tail toward the lower end, that could mean a few students scored much lower than the rest. The box plot would show this through a longer left whisker.
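The skewness clues above can be turned into a rough check. The sketch below is only a rule of thumb: the function name `skew_hint` and the sample summaries are invented for illustration, and the comparisons are exact, so it is deliberately cautious and reports "mixed signals" whenever the whiskers and the box disagree.

```python
def skew_hint(minimum, q1, median, q3, maximum):
    """Suggest a skew direction from a five-number summary (rule of thumb only)."""
    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    lower_box = median - q1   # small when the median is close to Q1
    upper_box = q3 - median   # small when the median is close to Q3

    if right_whisker > left_whisker and upper_box > lower_box:
        return "possibly right-skewed"
    if left_whisker > right_whisker and lower_box > upper_box:
        return "possibly left-skewed"
    if left_whisker == right_whisker and lower_box == upper_box:
        return "roughly symmetric"
    return "mixed signals"


# Hypothetical exam scores with a long tail toward the lower end.
print(skew_hint(40, 70, 80, 85, 95))  # possibly left-skewed
# Hypothetical data with a long tail toward the upper end.
print(skew_hint(0, 2, 3, 6, 12))      # possibly right-skewed
```

A result from this check is a hint to investigate, not a conclusion. Always look at the plot and the context as well.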

4. Outliers

An outlier is a value that is unusually far from the rest of the data. A common rule uses fences based on the interquartile range:

$$\text{Lower fence} = Q_1 - 1.5(\text{IQR})$$

$$\text{Upper fence} = Q_3 + 1.5(\text{IQR})$$

Any value below the lower fence or above the upper fence may be treated as an outlier.

Suppose $Q_1 = 20$ and $Q_3 = 30$. Then $\text{IQR} = 10$. The fences are:

$$20 - 1.5(10) = 5$$

$$30 + 1.5(10) = 45$$

So any value below $5$ or above $45$ would be a potential outlier.
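The fence calculation can be sketched in Python as follows. The function names and the sample data are made up for this illustration; the sample was chosen so that the median-of-halves rule gives $Q_1 = 20$ and $Q_3 = 30$, matching the numbers above.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) using the k * IQR rule."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)


def find_outliers(values, q1, q3):
    """Return the values that lie strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


lower, upper = outlier_fences(20, 30)
print(lower, upper)  # 5.0 45.0

# Hypothetical data set with Q1 = 20 and Q3 = 30.
sample = [2, 19, 21, 24, 26, 29, 31, 48]
print(find_outliers(sample, 20, 30))  # [2, 48]
```

Here $2$ lies below the lower fence and $48$ lies above the upper fence, so both would be marked as potential outliers.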

Outliers matter because they can affect the mean a lot, but the median and IQR are more resistant. That is one reason box plots are so useful for real data.

Comparing Box Plots in Real Life

Box plots are especially valuable when comparing two or more groups. This is common in IB questions and in real-world data analysis.

Imagine two schools comparing travel times to school for students:

  • School A has median travel time $15$ minutes and IQR $4$ minutes.
  • School B has median travel time $20$ minutes and IQR $10$ minutes.

What can we say?

  • School B has a higher typical travel time because its median is larger.
  • School B also has more variation because its IQR is larger.
  • If School A’s box plot is more compact, its travel times are more consistent.
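As a small illustration, the comparison above can be written as a Python sketch that always reports both centre and spread. The function name `compare_groups` and the wording of its sentences are assumptions made for this example; the numbers are the School A and School B values given above.

```python
def compare_groups(name_a, median_a, iqr_a, name_b, median_b, iqr_b):
    """Return one sentence about centre (median) and one about spread (IQR)."""

    def describe(label, value_a, value_b):
        if value_a == value_b:
            return f"{name_a} and {name_b} have the same {label} ({value_a})."
        if value_a > value_b:
            high, low = name_a, name_b
        else:
            high, low = name_b, name_a
        big, small = max(value_a, value_b), min(value_a, value_b)
        return f"{high} has a larger {label} than {low} ({big} vs {small})."

    return [describe("median", median_a, median_b),
            describe("IQR", iqr_a, iqr_b)]


for sentence in compare_groups("School A", 15, 4, "School B", 20, 10):
    print(sentence)
# School B has a larger median than School A (20 vs 15).
# School B has a larger IQR than School A (10 vs 4).
```

The code only states which value is larger. Deciding what that means in context (for example, that shorter travel times are preferable) is still your job.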

A good statistical comparison should mention both centre and spread. Avoid saying only that one set of data is “better” unless the context supports that claim. For example, lower commute time may be desirable, but in other situations a larger value may be better.

Here is another example. Suppose two brands of batteries are tested for lifetime. If one box plot has a higher median and a smaller IQR, that brand may be preferred because it typically lasts longer and is more consistent. In a manufacturing context, consistency is often very important.

Box Plots and the Wider Statistics Topic

Box plots fit into the bigger picture of statistics because they help summarize data and support decision-making. In the IB course, you are expected to move beyond calculations and explain what data mean.

Box plots connect to several key ideas:

  • Data analysis and interpretation: they summarize large data sets clearly.
  • Statistical processes and distributions: they help identify centre, spread, and skewness.
  • Inferential reasoning: they can provide evidence for decisions, though they do not prove causes.
  • Real-world decisions: they help compare options using data rather than guesswork.

For example, a sports coach might compare the sprint times of athletes using two training methods. A box plot can show which method is associated with faster and more consistent times. A hospital might compare waiting times across departments. A company might compare delivery times before and after a new system is introduced. In each case, the box plot gives a compact visual summary.

One important IB skill is to use correct statistical language. Say:

  • “The median is higher” rather than “the average looks bigger” if you are referring to the median.
  • “The IQR is smaller” rather than “the box is smaller” when making precise comparisons.
  • “There is evidence of skewness” rather than “the graph is weird.”

Clear language shows clear thinking 👍

Common Mistakes to Avoid

Students sometimes make the same mistakes with box plots. Be careful with these:

  • mixing up the median and mean;
  • treating the box plot like a histogram;
  • forgetting that the box contains the middle $50\%$ of the data;
  • comparing groups without mentioning both centre and spread;
  • ignoring possible outliers;
  • using the wrong quartile method for a given data set.
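The last point in this list is easy to underestimate, because different quartile conventions can give different answers for the same data. As a sketch, Python's standard `statistics.quantiles` function offers two conventions, `"exclusive"` and `"inclusive"`. For the scores from the worked example earlier in this lesson, the exclusive method happens to agree with the median-of-halves rule, while the inclusive method does not. This is not guaranteed for other data sets.

```python
import statistics

# Scores from the worked example earlier in the lesson.
data = [4, 5, 7, 7, 8, 9, 10, 12, 13]

# Two of several quartile conventions offered by software.
q1_ex, q2_ex, q3_ex = statistics.quantiles(data, n=4, method="exclusive")
q1_in, q2_in, q3_in = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", q1_ex, q2_ex, q3_ex, "IQR =", q3_ex - q1_ex)
# exclusive: 6.0 8.0 11.0 IQR = 5.0
print("inclusive:", q1_in, q2_in, q3_in, "IQR =", q3_in - q1_in)
# inclusive: 7.0 8.0 10.0 IQR = 3.0
```

The median is the same in both cases, but the quartiles and the IQR change. When comparing your answer with technology or a mark scheme, check which convention is expected.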

Also, remember that a box plot does not show every detail. It does not tell you the exact shape of the distribution inside the quartiles. Two data sets can have the same box plot but different individual values. So box plots are powerful summaries, but they are not complete pictures.
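To see this concretely, here is a sketch with two different data sets that share one five-number summary and would therefore produce identical box plots. The helper `five_number_summary` is an illustrative function that assumes the median-of-halves rule used in this lesson, and the second data set is invented for the example.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values before the median position
    upper = data[half + n % 2:]   # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


set_a = [4, 5, 7, 7, 8, 9, 10, 12, 13]   # scores from the worked example
set_b = [4, 6, 6, 8, 8, 8, 11, 11, 13]   # hypothetical second data set

print(five_number_summary(set_a))  # (4, 6.0, 8, 11.0, 13)
print(five_number_summary(set_b))  # (4, 6.0, 8, 11.0, 13)
print(set_a == set_b)              # False
```

The second set has three values equal to $8$ clustered at the median, which the box plot cannot reveal.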

Conclusion

Box plots are a fast and reliable way to summarize and compare data. They show the five-number summary, the interquartile range, spread, possible outliers, and hints about skewness. In IB Mathematics: Applications and Interpretation HL, they are important because they support evidence-based reasoning in statistics and probability. Students, when you interpret a box plot well, you are not just reading a graph; you are explaining what the data mean in a real situation.

Study Notes

  • A box plot is based on the five-number summary: minimum, $Q_1$, median $Q_2$, $Q_3$, and maximum.
  • The interquartile range is $\text{IQR} = Q_3 - Q_1$.
  • The box shows the middle $50\%$ of the data.
  • The line inside the box shows the median.
  • One whisker noticeably longer than the other, with the median off-centre in the box, may suggest skewness.
  • Outliers can be checked using $Q_1 - 1.5(\text{IQR})$ and $Q_3 + 1.5(\text{IQR})$.
  • Box plots are useful for comparing centre, spread, and consistency across groups.
  • A good interpretation mentions evidence, context, and correct statistical vocabulary.
  • Box plots are a key part of data analysis and decision-making in statistics 📦📈
