Measures of Dispersion 📊
In this lesson, students, you will learn how to describe how spread out data are. When two classes have the same average test score, they may still look very different if one class has scores clustered closely together and the other has scores scattered widely. That spread is called dispersion. Understanding dispersion helps you judge reliability, compare data sets fairly, and make smarter decisions in real life, such as comparing prices, sports performances, or exam results.
Lesson objectives
By the end of this lesson, students, you should be able to:
- explain the main ideas and vocabulary of measures of dispersion,
- calculate and interpret common measures of spread,
- connect measures of dispersion to statistical analysis and real-world decision-making,
- recognize how dispersion supports conclusions in IB Mathematics: Applications and Interpretation SL,
- use examples to compare data sets beyond just the average.
Why spread matters in statistics
A single number like the mean gives only part of the story. Suppose two students both average $80$ on five quizzes. Student A scores $79, 80, 80, 81, 80$, while Student B scores $40, 70, 80, 95, 115$. The means are the same, but the second set is much more spread out. If you were a teacher, you would want to know whether a student is performing consistently or whether the results are unstable. 📈
That is what measures of dispersion tell us. They describe the variability of a data set. In IB statistics, you often combine measures of center, like the mean or median, with measures of spread, like the range or interquartile range, to build a full picture of the data.
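The two students' quiz scores can be checked with a few lines of Python. This is only a minimal sketch: the helper name `value_range` is made up for illustration, and it uses the gap between the largest and smallest score as a rough first gauge of spread.

```python
from statistics import mean

# Quiz scores quoted in the lesson for the two students.
student_a = [79, 80, 80, 81, 80]
student_b = [40, 70, 80, 95, 115]


def value_range(data):
    """Return the largest value minus the smallest value."""
    return max(data) - min(data)


mean_a, mean_b = mean(student_a), mean(student_b)
range_a, range_b = value_range(student_a), value_range(student_b)

print(f"Student A: mean = {mean_a}, range = {range_a}")
print(f"Student B: mean = {mean_b}, range = {range_b}")
```

Both means come out equal, while the gap between the highest and lowest score is far larger for Student B.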
The main measures you need to know are:
- range,
- interquartile range,
- variance,
- standard deviation.
Sometimes you also use the five-number summary, which includes the minimum, lower quartile, median, upper quartile, and maximum. These values help you see spread and identify unusual values.
Range and interquartile range
The range is the simplest measure of spread. It is the difference between the largest and smallest values in a data set.
If the maximum value is $x_{\max}$ and the minimum value is $x_{\min}$, then
$$\text{range} = x_{\max} - x_{\min}.$$
The range is easy to calculate, but it only uses two values, so one extreme value can greatly change it. For example, the scores $72, 73, 74, 75, 100$ have range $100 - 72 = 28$, even though most scores are close together.
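A short Python sketch makes the sensitivity visible by computing the range of these scores with and without the extreme value. Removing the score of $100$ is done here purely for comparison; it is not a recommendation to discard data.

```python
scores = [72, 73, 74, 75, 100]

# Range of the full data set.
range_all = max(scores) - min(scores)

# Range after leaving out the single extreme score (for comparison only).
without_extreme = [s for s in scores if s != 100]
range_without = max(without_extreme) - min(without_extreme)

print(f"Range with the extreme score:    {range_all}")
print(f"Range without the extreme score: {range_without}")
```

One value changes the range from $3$ to $28$, which is why the range alone can give a misleading impression of spread.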
A more useful measure is the interquartile range or IQR. This measures the spread of the middle $50\%$ of the data. It is found by subtracting the lower quartile from the upper quartile:
$$\text{IQR} = Q_3 - Q_1.$$
Here, $Q_1$ is the first quartile and $Q_3$ is the third quartile. The IQR is less affected by outliers, so it is often better than the range when data contain extreme values. For example, if test scores include one very low score due to illness, the IQR still gives a fair description of the typical spread of the class.
Example
Consider the data set $4, 6, 7, 8, 12, 13, 20, 21$.
- The minimum is $4$ and the maximum is $21$, so the range is $21 - 4 = 17$.
- The lower half is $4, 6, 7, 8$, so $Q_1 = \frac{6 + 7}{2} = 6.5$.
- The upper half is $12, 13, 20, 21$, so $Q_3 = \frac{13 + 20}{2} = 16.5$.
- Therefore, $\text{IQR} = 16.5 - 6.5 = 10$.
This means the middle half of the data lies across a span of $10$ units. Because it ignores the lowest and highest quarters of the data, the IQR gives a good summary of the typical spread. ✅
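The same working can be sketched in Python. The function `five_number_summary` below is a hypothetical helper, not a library function. It assumes the "median of each half" convention used in this example, leaving out the middle value when the number of data points is odd. Calculators and software may use other quartile conventions, so their output can differ slightly.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two data values. For an odd number of values the
    middle value is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


data = [4, 6, 7, 8, 12, 13, 20, 21]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
data_range = maximum - minimum

print(f"min = {minimum}, Q1 = {q1}, median = {med}, Q3 = {q3}, max = {maximum}")
print(f"range = {data_range}, IQR = {iqr}")
```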
Variance and standard deviation
The range and IQR are useful, but they do not use every value in the data. Variance and standard deviation do. They measure spread by looking at how far each value is from the mean.
If the data values are $x_1, x_2, \dots, x_n$ and the mean is $\bar{x}$, then the deviations are $x_i - \bar{x}$. These deviations add to zero, so we square them before averaging.
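To see why the deviations always add to zero, expand the sum using the definition of the mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$:
$$\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n}x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0.$$
The positive and negative deviations cancel exactly, so their plain average would always be $0$ and would say nothing about spread.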
For a population, the variance is
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2,$$
where $\mu$ is the population mean. The standard deviation is
$$\sigma = \sqrt{\sigma^2}.$$
For a sample, the variance is often written as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$
and the standard deviation is
$$s = \sqrt{s^2}.$$
In IB Mathematics: Applications and Interpretation SL, you should pay attention to whether a question is treating the data as a population or as a sample. The difference matters because the denominator changes from $n$ to $n-1$.
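As a rough illustration of how the denominator changes the result, the sketch below applies Python's standard `statistics` module to Student B's quiz scores from earlier in the lesson. `pvariance` and `pstdev` divide by $n$, while `variance` and `stdev` divide by $n-1$. Treating these five scores first as a population and then as a sample is an assumption made only for the comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

# Student B's quiz scores from earlier in the lesson.
scores = [40, 70, 80, 95, 115]

pop_var = pvariance(scores)   # divides by n
samp_var = variance(scores)   # divides by n - 1
pop_sd = pstdev(scores)       # square root of the population variance
samp_sd = stdev(scores)       # square root of the sample variance

print(f"Population: variance = {pop_var:.1f}, standard deviation = {pop_sd:.2f}")
print(f"Sample:     variance = {samp_var:.1f}, standard deviation = {samp_sd:.2f}")
```

The sample versions are always a little larger for the same data, and the gap shrinks as $n$ grows.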
Why square the deviations?
Squaring makes every deviation non-negative, so deviations above and below the mean cannot cancel each other out. It also gives extra weight to values far from the mean: the further a value is from the mean, the larger its squared deviation becomes. This is useful because big differences should matter more, but it also means that outliers can increase the standard deviation a lot.
Example
Suppose the data are $2, 4, 4, 4, 6$.
The mean is
$$\bar{x} = \frac{2 + 4 + 4 + 4 + 6}{5} = 4.$$
The deviations are $-2, 0, 0, 0, 2$, and the squared deviations are $4, 0, 0, 0, 4$.
The population variance is
$$\sigma^2 = \frac{4 + 0 + 0 + 0 + 4}{5} = \frac{8}{5} = 1.6,$$
so the standard deviation is
$$\sigma = \sqrt{1.6} \approx 1.26.$$
This tells us that a typical value lies about $1.26$ units from the mean. A small standard deviation means the data are tightly grouped; a large standard deviation means they are more spread out.
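The steps of this example can be reproduced one at a time in Python. This is a minimal sketch of the population formula, written out by hand so that each intermediate list matches the working above. The variable names are chosen for illustration.

```python
from math import sqrt

data = [2, 4, 4, 4, 6]
n = len(data)

# Step 1: the mean.
mean_value = sum(data) / n

# Step 2: the deviation of each value from the mean.
deviations = [x - mean_value for x in data]

# Step 3: square each deviation so that signs cannot cancel.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: population variance divides the total by n.
pop_variance = sum(squared_deviations) / n

# Step 5: the standard deviation is the square root of the variance.
pop_sd = sqrt(pop_variance)

print(f"mean = {mean_value}")
print(f"deviations = {deviations}")
print(f"squared deviations = {squared_deviations}")
print(f"variance = {pop_variance:.1f}, standard deviation = {pop_sd:.2f}")
```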
Comparing data sets fairly
Measures of dispersion are especially important when comparing two or more data sets. Two distributions may have the same mean but very different spreads. When that happens, the more spread-out set is less consistent.
Imagine two delivery services. Both have an average delivery time of $30$ minutes.
- Service A usually delivers between $28$ and $32$ minutes.
- Service B sometimes delivers in $10$ minutes and sometimes in $50$ minutes.
Even though the average is the same, Service A is more reliable. A smaller standard deviation or IQR shows more consistency.
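To put numbers on this comparison, the sketch below uses invented delivery times. The lesson gives only the typical ranges, so the two lists are assumptions chosen so that both services average $30$ minutes.

```python
from statistics import mean, pstdev

# Hypothetical delivery times in minutes (not real data).
service_a = [28, 29, 30, 31, 32]
service_b = [10, 20, 30, 40, 50]

mean_a, mean_b = mean(service_a), mean(service_b)
sd_a, sd_b = pstdev(service_a), pstdev(service_b)

print(f"Service A: mean = {mean_a} min, standard deviation = {sd_a:.2f} min")
print(f"Service B: mean = {mean_b} min, standard deviation = {sd_b:.2f} min")
```

With these made-up values the means match, but Service B's standard deviation is about ten times larger, which is what "less reliable" looks like numerically.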
Dispersion also helps when deciding whether a mean is a good summary. If the data are strongly skewed or include outliers, the median and IQR are often more appropriate than the mean and standard deviation. For example, house prices usually have some very expensive properties, so the median and IQR may describe the typical market better than the mean.
Using technology and interpreting results
In IB Mathematics: Applications and Interpretation SL, you may use a calculator or spreadsheet to find measures of dispersion quickly. This saves time and reduces arithmetic errors. But it is still important to understand what the numbers mean.
When you read output from technology, check:
- the units,
- whether the statistic is for a sample or population,
- whether the data are symmetric or skewed,
- whether an outlier may be affecting the result.
For example, if a calculator gives a standard deviation of $3.5$ seconds for reaction times, that means the data typically differ from the mean by about $3.5$ seconds. If one very large value appears, the standard deviation may increase more than the IQR. This is why choosing the right measure is part of statistical reasoning.
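The effect of a single large value can be sketched in Python. The timing data below are invented for illustration, and `iqr_by_halves` is a hypothetical helper that assumes the median-of-halves quartile convention. Other conventions may give slightly different quartiles.

```python
from statistics import median, pstdev


def iqr_by_halves(data):
    """IQR using the median-of-halves convention (needs at least two values)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


# Hypothetical measurements in seconds; the second list swaps the
# largest value for one very large value.
typical = [10, 11, 12, 13, 14, 15, 16, 17]
with_outlier = [10, 11, 12, 13, 14, 15, 16, 40]

sd_typical = pstdev(typical)
sd_outlier = pstdev(with_outlier)
iqr_typical = iqr_by_halves(typical)
iqr_outlier = iqr_by_halves(with_outlier)

print(f"Typical data:  SD = {sd_typical:.2f}, IQR = {iqr_typical}")
print(f"With outlier:  SD = {sd_outlier:.2f}, IQR = {iqr_outlier}")
```

In this made-up example the IQR does not move at all, while the standard deviation grows substantially.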
Measures of dispersion in real-world decisions
Dispersion supports informed choices in many areas:
- In sports, a team with consistent scores may be more dependable than a team with the same average but higher variation.
- In science experiments, repeated measurements with low spread suggest better precision.
- In finance, assets with higher spread in returns may be riskier.
- In education, a teacher may use spread to see whether a class needs extra support or extension.
These examples show that statistics is not only about finding averages. It is about interpreting data responsibly. When you compare data sets, students, the correct conclusion often depends on both center and spread. 📚
Conclusion
Measures of dispersion tell us how much data vary. The range gives a quick overview, the IQR shows the spread of the middle half, and variance and standard deviation use all the data to describe typical distance from the mean. In IB Mathematics: Applications and Interpretation SL, these measures help you compare data sets, judge consistency, identify unusual values, and make realistic conclusions.
Always remember that a mean without a measure of spread can be misleading. A complete statistical analysis uses both. When you understand dispersion, you are better equipped to interpret data in school, in exams, and in everyday life.
Study Notes
- Dispersion means how spread out the data are.
- The range is $x_{\max} - x_{\min}$.
- The interquartile range is $Q_3 - Q_1$ and describes the middle $50\%$ of the data.
- The variance measures average squared distance from the mean.
- The standard deviation is the square root of variance.
- Smaller spread usually means more consistency.
- Outliers can affect the range and standard deviation a lot.
- The IQR is more resistant to outliers than the range.
- Always check whether data represent a sample or a population.
- In statistics, compare both center and spread before making conclusions.
