Measures of Dispersion
Introduction
When students look at a set of data, the first question is often, "What is typical?" But that is only part of the story. Two classes can have the same average score and still be very different. One class might have scores tightly clustered together, while another might have scores spread far apart. That spread is called dispersion.
In this lesson, students will learn how to describe how spread out data are, why that matters, and how measures of dispersion help in statistics and probability. The main ideas include the range, interquartile range, variance, and standard deviation. These tools help answer questions like:
- Are the data values close to the center or far away?
- Is one group more consistent than another?
- How unusual is a particular value?
By the end of this lesson, students should be able to explain the terminology, calculate common measures of spread, and connect them to real data situations in IB Mathematics Analysis and Approaches SL.
Why Spread Matters
Imagine two basketball players each averaging $20$ points per game. At first, they seem equally strong. But if Player A scores between $18$ and $22$ most games, and Player B scores between $5$ and $35$, the two players are not equally consistent. Player A has less dispersion, so their performance is more predictable.
This same idea appears everywhere:
- In exam results, a teacher may want to know whether most students scored near the same mark.
- In weather data, a city with stable temperatures has lower dispersion than a city with dramatic changes.
- In manufacturing, small dispersion means products are more uniform.
Dispersion gives meaning to the average. Without it, a mean can hide important differences.
The Range and the Interquartile Range
The simplest measure of spread is the range. It is found by subtracting the smallest value from the largest value:
$$\text{range} = \max(x) - \min(x)$$
The range is easy to calculate, but it can be misleading because it depends only on the two extreme values. If one value is unusual, the range changes a lot.
Example of range
Suppose the data are $4, 6, 7, 7, 8, 10, 12$.
- Largest value is $12$
- Smallest value is $4$
So,
$$\text{range} = 12 - 4 = 8$$
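As a quick check of this arithmetic, here is a minimal Python sketch. The function name `data_range` and the extra value $40$ are illustrative assumptions, not part of the lesson's data.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)

data = [4, 6, 7, 7, 8, 10, 12]
print(data_range(data))  # 12 - 4 = 8

# A single unusual value changes the range a lot.
# The value 40 is a hypothetical outlier added only for illustration.
print(data_range(data + [40]))  # 40 - 4 = 36
```

Adding one extreme value more than quadruples the range here, even though the other seven values are unchanged.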
A more robust measure is the interquartile range, or IQR. This measures the spread of the middle $50\%$ of the data:
$$\text{IQR} = Q_3 - Q_1$$
Here, $Q_1$ is the first quartile and $Q_3$ is the third quartile. Quartiles divide ordered data into four equal parts.
The IQR is useful because it is less affected by extreme values than the range. That makes it a strong choice when data include outliers.
Example of IQR
For the ordered data $4, 6, 7, 7, 8, 10, 12$:
- The median is the 4th value, $7$
- The lower half (the three values before the median) is $4, 6, 7$
- The upper half (the three values after the median) is $8, 10, 12$
So,
$$Q_1 = 6, \quad Q_3 = 10$$
Therefore,
$$\text{IQR} = 10 - 6 = 4$$
This means the middle half of the data lies within a spread of $4$ units.
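The quartile method used in this example (take the median of each half, leaving out the middle value when the number of data values is odd) can be sketched in Python as follows. This is one common convention and is an assumption here; calculators and software may use slightly different quartile rules and give slightly different answers. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.

    Assumes at least two data values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]     # values before the median position
    upper = ordered[-half:]    # values after the median position
    return median(lower), median(upper)

data = [4, 6, 7, 7, 8, 10, 12]
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # Q1 = 6, Q3 = 10, IQR = 4
```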
Variance and Standard Deviation
The range and IQR describe spread using only a few values. For a deeper understanding of dispersion, we use variance and standard deviation. These measures consider how far each data value is from the mean.
First, compute the mean:
$$\bar{x} = \frac{\sum x}{n}$$
Then find each deviation from the mean:
$$x - \bar{x}$$
Deviations above the mean are positive and those below are negative, and they cancel exactly, so the sum of the deviations is always $0$. To prevent this cancellation, we square each deviation before averaging.
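The claim that the deviations sum to $0$ follows directly from the definition of the mean, $\bar{x} = \frac{\sum x}{n}$, for a data set of $n$ values:
$$\sum (x - \bar{x}) = \sum x - n\bar{x} = \sum x - n \cdot \frac{\sum x}{n} = 0$$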
The variance is the average of the squared deviations. For a sample, the sum of squared deviations is divided by $n - 1$ rather than $n$.
For a population:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
For a sample:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The standard deviation is the square root of the variance:
$$\sigma = \sqrt{\sigma^2}$$
or
$$s = \sqrt{s^2}$$
Standard deviation is easier to interpret because it is in the same units as the original data.
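The variance formulas translate directly into code. The following Python sketch implements them from scratch; the function names and the small data set are hypothetical, chosen only for illustration.

```python
from math import sqrt

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical data: the mean is 3 and the squared deviations sum to 10.
data = [1, 2, 3, 4, 5]
print(population_variance(data))        # 10 / 5 = 2.0
print(sample_variance(data))            # 10 / 4 = 2.5
print(sqrt(population_variance(data)))  # square root of 2, about 1.414
```

The only difference between the two functions is the divisor, so the sample variance is always at least as large as the population variance for the same data.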
Example of standard deviation
Consider the data set $2, 4, 4, 4, 5, 5, 7, 9$.
First find the mean:
$$\bar{x} = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5$$
Now compute squared deviations:
$$ (2-5)^2 = 9, \quad (4-5)^2 = 1, \quad (4-5)^2 = 1, \quad (4-5)^2 = 1 $$
$$ (5-5)^2 = 0, \quad (5-5)^2 = 0, \quad (7-5)^2 = 4, \quad (9-5)^2 = 16 $$
Sum them:
$$\sum (x-\bar{x})^2 = 32$$
If this is treated as a population:
$$\sigma^2 = \frac{32}{8} = 4$$
So,
$$\sigma = \sqrt{4} = 2$$
This tells students that values are typically about $2$ units away from the mean.
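This worked example can be checked with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by $N$ (population) and `variance` and `stdev` divide by $n - 1$ (sample). It is a sketch for verification only.

```python
from statistics import mean, pvariance, pstdev, variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(data))       # mean = 5
print(pvariance(data))  # population variance = 32 / 8 = 4
print(pstdev(data))     # population standard deviation = 2
print(variance(data))   # sample variance = 32 / 7, about 4.571
print(stdev(data))      # sample standard deviation, about 2.138
```

Treating the same eight values as a sample rather than a population gives a slightly larger standard deviation, because the divisor is $7$ instead of $8$.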
Comparing Data Sets
Measures of dispersion are especially useful when comparing two or more data sets. Suppose two stores sell the same product, and each records daily sales. If both stores have the same mean sales, the store with the smaller standard deviation has more consistent sales.
Here is a simple comparison:
- Data Set A: $10, 10, 10, 10, 10$
- Data Set B: $4, 8, 10, 12, 16$
Both have the same mean:
$$\bar{x} = 10$$
But Data Set A has zero dispersion, while Data Set B has a range of $16 - 4 = 12$. This means Data Set A is perfectly consistent, while Data Set B varies a lot.
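The comparison can be checked numerically. The sketch below treats each data set as a population (an assumption) and prints the mean, range, and standard deviation of each.

```python
from statistics import mean, pstdev

set_a = [10, 10, 10, 10, 10]
set_b = [4, 8, 10, 12, 16]

for name, values in [("A", set_a), ("B", set_b)]:
    spread = max(values) - min(values)
    print(name, mean(values), spread, pstdev(values))
# A: mean 10, range 0, standard deviation 0
# B: mean 10, range 12, standard deviation 4
```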
When comparing data, students should ask:
- Are the means the same?
- Which set has a smaller range or IQR?
- Which set has a smaller standard deviation?
A smaller standard deviation means values are more tightly clustered around the mean, and IB questions often ask for exactly this interpretation.
Choosing the Right Measure
Different measures of dispersion are useful in different situations.
- Range: quick and simple, but sensitive to extremes.
- IQR: strong for skewed data or data with outliers.
- Variance: useful in theory and in more advanced statistics.
- Standard deviation: most common for describing spread around the mean.
If the data are symmetric and without strong outliers, standard deviation is often a good summary. If the data are skewed or contain outliers, the median and IQR may describe the distribution better.
For example, if a class has mostly test scores around $70$ but one student scored $0$ due to absence, the range and standard deviation may increase sharply. The IQR will usually be less affected.
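A small Python sketch can illustrate this effect. The scores are hypothetical, the data are treated as a population, and the `iqr` helper uses the median-of-halves quartile convention, which is one of several in use.

```python
from statistics import median, pstdev

def iqr(values):
    """IQR using the median-of-halves convention (one of several in use)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

scores = [66, 68, 69, 70, 71, 72, 74]  # hypothetical class scores near 70
with_absence = scores + [0]            # one student scores 0

for label, values in [("without 0", scores), ("with 0", with_absence)]:
    spread = max(values) - min(values)
    print(label, spread, round(pstdev(values), 2), iqr(values))
# range: 8 -> 74
# standard deviation: about 2.45 -> about 23.26
# IQR: 4 -> 4.5
```

In this made-up class, a single score of $0$ multiplies the range and standard deviation roughly ninefold, while the IQR barely moves.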
Connecting Dispersion to Probability
Dispersion is not only about describing data. It also matters in probability. In probability distributions, spread tells how likely outcomes are to vary around the average.
For a discrete random variable $X$, the variance is
$$\mathrm{Var}(X) = E\left((X-\mu)^2\right)$$
and the standard deviation is
$$\sigma = \sqrt{\mathrm{Var}(X)}$$
This helps students measure risk or uncertainty. For example, two games may have the same expected winnings, but one may be much riskier because its outcomes are more spread out.
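To make the two-games idea concrete, here is a minimal Python sketch, taking $\mu = E(X)$. Both games and their payoffs are hypothetical, and each distribution is stored as a dictionary mapping an outcome to its probability.

```python
from math import sqrt

def expected_value(dist):
    """E(X) for a dict mapping outcome -> probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E((X - mu)^2) for a discrete distribution."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Two hypothetical games with the same expected winnings.
game_a = {0: 0.5, 2: 0.5}    # win 0 or win 2
game_b = {-8: 0.5, 10: 0.5}  # lose 8 or win 10

for name, game in [("A", game_a), ("B", game_b)]:
    print(name, expected_value(game), variance(game), sqrt(variance(game)))
# A: E(X) = 1, Var(X) = 1, sigma = 1
# B: E(X) = 1, Var(X) = 81, sigma = 9
```

Both games pay $1$ on average, but Game B's standard deviation is nine times larger, which is what makes it the riskier choice.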
In a normal distribution, the standard deviation is especially important. Many IB problems ask about values within one or two standard deviations of the mean, or about comparing distributions with different spreads. A larger standard deviation means a wider, flatter curve, while a smaller standard deviation means a narrower, taller curve.
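The effect of the standard deviation on a normal curve can be explored with `NormalDist` from Python's standard `statistics` module. The parameters below (mean $0$, standard deviations $1$ and $2$) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)  # hypothetical parameters
wide = NormalDist(mu=0, sigma=2)

# Height of each curve at its mean: a larger sigma gives a lower peak.
print(narrow.pdf(0), wide.pdf(0))  # about 0.399 and 0.199

# Probability of lying within one and two standard deviations of the mean.
# These probabilities are the same for every normal distribution.
for dist in (narrow, wide):
    m, s = dist.mean, dist.stdev
    within_1 = dist.cdf(m + s) - dist.cdf(m - s)
    within_2 = dist.cdf(m + 2 * s) - dist.cdf(m - 2 * s)
    print(round(within_1, 4), round(within_2, 4))  # about 0.6827 and 0.9545
```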
Conclusion
Measures of dispersion help students understand how spread out a data set is, not just where its center lies. The range and IQR give quick summaries of spread, while variance and standard deviation provide deeper measures based on distances from the mean. These ideas are important in statistics because they help compare data sets, identify consistency, and interpret variability. They are also important in probability, where spread describes uncertainty and risk.
For IB Mathematics Analysis and Approaches SL, students should remember that a complete statistical description usually includes both a measure of center and a measure of dispersion. Together, they give a much more accurate picture of the data.
Study Notes
- Dispersion means how spread out data values are.
- The range is $\max(x) - \min(x)$.
- The interquartile range is $Q_3 - Q_1$ and describes the spread of the middle $50\%$.
- The variance uses squared distances from the mean.
- The standard deviation is the square root of the variance.
- For a population, $\sigma^2 = \frac{\sum (x-\mu)^2}{N}$.
- For a sample, $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$.
- Smaller standard deviation means data are more consistent and less spread out.
- Range is sensitive to outliers; IQR is more resistant.
- Dispersion helps compare data sets and interpret probability distributions.
- In IB statistics, always consider both center and spread for a full description.
