4. Statistics and Probability

Measures Of Dispersion

Students, when two classes have the same average score, do they always perform the same way? 🤔 Not necessarily. One class might have scores packed tightly together, while another has results spread out widely. That “spread” is what measures of dispersion describe. In statistics, knowing the center of data is important, but knowing how spread out the data are is just as important. A school, a sports team, or a business can make very different decisions depending on whether values are consistent or highly variable.

In this lesson, you will learn how to describe spread using measures such as the range, interquartile range, variance, and standard deviation. You will also see how these measures connect to data interpretation, real-world decisions, and the broader study of statistics and probability. By the end, you should be able to explain what dispersion tells us, calculate it for data sets, and interpret it in context.

Why spread matters in data 📊

A measure of central tendency, such as the mean or median, gives a single “typical” value. However, that one number can hide important differences. For example, imagine two students each have an average test score of $75$.

  • Student A scored $74$, $75$, $76$, and $75$.
  • Student B scored $40$, $70$, $90$, and $100$.

Both have the same mean, but the second set is much more spread out. If you are a teacher, you might worry about Student B’s inconsistency. If you are a coach, you may care whether a player performs steadily or unpredictably. Dispersion helps answer those questions.

The main idea is simple: the more spread out the data, the greater the dispersion. Lower dispersion means the values are more clustered around the center.
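
As a rough illustration, the comparison of Student A and Student B can be checked with a few lines of Python. This is only a sketch: the variable names are made up for this example, and the gap between the highest and lowest score is used as a simple stand-in for spread.

```python
from statistics import mean

# Scores of the two students from the example above.
student_a = [74, 75, 76, 75]
student_b = [40, 70, 90, 100]

mean_a = mean(student_a)
mean_b = mean(student_b)

# Gap between the highest and lowest score: a first, crude look at spread.
gap_a = max(student_a) - min(student_a)
gap_b = max(student_b) - min(student_b)

print(f"Student A: mean = {mean_a}, gap = {gap_a}")
print(f"Student B: mean = {mean_b}, gap = {gap_b}")
```

Both means come out to $75$, while the gaps are $2$ and $60$, which is exactly the difference that the mean alone hides.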

Common reasons to measure spread include:

  • comparing two groups with the same mean,
  • judging consistency or reliability,
  • identifying unusual data values,
  • helping interpret graphs and distributions,
  • supporting decisions in finance, science, and everyday life.

The range and interquartile range

The simplest measure of dispersion is the range. It is the difference between the largest and smallest values:

$$\text{Range} = \text{maximum} - \text{minimum}$$

For example, if the heights in a group are $150\text{ cm}$, $156\text{ cm}$, $160\text{ cm}$, $162\text{ cm}$, and $170\text{ cm}$, then

$$\text{Range} = 170 - 150 = 20\text{ cm}$$

The range is easy to calculate, but it only uses two values. That makes it sensitive to extreme values, also called outliers. If one person in the group were $200\text{ cm}$ tall, the range would increase a lot, even if most of the data stayed similar.
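
The effect of a single extreme value on the range can be seen in a short Python sketch. The list names are illustrative, and the $200\text{ cm}$ person is the hypothetical outlier mentioned above.

```python
# Heights in cm from the example above.
heights_cm = [150, 156, 160, 162, 170]
range_cm = max(heights_cm) - min(heights_cm)

# Add the hypothetical 200 cm person and recompute.
heights_with_outlier = heights_cm + [200]
range_with_outlier = max(heights_with_outlier) - min(heights_with_outlier)

print("Range without outlier:", range_cm)
print("Range with outlier:", range_with_outlier)
```

One extra value moves the range from $20\text{ cm}$ to $50\text{ cm}$, even though the other five heights are unchanged.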

A more robust measure, meaning one that is less affected by extreme values, is the interquartile range or IQR. The IQR measures the spread of the middle $50\%$ of the data. It is found by subtracting the first quartile $Q_1$ from the third quartile $Q_3$:

$$\text{IQR} = Q_3 - Q_1$$

To understand quartiles, first arrange the data in order. The median splits the data into two halves. Then:

  • $Q_1$ is the median of the lower half,
  • $Q_3$ is the median of the upper half.

Suppose the ordered data are $4, 5, 7, 8, 9, 10, 12, 13$. The lower half is $4, 5, 7, 8$, so

$$Q_1 = \frac{5+7}{2} = 6$$

The upper half is $9, 10, 12, 13$, so

$$Q_3 = \frac{10+12}{2} = 11$$

Therefore,

$$\text{IQR} = 11 - 6 = 5$$

The IQR is especially useful when data are skewed or when there are outliers because it focuses on the central part of the distribution.
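
The median-of-halves method described above might be sketched in Python as follows. The function name `quartiles` is hypothetical, and one assumption is built in: when the number of values is odd, the overall median is left out of both halves. Calculators and software packages may use other quartile conventions and can give slightly different results.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves method.

    Assumption: for an odd number of values, the overall median
    is excluded from both the lower and the upper half.
    """
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(upper_half)

# Data set from the example above.
data = [4, 5, 7, 8, 9, 10, 12, 13]
q1, q3 = quartiles(data)
iqr = q3 - q1

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```

For this data set the sketch reproduces $Q_1 = 6$, $Q_3 = 11$, and an IQR of $5$.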

Variance and standard deviation

For more detailed analysis, especially in IB Mathematics: Applications and Interpretation HL, you need the variance and standard deviation. These measures use every data value, not just the extremes or quartiles.

First, calculate how far each value is from the mean. These differences are called deviations. Because the positive and negative deviations always sum to zero, we square them before averaging. For a population, the variance is

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$$

where $\mu$ is the population mean and $n$ is the number of values.

For a sample, the variance is

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

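Dividing by $n-1$ instead of $n$ corrects for the fact that deviations measured from the sample mean $\bar{x}$ tend to be slightly smaller than deviations from the true population mean, so dividing by $n$ would underestimate the population variance on average. This is an important idea in inferential statistics because samples are often used to estimate population properties.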
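
To see the two formulas side by side, here is a minimal Python sketch. It reuses the five heights from the range example purely as illustrative data, computes both variances directly from the formulas, and cross-checks them against `pvariance` and `variance` from Python's standard `statistics` module.

```python
from statistics import pvariance, variance

# Illustrative data: the five heights (cm) from the range example.
heights_cm = [150, 156, 160, 162, 170]
n = len(heights_cm)
mean_height = sum(heights_cm) / n

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean_height) ** 2 for x in heights_cm)

population_var = sum_sq / n        # divide by n
sample_var = sum_sq / (n - 1)      # divide by n - 1

# Cross-check against the standard library implementations.
assert abs(population_var - pvariance(heights_cm)) < 1e-9
assert abs(sample_var - variance(heights_cm)) < 1e-9

print("Population variance:", round(population_var, 2))
print("Sample variance:", round(sample_var, 2))
```

The sample variance is always the larger of the two, and the gap shrinks as $n$ grows.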

The standard deviation is the square root of the variance:

$$\sigma = \sqrt{\sigma^2}$$

for a population, and

$$s = \sqrt{s^2}$$

for a sample.

Standard deviation is very useful because it is in the same units as the original data. If scores are measured in points, then the standard deviation is also in points. That makes interpretation easier than variance, which has squared units.

Worked example

Consider the data set $2, 4, 6, 8$.

First find the mean:

$$\bar{x} = \frac{2+4+6+8}{4} = 5$$

Now find the deviations and square them:

  • $(2-5)^2 = 9$
  • $(4-5)^2 = 1$
  • $(6-5)^2 = 1$
  • $(8-5)^2 = 9$

Add them:

$$\sum (x_i-\bar{x})^2 = 20$$

For the sample variance:

$$s^2 = \frac{20}{4-1} = \frac{20}{3}$$

So the sample standard deviation is

$$s = \sqrt{\frac{20}{3}} \approx 2.58$$

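This tells us that a typical value lies roughly $2.58$ units from the mean. The standard deviation is not a plain average of the distances (here the distances are $3$, $1$, $1$, and $3$, which average $2$); squaring gives larger deviations more weight.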
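
The steps of this worked example can be retraced in Python. This is a sketch that follows the hand calculation line by line; the variable names are illustrative, and `stdev` from the standard `statistics` module is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 6, 8]
n = len(data)

x_bar = sum(data) / n                              # mean
squared_devs = [(x - x_bar) ** 2 for x in data]    # squared deviations
total = sum(squared_devs)                          # sum of squared deviations

sample_variance = total / (n - 1)
sample_sd = sqrt(sample_variance)

# The library function should agree with the hand calculation.
assert abs(sample_sd - stdev(data)) < 1e-9

print("Mean:", x_bar)
print("Sum of squared deviations:", total)
print("Sample standard deviation:", round(sample_sd, 2))
```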

Interpreting dispersion in context

Statistics is not just about calculation; it is about meaning. A measure of dispersion should always be interpreted in context.

Suppose two delivery services have the same average delivery time of $30$ minutes. Service A has a standard deviation of $2$ minutes, while Service B has a standard deviation of $10$ minutes. Service A is much more consistent. For customers, that reliability may matter more than the average alone.

Or imagine two classes both have a median exam score of $80$.

  • Class X has a small IQR, so most students scored near $80$.
  • Class Y has a large IQR, so scores are more varied.

Even if the central value is the same, the learning experience may be very different.

When interpreting dispersion, ask:

  • Are the data close together or spread out?
  • Is there an outlier affecting the result?
  • Is the measure appropriate for the shape of the data?
  • What does the spread mean in the real situation?

This type of reasoning is a key part of IB statistics, especially when making decisions from data rather than simply calculating values.

Comparing different measures

Each measure of dispersion has strengths and weaknesses.

Range

  • Quick and easy to calculate.
  • Uses only the smallest and largest values.
  • Strongly affected by outliers.

Interquartile range

  • Describes the middle $50\%$ of the data.
  • Resistant to outliers.
  • Useful for skewed distributions.

Variance and standard deviation

  • Use all data values.
  • Good for deeper analysis and comparison.
  • Standard deviation is especially common in science, economics, and probability.

A good statistician chooses the measure that best matches the data and the question being asked. For example, if you are studying house prices, a few very expensive homes may create outliers, so the IQR may describe typical spread better than the range. If you are analyzing test scores in a normally distributed class, standard deviation may be the most informative measure.
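
The house price point can be illustrated with a small Python sketch. The prices below are invented for this example (in thousands of an unspecified currency), and the helper `iqr` is a hypothetical function that uses the median-of-halves method for quartiles.

```python
from statistics import median, stdev

def iqr(data):
    """Interquartile range using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return q3 - q1

# Hypothetical house prices, in thousands.
typical = [210, 225, 240, 250, 260, 275, 290, 310]
# Same street, but the most expensive home is replaced by a mansion.
with_mansion = [210, 225, 240, 250, 260, 275, 290, 1500]

for label, prices in [("typical", typical), ("with mansion", with_mansion)]:
    price_range = max(prices) - min(prices)
    print(f"{label}: range = {price_range}, "
          f"IQR = {iqr(prices)}, SD = {stdev(prices):.1f}")
```

In this made-up data the single mansion inflates the range and the standard deviation considerably, while the IQR stays the same, because the middle half of the prices has not changed.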

Connection to probability and distributions

Measures of dispersion are closely linked to probability models and distributions. In a probability distribution, spread describes how tightly the likely outcomes cluster around the mean.

For a random variable $X$, the variance is often written as

$$\mathrm{Var}(X)$$

and the standard deviation as

$$\sigma_X = \sqrt{\mathrm{Var}(X)}$$

A distribution with a small standard deviation has values concentrated near the mean. A distribution with a large standard deviation has more spread and greater unpredictability.
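
For a discrete random variable, the variance is the probability-weighted average of the squared deviations from the mean, $\mathrm{Var}(X) = \sum (x - \mu)^2 \, P(X = x)$. As a small sketch, the Python below applies this to a fair six-sided die, an example chosen here for illustration rather than taken from the lesson. Exact fractions are used so that no rounding error appears.

```python
from fractions import Fraction
from math import sqrt

# Fair six-sided die: each outcome 1..6 has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

# Mean (expected value) of X.
expected_value = sum(x * p for x in outcomes)

# Var(X): probability-weighted squared deviations from the mean.
var_x = sum((x - expected_value) ** 2 * p for x in outcomes)

# Standard deviation of X.
sigma_x = sqrt(var_x)

print("E(X) =", expected_value)
print("Var(X) =", var_x)
print("Standard deviation =", round(sigma_x, 3))
```

The sketch gives a mean of $\frac{7}{2}$ and a variance of $\frac{35}{12}$, so the standard deviation is about $1.71$.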

This matters in real-world decisions. In finance, two investments might have the same expected return, but the one with the smaller standard deviation is usually less risky. In quality control, a manufacturing process with a small spread produces more consistent products. In weather forecasting, a wider spread in predicted temperatures means less certainty.

Dispersion therefore helps connect data analysis with uncertainty and probability. It gives information not only about what is typical, but also about how much variation to expect.

Conclusion

Measures of dispersion help us understand how data are spread out, not just where they are centered. The range gives a quick overview, the interquartile range shows the spread of the middle values, and the variance and standard deviation provide deeper information using all data points. In IB Mathematics: Applications and Interpretation HL, these measures are essential for describing data, comparing distributions, and making informed decisions. Students, when you combine measures of center with measures of spread, you get a much clearer picture of the story the data are telling. That is the heart of statistical reasoning 💡

Study Notes

  • Dispersion means the spread or variability of data.
  • The range is $\text{maximum} - \text{minimum}$.
  • The interquartile range is $\text{IQR} = Q_3 - Q_1$.
  • The IQR describes the middle $50\%$ of the data.
  • Variance measures the average squared distance from the mean.
  • Standard deviation is the square root of the variance.
  • For a population, variance is $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$.
  • For a sample, variance is $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$.
  • Standard deviation is in the same units as the original data.
  • Use range for a quick estimate, IQR for skewed data, and standard deviation for deeper comparison.
  • Measures of dispersion are essential for interpreting data, comparing groups, and making decisions in real situations.
