Statistical Measures 📊

Introduction: Why do statistical measures matter?

Imagine looking at the scores from a class test, the times in a 100 m race, or the heights of basketball players. A long list of data can be hard to understand just by reading the raw numbers. Statistical measures help turn that list into useful information: they give a quick picture of what a typical value is, how spread out the data is, and whether there are unusual values.

In IB Mathematics: Analysis and Approaches HL, statistical measures are a foundation for the whole topic of Statistics and Probability. They are used to describe data, compare groups, and prepare for deeper ideas such as correlation, regression, conditional probability, and distributions. 🎯

Learning objectives

By the end of this lesson, students should be able to:

explain the main ideas and terminology behind statistical measures,
apply IB-style reasoning to measures of center and spread,
connect statistical measures to the wider topic of Statistics and Probability,
summarize how statistical measures help interpret real data,
use examples to show how these ideas work in practice.

Measures of central tendency: finding a “typical” value

The first important question about a data set is often: what is a typical value? Statistical measures of central tendency answer this.

Mean

The mean is the arithmetic average. It is found by adding all values and dividing by the number of values.

If the data values are $x_1, x_2, \dots, x_n$, then the mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

The mean uses every value, so it is sensitive to extreme values. For example, if five students have test scores $60$, $65$, $70$, $75$, and $100$, the mean is

$$\bar{x} = \frac{60+65+70+75+100}{5} = 74$$

If one student had a very low or very high score, the mean would change noticeably. This makes the mean useful, but also vulnerable to outliers.
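A short Python sketch of the mean formula, using the five test scores above. The second list, where the top score is changed to $200$, is a hypothetical variation added only to show how one extreme value moves the mean.

```python
from statistics import mean

scores = [60, 65, 70, 75, 100]          # test scores from the example above

# Mean = (sum of all values) / (number of values)
x_bar = sum(scores) / len(scores)
print(x_bar)                            # 74.0
print(mean(scores))                     # same value from the standard library

# Hypothetical: replace the top score with a much higher one
scores_extreme = [60, 65, 70, 75, 200]
print(sum(scores_extreme) / len(scores_extreme))   # 94.0, pulled up by one value
```

Changing a single value by $100$ raises the mean by $20$, because every value contributes equally to the sum.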

Median

The median is the middle value when the data is arranged in order. If there are two middle values, the median is the average of those two values.

For the scores $60, 65, 70, 75, 100$, the median is $70$.

The median is especially useful when the data is skewed or contains outliers. For example, if house prices in one neighborhood mostly cluster around $300{,}000$ but one mansion costs $5{,}000{,}000$, the mean may be pulled upward. The median gives a better sense of a “typical” house price.
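The median rule can be sketched as a small Python function. The house prices below are hypothetical values chosen to match the description above (most near $300{,}000$, one at $5{,}000{,}000$). With an even number of values, the function averages the two middle ones.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([60, 65, 70, 75, 100]))      # 70

# Hypothetical neighborhood: five ordinary houses and one mansion
prices = [280_000, 295_000, 300_000, 310_000, 320_000, 5_000_000]
print(median(prices))                     # 305000.0
print(sum(prices) / len(prices))          # about 1084166.67, far above most prices
```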

Mode

The mode is the value that appears most often. A data set may have one mode, more than one mode, or no mode.

For the data $2, 3, 3, 5, 7, 7, 7, 8$, the mode is $7$.

The mode is especially useful for categorical data, such as the most common shoe size, favorite sport, or transport method to school. 🚲
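A possible Python sketch that returns every mode, built on `collections.Counter`. It assumes the convention that a data set in which every value appears exactly once has no mode (returned as an empty list). The transport answers are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value; [] if each value appears only once (assumed convention)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([2, 3, 3, 5, 7, 7, 7, 8]))   # [7]
print(modes([1, 2, 2, 3, 3]))            # [2, 3]  (two modes)
print(modes([1, 2, 3]))                  # []      (no mode)

# Works for categorical data too (hypothetical survey answers)
transport = ["bus", "walk", "bike", "bus", "car", "bus"]
print(modes(transport))                  # ['bus']
```

Because it only counts values, the same function works for numbers and for category labels.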

Measures of spread: how varied is the data?

Two data sets can have the same mean but very different amounts of variation. Statistical measures of spread tell us how dispersed the data is.

Range

The range is the difference between the largest and smallest values.

If the data set is $60, 65, 70, 75, 100$, then the range is

$$100 - 60 = 40$$

The range is easy to calculate, but it uses only two values, so it can be affected a lot by one unusual value.
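In Python the range is a one-line calculation. The second list swaps the top score for a hypothetical $200$ to show how strongly a single value changes the result.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


print(data_range([60, 65, 70, 75, 100]))  # 40
print(data_range([60, 65, 70, 75, 200]))  # 140: one changed value, range jumps
```

The function is called `data_range` (a hypothetical name) so it does not shadow Python's built-in `range`.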

Interquartile range

The interquartile range, written as $\mathrm{IQR}$, measures the spread of the middle 50% of the data.

It is calculated using the quartiles:

$$\mathrm{IQR} = Q_3 - Q_1$$

where $Q_1$ is the lower quartile and $Q_3$ is the upper quartile.

The IQR is useful because it is resistant to outliers. If the top score in a test is extremely high, the IQR will not change as dramatically as the range.
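The sketch below computes $Q_1$ and $Q_3$ as the medians of the lower and upper halves of the ordered data, leaving out the overall median when $n$ is odd. This is one common textbook convention; calculators and libraries (for example Python's `statistics.quantiles`) offer other methods that can give slightly different quartiles. The data are the weekly study hours from the outliers discussion, plus a hypothetical version with an even larger top value.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumed convention: when n is odd, the middle value is left out of both halves.
    Assumes at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)


def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1


hours = [4, 5, 5, 6, 6, 7, 25]
hours_extreme = [4, 5, 5, 6, 6, 7, 40]  # hypothetical: top value made even larger

print(quartiles(hours))                          # (5, 7)
print(iqr(hours), iqr(hours_extreme))            # 2 2   (IQR unchanged)
print(max(hours) - min(hours),
      max(hours_extreme) - min(hours_extreme))   # 21 36 (range changes)
```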

Variance and standard deviation

Variance and standard deviation are more advanced measures of spread. They show how far the data values typically are from the mean.

For a population, the variance is

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N}$$

and the standard deviation is

$$\sigma = \sqrt{\sigma^2}$$

For a sample, the sample variance is

$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$$

and the sample standard deviation is

$$s = \sqrt{s^2}$$

The symbol $\mu$ represents the population mean, while $\bar{x}$ represents the sample mean. The denominator $n-1$ in the sample formula is used to give an unbiased estimate of the population variance from a sample.
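The two formulas can be checked numerically. This sketch treats the five test scores from the mean example once as a whole population (divisor $N$) and once as a sample (divisor $n-1$), then compares the results with Python's `statistics.pvariance` and `statistics.variance`, which implement these two formulas.

```python
import math
import statistics

scores = [60, 65, 70, 75, 100]
n = len(scores)
x_bar = sum(scores) / n
ss = sum((x - x_bar) ** 2 for x in scores)   # sum of squared deviations from the mean

pop_var = ss / n          # sigma^2: divide by N (list treated as the whole population)
samp_var = ss / (n - 1)   # s^2: divide by n - 1 (list treated as a sample)

print(pop_var, samp_var)                        # 194.0 242.5
print(math.sqrt(pop_var), math.sqrt(samp_var))  # about 13.93 and 15.57

# The statistics module gives the same values
assert math.isclose(pop_var, statistics.pvariance(scores))
assert math.isclose(samp_var, statistics.variance(scores))
```

Dividing by $n-1$ instead of $n$ always gives a slightly larger value, which compensates for measuring deviations from $\bar{x}$ rather than from the unknown $\mu$.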

Standard deviation is often preferred because it is in the same unit as the original data. If the data is measured in seconds, the standard deviation is also in seconds. This makes interpretation easier.

For example, if two classes both have a mean test score of $70$, but Class A has a small standard deviation while Class B has a large standard deviation, Class A has more consistent scores. Class B has scores spread out more widely.
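A sketch of the Class A and Class B comparison. The scores are hypothetical, made up so that both classes have mean $70$; each class is treated as a whole population, so `statistics.pstdev` is used.

```python
import statistics

class_a = [68, 69, 70, 71, 72]   # hypothetical: scores close together
class_b = [50, 60, 70, 80, 90]   # hypothetical: scores widely spread

for name, data in [("A", class_a), ("B", class_b)]:
    print(name, statistics.mean(data), round(statistics.pstdev(data), 2))
# A 70 1.41
# B 70 14.14
```

Same mean, but Class B's standard deviation is ten times larger, so its scores are far less consistent.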

Using statistical measures with graphs and outliers

Statistical measures become more meaningful when combined with graphical displays such as histograms, box plots, or scatter plots. A box plot shows the median, quartiles, and extremes clearly. A histogram can show whether data is symmetric, skewed, or has gaps.

Outliers

An outlier is a value that is unusually far from the rest of the data. Outliers can happen for many reasons: measurement error, a rare event, or a genuinely extreme observation.

For example, if the data set of weekly study hours is $4, 5, 5, 6, 6, 7, 25$, then $25$ is much larger than the others. The mean would increase a lot, but the median would stay close to the middle of the main group.
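A quick numerical check of this claim, comparing the study-hours data with and without the value $25$:

```python
import statistics

hours = [4, 5, 5, 6, 6, 7, 25]
without_outlier = [4, 5, 5, 6, 6, 7]

print(statistics.mean(without_outlier), statistics.median(without_outlier))  # 5.5 5.5
print(round(statistics.mean(hours), 2), statistics.median(hours))            # 8.29 6
```

Adding the single value $25$ moves the mean by almost $2.8$ hours but the median by only half an hour.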

This is why students must choose measures carefully. If the data is symmetric and has no strong outliers, the mean and standard deviation are very useful. If the data is skewed, the median and IQR are often better choices.

Comparing data sets

Statistical measures help compare groups fairly. Suppose one set of exam scores has a mean of $68$ and another has a mean of $72$. At first glance, the second group looks stronger. But if the first group has a much smaller standard deviation, it may be more consistent.

This kind of comparison is important in real life. Coaches compare training times, businesses compare delivery speeds, and schools compare assessment results. 📈

How statistical measures connect to IB Statistics and Probability

Statistical measures are not isolated formulas. They are part of the larger process of working with data.

Data collection and description

Before measures can be calculated, data must be collected correctly. Samples should be representative, and the method of collection should avoid bias as much as possible. If the sample is not fair, even accurate calculations may lead to misleading conclusions.

Regression and correlation

In scatter plots, statistical ideas such as mean and standard deviation help describe the data before a line of best fit is studied. Correlation tells us how strongly two variables are related, but the measures of center and spread still help explain the values of each variable separately.

For example, if a class is studying height and arm span, the mean and standard deviation of each variable help describe the group before any regression model is built.

Probability distributions

Statistical measures also appear in probability. A discrete or continuous random variable has a mean, often called the expected value, and a measure of spread such as variance. So the ideas from descriptive statistics lead naturally into probability distributions.

For a random variable $X$, the expected value is written as $E(X)$. This is the probability-based version of the mean. In a discrete distribution, the variance measures how much values vary around $E(X)$.
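For a discrete random variable, $E(X) = \sum x\,P(X=x)$ and $\mathrm{Var}(X) = \sum (x-\mu)^2\,P(X=x)$ with $\mu = E(X)$: probability-weighted versions of the mean and population variance above. A sketch using exact fractions, with a fair six-sided die as an assumed example distribution:

```python
from fractions import Fraction

# Assumed example: X = score on one roll of a fair six-sided die
dist = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x), where mu = E(X)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


print(expected_value(dist))   # 7/2
print(variance(dist))         # 35/12
```

The weights $P(X=x)$ play the role that $\frac{1}{N}$ plays in the population variance formula.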

This shows an important link: descriptive statistics summarizes data that has already been collected, while probability uses similar ideas to describe random processes. 🔗

Worked example: interpreting a data set

Consider the data set

$$3, 4, 4, 6, 7, 9, 21$$

The mean is

$$\bar{x} = \frac{3+4+4+6+7+9+21}{7} = \frac{54}{7} \approx 7.71$$

The median is $6$.

The mode is $4$.

The range is

$$21 - 3 = 18$$

What does this tell us? The value $21$ is much larger than the rest and is likely an outlier. Because of that, the mean is pulled upward and is larger than most of the actual data values. The median gives a better picture of the middle of the main group. The mode shows the most common value, which is $4$. The range shows a large spread, but it is strongly affected by the outlier.
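The calculations in this example can be reproduced with Python's `statistics` module. The last lines count how many values lie below the mean, which supports the remark that the mean exceeds most of the data.

```python
import statistics

data = [3, 4, 4, 6, 7, 9, 21]

print(round(statistics.mean(data), 2))  # 7.71
print(statistics.median(data))          # 6
print(statistics.mode(data))            # 4
print(max(data) - min(data))            # 18

# How many values are smaller than the mean?
x_bar = statistics.mean(data)
below = sum(1 for x in data if x < x_bar)
print(below)                            # 5 of the 7 values
```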

This example shows why no single measure is always best. Students should look at the shape of the data, the presence of outliers, and the context before deciding which measure to report.

Conclusion

Statistical measures help turn raw data into useful information. Measures of central tendency such as the mean, median, and mode describe a typical value. Measures of spread such as the range, IQR, variance, and standard deviation describe how spread out the data is. Together, these measures help students interpret data, compare groups, and spot unusual features.

In IB Mathematics: Analysis and Approaches HL, statistical measures connect directly to data collection, regression, probability distributions, and deeper statistical reasoning. They are one of the main tools for understanding the story that data tells. ✅

Study Notes

The mean is the average: $\bar{x} = \frac{x_1+x_2+\cdots+x_n}{n}$.
The median is the middle value after ordering the data.
The mode is the most frequent value.
The range is $\text{largest} - \text{smallest}$.
The interquartile range is $\mathrm{IQR} = Q_3 - Q_1$.
Population variance is $\sigma^2 = \frac{\sum (x-\mu)^2}{N}$.
Sample variance is $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$.
Standard deviation is the square root of variance.
The mean and standard deviation are sensitive to outliers.
The median and IQR are resistant to outliers.
Use the mean and standard deviation for roughly symmetric data.
Use the median and IQR for skewed data or data with outliers.
Statistical measures help with comparisons, graph interpretation, and probability ideas.
In probability, the mean becomes expected value, linking descriptive statistics to distributions.