Lesson 4.5: Standardisation and z-scores

Introduction

Welcome to Lesson 4.5 of Foundation Statistics! 🎓 In this lesson, we will explore the concepts of standardisation and z-scores. By the end of this lesson, you will be able to:

Put a value on a common scale by subtracting the mean and dividing by the standard deviation.
Interpret a z-score as a number of standard deviations from the mean.
Compare values from different distributions using z-scores.
Understand the relationship between standardisation and the normal distribution, which we will study later.

Imagine you are comparing two test results from different exams. One exam is easier than the other, so the raw scores don't give a fair comparison. This is where standardisation helps! We will learn how to standardise these scores so we can make meaningful comparisons.

What is Standardisation? 🤔

Standardisation is a process used to put data points on a common scale. This is particularly useful when comparing values from different distributions. The formula for standardisation involves two components: the mean ($\mu$) and the standard deviation ($\sigma$). The z-score is a way of expressing how far away a data point is from the mean in terms of standard deviations.

The Formula for Standardisation

The z-score is calculated using the following formula:

$Z = \frac{X - \mu}{\sigma}$

Where:

$Z$ is the z-score
$X$ is the value you want to standardise
$\mu$ is the mean of the dataset
$\sigma$ is the standard deviation of the dataset

Let's break this down:

Subtract the Mean: By taking the mean away from your data point, you shift the scale so that the mean becomes zero.
Divide by the Standard Deviation: This step rescales the distance from the mean into units of standard deviations, so the result tells you how many standard deviations your value lies above or below the mean.
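The two steps above can be sketched as a small Python function. This is a minimal illustration, not part of the lesson's required material; the function name `z_score` and the sample values (85, 70, 10) are assumptions chosen for the example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        # A z-score is undefined when there is no spread in the data.
        raise ValueError("sigma must be positive")
    shifted = x - mu          # step 1: subtract the mean
    return shifted / sigma    # step 2: divide by the standard deviation


# Hypothetical values: a score of 85 where the mean is 70 and sigma is 10.
print(z_score(85, 70, 10))  # 1.5
```

A value equal to the mean gives a z-score of 0, because the subtraction step leaves nothing to divide.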

Example 1: Calculating a z-score

Let's say we have a dataset of test scores from a math exam:

| Score | 45 | 55 | 60 | 75 | 80 |
|-------|----|----|----|----|----|

Calculate the Mean ($\mu$):

$$\mu = \frac{45 + 55 + 60 + 75 + 80}{5} = 63$$

Calculate the Standard Deviation ($\sigma$):

$$\sigma = \sqrt{\frac{(45-63)^2 + (55-63)^2 + (60-63)^2 + (75-63)^2 + (80-63)^2}{5}} = \sqrt{\frac{324 + 64 + 9 + 144 + 289}{5}} = \sqrt{\frac{830}{5}} = \sqrt{166} \approx 12.88$$

Standardise a Score ($X = 75$):

$$Z = \frac{75 - 63}{12.88} \approx 0.93$$

This means that a score of 75 is approximately 0.93 standard deviations above the mean.
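If you want to check this arithmetic on a computer, here is a short sketch using Python's standard `statistics` module. Note that the lesson divides by the number of scores (5), which corresponds to the population standard deviation, `pstdev`; the sample version, `stdev`, divides by one less and would give a larger value. The variable names are chosen for this example only.

```python
from statistics import mean, pstdev

scores = [45, 55, 60, 75, 80]

mu = mean(scores)       # sum of the scores divided by 5
sigma = pstdev(scores)  # population SD: square root of 830 / 5

# Standardise the score X = 75.
z_75 = (75 - mu) / sigma

print(mu)               # 63
print(round(z_75, 2))   # 0.93
```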

Interpreting z-scores 📊

Once you have calculated the z-score, the next step is to understand what it means. The z-score indicates how many standard deviations a data point is from the mean:

A z-score of 0 means the data point is exactly at the mean.
A positive z-score indicates the data point is above the mean.
A negative z-score indicates the data point is below the mean.

Example 2: Interpreting z-scores

Continuing with our previous example, suppose another student scored 45.

Calculate the z-score:

$$Z = \frac{45 - 63}{12.88} \approx -1.40$$

This means that the score of 45 is about 1.40 standard deviations below the mean score of the test.
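To see the sign rule applied to every score at once, the sketch below standardises the whole dataset and labels each score as above, below, or at the mean. It is an illustrative example only and assumes the same five scores and the population standard deviation used in the lesson.

```python
from statistics import mean, pstdev

scores = [45, 55, 60, 75, 80]
mu, sigma = mean(scores), pstdev(scores)

# Standardise every score in the dataset.
z_scores = [(x - mu) / sigma for x in scores]

for x, z in zip(scores, z_scores):
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"{x}: z = {z:+.2f} ({position} the mean)")
```

Scores under 63 come out negative and scores over 63 come out positive. The standardised values themselves have a mean of 0 and a standard deviation of 1, which is what makes them a common scale.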

Comparing Scores from Different Distributions

One of the greatest benefits of z-scores is the ability to compare scores from different distributions. Let's say you have two students:

Student A scores 80 in Math (with mean 70 and $\sigma = 10$).
Student B scores 50 in English (with mean 60 and $\sigma = 5$).

Calculating their z-scores:

For Student A:

$$Z_{Math} = \frac{80 - 70}{10} = 1$$

This means Student A's score is 1 standard deviation above the mean.

For Student B:

$$Z_{English} = \frac{50 - 60}{5} = -2$$

This shows that Student B’s score is 2 standard deviations below the mean.

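Student A's raw score (80) is higher than Student B's (50), but the raw scores come from exams with different means and spreads, so they cannot be compared directly. The z-scores can: Student A is 1 standard deviation above the Math mean, while Student B is 2 standard deviations below the English mean, so Student A performed better relative to their own exam.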
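The same comparison can be sketched in a few lines of Python. The helper `z_score` and the variable names are assumptions made for this example; the numbers are the ones given for Student A and Student B.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


z_math = z_score(80, 70, 10)    # Student A in Math
z_english = z_score(50, 60, 5)  # Student B in English

# The larger z-score marks the stronger result relative to its own exam.
better = "Student A" if z_math > z_english else "Student B"

print(z_math, z_english, better)  # 1.0 -2.0 Student A
```

Comparing z-scores rather than raw scores works because each z-score already accounts for the mean and spread of its own exam.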

Conclusion

In this lesson, we have learned how to standardise data points using z-scores. We explored the formula for calculating z-scores and interpreted what they mean in terms of position relative to the mean. Because a z-score measures position in standard deviations rather than in the original units, it lets us compare values from datasets with different means and spreads.

Study Notes

Standardisation puts data points on a common scale.
The formula for z-score is $Z = \frac{X - \mu}{\sigma}$.
A z-score indicates how many standard deviations a data point is from the mean.
Positive z-scores are above the mean; negative z-scores are below.
You can compare z-scores across different distributions to understand relative standings.
Standardisation relates to the normal distribution, which we cover later.