Lesson 4.3: Spearman's Rank Correlation Coefficient
Introduction
In this lesson, we will explore Spearman's rank correlation coefficient, a statistical measure that evaluates the strength and direction of the association between two ranked variables. By the end of this lesson, students should be able to compute and interpret Spearman's coefficient and decide when to choose it over Pearson's product-moment correlation coefficient (PMCC).
Learning Objectives
- Understand Spearman's rank correlation coefficient as a measure of association based on ranks, including consistent ranking of both variables and appropriate handling of tied values.
- Learn how to accurately obtain Spearman's coefficient using a calculator.
- Make informed choices between Spearman's rank correlation and Pearson's PMCC according to the data conditions.
- Develop skills to rank both variables consistently, deal with tied ranks appropriately, and compute Spearman's coefficient using a calculator.
- Interpret the value of Spearman's rank correlation coefficient in context.
Section 1: Understanding Spearman's Rank Correlation Coefficient
Spearman's rank correlation coefficient, denoted as $ r_s $, is a non-parametric measure of correlation that assesses how well the relationship between two variables can be described using a monotonic function. Unlike Pearson's correlation, which measures the strength of linear relationships, Spearman's coefficient depends only on the order of the values, so it suits ordinal data and relationships that are monotonic but not linear.
The Basics of Ranking
To compute $ r_s $, we first need to convert our raw data into ranks:
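- For each variable, assign rank values to the observations. The smallest value gets a rank of 1, the next smallest a rank of 2, and so forth. Ranking from largest to smallest is equally valid, provided the same direction is used for both variables.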
- In case of tied values, assign to each tied observation the average of the ranks they would have received had they not been tied.
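The two ranking rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `average_ranks` and the sample values are assumptions made for the example.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 1-based, so this run covers ranks pos+1 .. end+1.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Illustrative values: the two 1s occupy positions 1 and 2, so each gets 1.5.
print(average_ranks([3, 1, 4, 1, 5]))  # [3.0, 1.5, 4.0, 1.5, 5.0]
```

Because tied values share the mean of the positions they occupy, the ranks always sum to $ \frac{n(n+1)}{2} $, which is a quick check that a ranking has been done correctly.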
Example 1: Ranking
Consider the following set of data representing students' scores in Subject A and Subject B:
| Student | Score A | Score B |
|---|---|---|
| 1 | 85 | 78 |
| 2 | 92 | 88 |
| 3 | 85 | 82 |
| 4 | 78 | 75 |
| 5 | 91 | 90 |
Step 1: Assign ranks for Score A:
- Observations: 85, 92, 85, 78, 91
- Sorted: 78, 85, 85, 91, 92
- Ranks: 2.5, 5, 2.5, 1, 4 (The two scores of 85 occupy positions 2 and 3, so each receives the average rank $ \frac{2+3}{2} = 2.5 $.)
| Student | Score A | Rank A |
|---|---|---|
| 1 | 85 | 2.5 |
| 2 | 92 | 5 |
| 3 | 85 | 2.5 |
| 4 | 78 | 1 |
| 5 | 91 | 4 |
Step 2: Assign ranks for Score B:
- Observations: 78, 88, 82, 75, 90
- Ranks: 2, 4, 3, 1, 5 (75 is the smallest score, so it receives rank 1.)
| Student | Score B | Rank B |
|---|---|---|
| 1 | 78 | 2 |
| 2 | 88 | 4 |
| 3 | 82 | 3 |
| 4 | 75 | 1 |
| 5 | 90 | 5 |
Calculating Spearman's Rank Correlation Coefficient
Once the ranks are assigned, we can calculate $ r_s $ using the formula:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the ranks of each pair of observations, and $n$ is the number of observations.
Step 3: Calculate the differences $ d_i $ and their squares:
| Student | Rank A | Rank B | $d_i = \text{Rank A} - \text{Rank B}$ | $d_i^2$ |
|---|---|---|---|---|
| 1 | 2.5 | 2 | 0.5 | 0.25 |
| 2 | 5 | 4 | 1 | 1 |
| 3 | 2.5 | 3 | -0.5 | 0.25 |
| 4 | 1 | 1 | 0 | 0 |
| 5 | 4 | 5 | -1 | 1 |
Calculating the sum of $d_i^2$:
$$\sum d_i^2 = 0.25 + 1 + 0.25 + 0 + 1 = 2.5$$
Step 4: Plug values into the formula:
- $n = 5$
$$r_s = 1 - \frac{6 \cdot 2.5}{5(5^2 - 1)} = 1 - \frac{15}{5(24)} = 1 - \frac{15}{120} = 1 - 0.125 = 0.875$$
Thus, $ r_s = 0.875 $. Because Score A contains a tie, this formula is a close approximation here; applying Pearson's formula directly to the ranks gives $ r_s \approx 0.872 $.
Interpreting Spearman's Coefficient
The value of Spearman's rank correlation coefficient $ r_s $ can range from -1 to +1:
- $ r_s = 1 $ indicates a perfect positive correlation: the two rankings agree exactly.
- $ r_s = -1 $ indicates a perfect negative correlation: one ranking is the exact reverse of the other.
- $ r_s = 0 $ implies no monotonic association between the rankings.
A coefficient of $ r_s = 0.875 $ suggests a strong positive correlation, indicating that students who rank higher in Subject A also tend to rank higher in Subject B.
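The ranking and formula steps of Example 1 can be sketched in Python as follows. This is an illustrative sketch: the helper names `average_ranks` and `spearman_rs` are assumptions for the example, and the function simply applies the $ \sum d_i^2 $ formula above to average ranks.

```python
def average_ranks(values):
    """Smallest value gets rank 1; tied values share the mean of their positions."""
    s = sorted(values)
    # s.index(v) + 1 is the first 1-based position of v; ties add half the run length.
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]


def spearman_rs(x, y):
    """Spearman's coefficient from the sum of squared rank differences."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Scores from Example 1.
score_a = [85, 92, 85, 78, 91]
score_b = [78, 88, 82, 75, 90]

print("Rank A:", average_ranks(score_a))
print("Rank B:", average_ranks(score_b))
print("r_s   :", spearman_rs(score_a, score_b))
```

Printing the two rank lists before the coefficient mirrors the hand calculation, so each intermediate step can be checked against the tables.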
Section 2: Choosing Between Spearman's Rank and Pearson's PMCC
While both Spearman's rank correlation and Pearson's PMCC measure the strength of association, they do so under different conditions.
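- Pearson's PMCC measures the strength of a linear relationship between two quantitative variables. Computing it does not require normality, but the usual significance tests for it assume the data come from a bivariate normal distribution, and its value is sensitive to outliers.
- Spearman's Rank Correlation is suitable when the data are ordinal or already given as ranks, when the relationship is monotonic but not linear, or when outliers or a lack of normality make Pearson's PMCC unreliable.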
When to Use Each Measure
- Use Spearman's when the relationship may be monotonic but not necessarily linear, when the data are ordinal or given as ranks, or when outliers or non-normality undermine Pearson's correlation.
- Use Pearson's when both variables are quantitative, the scatter diagram suggests a linear relationship, and, if a significance test is intended, the data are approximately bivariate normal.
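A small numerical sketch may make the distinction concrete. The data below are invented for illustration ($ y = x^3 $, which is strictly increasing but curved), and the helper names are assumptions. Spearman's coefficient is computed here as Pearson's formula applied to the ranks.

```python
from math import sqrt


def average_ranks(values):
    """Smallest value gets rank 1; tied values share the mean of their positions."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: monotonic increasing, but clearly not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson(x, y)
r_s = pearson(average_ranks(x), average_ranks(y))

print(f"Pearson r   = {r:.3f}")    # below 1, because the points do not lie on a line
print(f"Spearman rs = {r_s:.3f}")  # 1.000, because the two rankings agree exactly
```

Since every increase in x comes with an increase in y, the ranks match perfectly and Spearman's coefficient reaches 1, while Pearson's PMCC is pulled below 1 by the curvature.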
Section 3: Handling Tied Ranks
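When calculating Spearman's rank correlation coefficient, tied values must be appropriately managed. Ties matter because the formula $ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $ is exact only when all ranks are distinct; with ties it gives an approximation, and the exact value comes from applying Pearson's formula to the ranks. The method of assigning average ranks, as discussed earlier, should always be used.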
Example 2: Ties in Ranking
Let's consider a new dataset from sports statistics, reflecting the performance of players:
| Player | Goals | Assists |
|---|---|---|
| A | 10 | 4 |
| B | 12 | 7 |
| C | 10 | 2 |
| D | 8 | 5 |
| E | 7 | 7 |
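Step: Assign ranks for Goals (Ties):
- Sorted goals: 7, 8, 10, 10, 12
- The two values of 10 (players A and C) occupy positions 3 and 4, so each receives a rank of $ \frac{3 + 4}{2} = 3.5 $
| Player | Goals | Rank Goals |
|---|---|---|
| A | 10 | 3.5 |
| B | 12 | 5 |
| C | 10 | 3.5 |
| D | 8 | 2 |
| E | 7 | 1 |
Averaging keeps the sum of the ranks at $ 1 + 2 + 3 + 4 + 5 = 15 $, the same total as without ties. The Assists column also contains a tie (B and E both have 7), which is handled the same way: each receives a rank of $ \frac{4 + 5}{2} = 4.5 $.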
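To see how ties influence the result, the sketch below ranks both columns of Example 2 and then computes the coefficient two ways: with the $ \sum d_i^2 $ formula and by applying Pearson's formula to the ranks. The helper names are assumptions for illustration. When ties are present the two results can differ, and the Pearson-on-ranks value is the one to treat as exact.

```python
from math import sqrt


def average_ranks(values):
    """Smallest value gets rank 1; tied values share the mean of their positions."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Data from Example 2 (players A to E).
goals = [10, 12, 10, 8, 7]
assists = [4, 7, 2, 5, 7]

rank_goals = average_ranks(goals)
rank_assists = average_ranks(assists)

n = len(goals)
sum_d2 = sum((g - a) ** 2 for g, a in zip(rank_goals, rank_assists))
shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # sum-of-d-squared formula
exact = pearson(rank_goals, rank_assists)        # Pearson applied to the ranks

print("Rank Goals  :", rank_goals)
print("Rank Assists:", rank_assists)
print("Formula with d^2 :", round(shortcut, 3))
print("Pearson on ranks :", round(exact, 3))
```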
Conclusion
In this lesson, students learned about Spearman's rank correlation coefficient as a robust method for measuring the association between two ordinal or quantitative variables. We demonstrated how to rank data, calculate $ r_s $, interpret its value, and decide which correlation coefficient to use under varying data circumstances.
Understanding these concepts is crucial, as they equip students with the analytical skills necessary for effective data interpretation in various real-life scenarios.
Study Notes
- Spearman's rank correlation coefficient $ r_s $ ranges from -1 to +1.
- $ r_s $ is calculated with ranks, averaging when tied values occur.
- It measures the strength and direction of a monotonic relationship.
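- Use Spearman's for ordinal or ranked data, for monotonic but non-linear relationships, or when outliers or non-normality make Pearson's PMCC unreliable.
- Use Pearson's PMCC for linear relationships between quantitative variables; significance tests for it assume approximately bivariate normal data.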
