Topic 4: Correlation And Regression

Lesson 4.3: Spearman's Rank Correlation Coefficient

Official syllabus content for this lesson: Spearman's rank correlation coefficient as a measure of association based on ranks, including the consistent ranking of both variables and the treatment of tied values; obtaining Spearman's coefficient using a calculator.

Introduction

In this lesson, we will explore Spearman's rank correlation coefficient, a statistical measure that evaluates the strength and direction of the association between two ranked variables. By the end of this lesson, students should be able to understand how to compute and interpret Spearman's coefficient and determine when to choose this measure over Pearson's product-moment correlation coefficient (PMCC).

Learning Objectives

  • Understand Spearman's rank correlation coefficient as a measure of association based on ranks, including consistent ranking of both variables and appropriate handling of tied values.
  • Learn how to accurately obtain Spearman's coefficient using a calculator.
  • Make informed choices between Spearman's rank correlation and Pearson's PMCC according to the data conditions.
  • Develop skills to rank both variables consistently, deal with tied ranks appropriately, and compute Spearman's coefficient using a calculator.
  • Interpret the value of Spearman's rank correlation coefficient in context.

Section 1: Understanding Spearman's Rank Correlation Coefficient

Spearman's rank correlation coefficient, denoted as $ r_s $, is a non-parametric measure of correlation that assesses how well the relationship between two variables can be described using a monotonic function. Unlike Pearson's correlation, which measures the strength of linear relationships, Spearman's coefficient depends only on the order of the values. This makes it useful when a relationship is monotonic but not linear, when the data are ordinal, or when the data do not meet the assumptions needed for Pearson's method.

The Basics of Ranking

To compute $ r_s $, we first need to convert our raw data into ranks:

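  1. For each variable, assign rank values to the observations. The smallest value gets a rank of 1, the next smallest a rank of 2, and so forth. Either direction (smallest first or largest first) may be used, provided the same direction is used for both variables.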
  2. In case of tied values, assign to each tied observation the average of the ranks they would have received had they not been tied.
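The two ranking rules above can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` and the sample values are made up for this sketch and are not part of the syllabus.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    # Indices of the observations, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' across the run of observations tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 1-based, so the run occupies positions pos+1 .. end+1.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Illustrative values: the two 20s occupy positions 2 and 3, so each gets rank 2.5.
print(average_ranks([30, 10, 20, 20]))  # [4.0, 1.0, 2.5, 2.5]
```

The ranks are returned in the original order of the observations, so they can be paired directly with the ranks of the second variable.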

Example 1: Ranking

Consider the following set of data representing students' scores in Subject A and Subject B:

| Student | Score A | Score B |
|---------|---------|---------|
| 1       | 85      | 78      |
| 2       | 92      | 88      |
| 3       | 85      | 82      |
| 4       | 78      | 75      |
| 5       | 91      | 90      |

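Step 1: Assign ranks for Score A:

  • Observations: 85, 92, 85, 78, 91
  • In order: 78, 85, 85, 91, 92
  • Ranks: 2.5, 5, 2.5, 1, 4 (the two scores of 85 occupy positions 2 and 3, so each receives the average rank $ \frac{2+3}{2} = 2.5 $).

| Student | Score A | Rank A |
|---------|---------|--------|
| 1       | 85      | 2.5    |
| 2       | 92      | 5      |
| 3       | 85      | 2.5    |
| 4       | 78      | 1      |
| 5       | 91      | 4      |

Step 2: Assign ranks for Score B:

  • Observations: 78, 88, 82, 75, 90
  • In order: 75, 78, 82, 88, 90
  • Ranks: 2, 4, 3, 1, 5

| Student | Score B | Rank B |
|---------|---------|--------|
| 1       | 78      | 2      |
| 2       | 88      | 4      |
| 3       | 82      | 3      |
| 4       | 75      | 1      |
| 5       | 90      | 5      |

Calculating Spearman's Rank Correlation Coefficient

Once the ranks are assigned, we can calculate $ r_s $ using the formula:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the two ranks of the $i$-th observation, and $n$ is the number of observations. This formula is exact when there are no ties. When ties are present it gives a close approximation; the exact value is Pearson's PMCC calculated on the ranks.

Step 3: Calculate the differences $ d_i $ and their squares:

| Student | Rank A | Rank B | $d_i = \text{Rank A} - \text{Rank B}$ | $d_i^2$ |
|---------|--------|--------|----------------------------------------|---------|
| 1       | 2.5    | 2      | $0.5$                                  | 0.25    |
| 2       | 5      | 4      | $1$                                    | 1       |
| 3       | 2.5    | 3      | $-0.5$                                 | 0.25    |
| 4       | 1      | 1      | $0$                                    | 0       |
| 5       | 4      | 5      | $-1$                                   | 1       |

Calculating the sum of $d_i^2$:

$$\sum d_i^2 = 0.25 + 1 + 0.25 + 0 + 1 = 2.5$$

Step 4: Plug values into the formula:

  • $n = 5$

$$r_s = 1 - \frac{6 \cdot 2.5}{5(5^2 - 1)} = 1 - \frac{15}{5(24)} = 1 - \frac{15}{120} = 1 - 0.125 = 0.875$$

Thus, $ r_s \approx 0.875 $. Because Score A contains a tie, entering the two columns of ranks into a calculator and finding the PMCC gives the slightly different exact value $ r_s \approx 0.872 $.

Interpreting Spearman's Coefficient

The value of Spearman's rank correlation coefficient $ r_s $ can range from -1 to +1:

  • $ r_s = 1 $ indicates a perfect positive rank correlation: the two sets of ranks agree exactly.
  • $ r_s = -1 $ indicates a perfect negative rank correlation: one set of ranks is the exact reverse of the other.
  • $ r_s = 0 $ indicates no monotonic association between the ranks.

A coefficient of $ r_s \approx 0.87 $ suggests a strong positive correlation, indicating that students who rank higher in Subject A also tend to rank higher in Subject B.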
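The calculation for Example 1 can be checked from the raw scores with a short Python sketch. The helper names (`average_ranks`, `spearman_shortcut`, `pmcc`) are hypothetical and written only for this illustration. The sketch computes the coefficient in two ways: with the $\sum d_i^2$ formula, and as Pearson's PMCC applied to the ranks, which is what a calculator does when the ranks are entered as two lists.

```python
from math import sqrt


def average_ranks(values):
    # Rank = (number of smaller values) + (number of equal values + 1) / 2,
    # which gives tied values the average of the positions they occupy.
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def spearman_shortcut(x, y):
    # Formula based on the squared rank differences (exact only without ties).
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pmcc(x, y):
    # Pearson's product-moment correlation coefficient.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


score_a = [85, 92, 85, 78, 91]
score_b = [78, 88, 82, 75, 90]

shortcut = spearman_shortcut(score_a, score_b)
exact = pmcc(average_ranks(score_a), average_ranks(score_b))

print(average_ranks(score_a))  # [2.5, 5.0, 2.5, 1.0, 4.0]
print(average_ranks(score_b))  # [2.0, 4.0, 3.0, 1.0, 5.0]
print(round(shortcut, 3), round(exact, 3))  # 0.875 0.872
```

The two results differ slightly only because Score A contains a tie; without ties the two methods agree.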

Section 2: Choosing Between Spearman's Rank and Pearson's PMCC

While both Spearman's rank correlation and Pearson's PMCC measure the strength of association, they do so under different conditions.

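  • Pearson's PMCC measures the strength of a linear relationship between two quantitative variables. It uses the actual data values, so it is sensitive to outliers, and significance tests based on it assume the data are approximately normally distributed.
  • Spearman's rank correlation uses only the ranks, so it is suitable for ordinal data, for data that are already ranked, and for data that are not normally distributed or that contain outliers.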

When to Use Each Measure

  • Use Spearman's when the relationship may be monotonic but not necessarily linear, when the data are ordinal or already ranked, or when the data violate the assumptions required for Pearson's correlation.
  • Use Pearson's when both variables are quantitative, the scatter diagram suggests a linear relationship, and there are no extreme outliers.
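The difference between the two measures can be seen on a small made-up dataset in which $y$ always increases with $x$ but not in a straight line. The data ($y = x^3$) and the helper names below are assumptions chosen purely for illustration.

```python
from math import sqrt


def average_ranks(values):
    # Tied values would share the average of their positions (no ties occur here).
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def pmcc(x, y):
    # Pearson's product-moment correlation coefficient.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic increasing, but curved

pearson = pmcc(x, y)
spearman = pmcc(average_ranks(x), average_ranks(y))

print(round(pearson, 3))   # 0.943: strong, but not perfectly linear
print(round(spearman, 3))  # 1.0: the rank orders agree exactly
```

Spearman's coefficient reaches 1 because the ordering of the $y$ values matches the ordering of the $x$ values exactly, while Pearson's coefficient stays below 1 because the points do not lie on a straight line.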

Section 3: Handling Tied Ranks

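When calculating Spearman's rank correlation coefficient, tied values must be handled consistently: each tied observation receives the average of the ranks that the tied values would otherwise occupy, as described in Section 1. Ties also affect the calculation itself. The formula based on $\sum d_i^2$ is exact only when there are no ties; with ties it gives an approximation, and the exact value is obtained by calculating Pearson's PMCC on the ranks.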

Example 2: Ties in Ranking

Let's consider a new dataset from sports statistics, reflecting the performance of players:

| Player | Goals | Assists |
|--------|-------|---------|
| A      | 10    | 4       |
| B      | 12    | 7       |
| C      | 10    | 2       |
| D      | 8     | 5       |
| E      | 7     | 7       |

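Step: Assign ranks for Goals (with ties):

  • Goals in order: 7, 8, 10, 10, 12
  • The two values of 10 (players A and C) occupy positions 3 and 4, so each receives the rank $ \frac{3 + 4}{2} = 3.5 $

| Player | Goals | Rank Goals |
|--------|-------|------------|
| A      | 10    | 3.5        |
| B      | 12    | 5          |
| C      | 10    | 3.5        |
| D      | 8     | 2          |
| E      | 7     | 1          |

The Assists column is ranked in the same way: players B and E both have 7 assists and occupy positions 4 and 5, so each receives the rank 4.5. Assigning average ranks in this way accounts for the ties and keeps the subsequent calculation of $ r_s $ consistent.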
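As a check on Example 2, the sketch below ranks both columns with average ranks for ties and then computes Pearson's PMCC on the ranks. The helper names are hypothetical, and the final coefficient is not part of the worked example above; it is included only to show how tied ranks feed into the calculation.

```python
from math import sqrt


def average_ranks(values):
    # Rank = (number of smaller values) + (number of equal values + 1) / 2.
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def pmcc(x, y):
    # Pearson's product-moment correlation coefficient.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


players = ["A", "B", "C", "D", "E"]
goals = [10, 12, 10, 8, 7]
assists = [4, 7, 2, 5, 7]

rank_goals = average_ranks(goals)
rank_assists = average_ranks(assists)

for player, rg, ra in zip(players, rank_goals, rank_assists):
    print(player, rg, ra)
# A 3.5 2.0
# B 5.0 4.5
# C 3.5 1.0
# D 2.0 3.0
# E 1.0 4.5

r_s = pmcc(rank_goals, rank_assists)
print(round(r_s, 3))  # -0.158
```

A value this close to 0 suggests little monotonic association between goals and assists for these five players.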

Conclusion

In this lesson, students learned about Spearman's rank correlation coefficient as a robust method for measuring the association between two variables that can be ranked, whether ordinal or quantitative. We demonstrated how to rank data, calculate $ r_s $, interpret its value, and decide which correlation coefficient to use under varying data circumstances.

Understanding these concepts is crucial, as they equip students with the analytical skills necessary for effective data interpretation in various real-life scenarios.

Study Notes

  • Spearman's rank correlation coefficient $ r_s $ ranges from -1 to +1.
  • $ r_s $ is calculated with ranks, averaging when tied values occur.
  • It measures the strength and direction of a monotonic relationship.
  • Use Spearman's for monotonic but non-linear relationships, ordinal or ranked data, or data that are not normally distributed.
  • Use Pearson's PMCC for linear relationships between quantitative variables.
