Topic 4: Correlation And Regression

Lesson 4.3: Spearman's Rank Correlation Coefficient

Official syllabus section for Lesson 4.3 (Spearman's rank correlation coefficient) within Topic 4 (Correlation and Regression): ranking both variables consistently, handling tied values appropriately, and obtaining Spearman's rank correlation coefficient by calculator; interpreting Spearman's coefficient in context.

Introduction

In this lesson, we will explore Spearman's rank correlation coefficient, which measures the strength and direction of the association between two variables using their ranks rather than their raw values. By the end of this lesson, students will be able to rank variables consistently, handle tied values appropriately, obtain Spearman's rank correlation coefficient using a calculator, and interpret the result in context.

Learning Objectives

  • Ranking both variables consistently, handling tied values appropriately, and obtaining Spearman's rank correlation coefficient by calculator.
  • Interpreting Spearman's coefficient in context.
  • Recognizing that Spearman's coefficient needs no distributional assumptions and knowing when to prefer it over the Pearson product-moment correlation coefficient (PMCC).

Spearman's Rank Correlation Coefficient

Spearman's rank correlation coefficient, denoted $r_s$, is a non-parametric measure of correlation that assesses how well the relationship between two variables can be described by a monotonic function, that is, one that consistently increases or consistently decreases. It is especially useful when the data cannot be assumed to follow a normal distribution or when the relationship is monotonic but not linear.

1. Understanding Rank and Ties

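To begin with, let's discuss ranking. When we rank data, we assign each value a position based on its size. For instance, in a list of test scores, the highest score receives rank 1, the next highest rank 2, and so forth. Ranking from the lowest value upwards is equally valid, provided both variables are ranked in the same direction; ranking them in opposite directions reverses the sign of $r_s$.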

Example: Ranking Data with Ties

Consider the following test scores:

| Student | Score |
| --- | --- |
| Alice | 85 |
| Bob | 90 |
| Charlie | 85 |
| David | 92 |
| Eva | 85 |

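To rank these scores, we give the highest score rank 1 and work downwards. Tied values share the average of the ranks they would otherwise occupy. David (92) takes rank 1 and Bob (90) takes rank 2. Alice, Charlie, and Eva all scored 85 and would occupy ranks 3, 4, and 5, so each receives the average rank $(3 + 4 + 5)/3 = 4$:

  • Alice: Rank 4 (average of ranks 3, 4, and 5)
  • Bob: Rank 2
  • Charlie: Rank 4
  • David: Rank 1
  • Eva: Rank 4

As a check, the ranks should sum to $n(n + 1)/2$, which is 15 for $n = 5$: here $4 + 2 + 4 + 1 + 4 = 15$.

Thus, the ranked data would look like this:

| Student | Score | Rank |
| --- | --- | --- |
| Alice | 85 | 4 |
| Bob | 90 | 2 |
| Charlie | 85 | 4 |
| David | 92 | 1 |
| Eva | 85 | 4 |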
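
The averaging rule for tied values can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function name `average_ranks` and its `descending` parameter are assumptions made for this example, and rank 1 goes to the highest value by default.

```python
def average_ranks(values, descending=True):
    """Return ranks for values, giving tied values the mean of the ranks they span."""
    # Sort indices so that position 0 holds the value that should receive rank 1.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # 0-based positions pos..end correspond to ranks pos+1..end+1.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


scores = [85, 90, 85, 92, 85]  # Alice, Bob, Charlie, David, Eva
print(average_ranks(scores))
```

A quick sanity check on any ranking is that the ranks add up to $n(n + 1)/2$, whether or not ties are present.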

2. Calculating Spearman's Rank Correlation Coefficient

Once we have ranked our data correctly, we can calculate Spearman's rank correlation coefficient using the formula:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where:

  • $d_i$ is the difference between the two ranks of the $i$-th pair of observations,
  • $n$ is the number of paired observations.
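
The formula above is a shortcut that is exact only when no ranks are tied. More generally, Spearman's coefficient is the PMCC applied to the ranks, which is what a statistical calculator typically computes from two lists of ranks. As a sketch, writing $R_i$ and $S_i$ for the two ranks of the $i$-th pair and $\bar{R}$, $\bar{S}$ for their means (this notation is assumed here for illustration):

$$r_s = \frac{\sum (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum (R_i - \bar{R})^2 \, \sum (S_i - \bar{S})^2}}$$

When there are no ties, this expression simplifies to $1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ with $d_i = R_i - S_i$. When ties are present, the two expressions can differ slightly.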

Example: Calculation

Let's use the previous scores, but this time, we will add another variable, the hours studied:

| Student | Score | Hours Studied |
| --- | --- | --- |
| Alice | 85 | 10 |
| Bob | 90 | 12 |
| Charlie | 85 | 10 |
| David | 92 | 15 |
| Eva | 85 | 8 |

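Now we rank both variables, giving rank 1 to the highest value in each case and averaging tied ranks. In the order Alice, Bob, Charlie, David, Eva:

  • For Scores: the ranks are (4, 2, 4, 1, 4), since the three scores of 85 share ranks 3, 4, and 5.
  • For Hours Studied: the ranks are (3.5, 2, 3.5, 1, 5), since Alice and Charlie both studied for 10 hours and share ranks 3 and 4.

Next, we calculate the differences in ranks $d_i$ (score rank minus hours studied rank):

| Student | Score Rank | Hours Studied Rank | $d_i$ | $d_i^2$ |
| --- | --- | --- | --- | --- |
| Alice | 4 | 3.5 | 0.5 | 0.25 |
| Bob | 2 | 2 | 0 | 0 |
| Charlie | 4 | 3.5 | 0.5 | 0.25 |
| David | 1 | 1 | 0 | 0 |
| Eva | 4 | 5 | -1 | 1 |

Now we sum $d_i^2$:

$$\sum d_i^2 = 0.25 + 0 + 0.25 + 0 + 1 = 1.5$$

With $n = 5$ paired observations, we substitute into the formula:

$$r_s = 1 - \frac{6 \cdot 1.5}{5(5^2 - 1)} = 1 - \frac{9}{5 \cdot 24} = 1 - \frac{9}{120} = 1 - 0.075 = 0.925$$

So the formula gives $r_s = 0.925$. Because these data contain tied ranks, the formula is only approximate here; a calculator that finds the PMCC of the two sets of ranks gives $r_s \approx 0.918$.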
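
A calculation of this kind can be cross-checked with a short Python sketch. It ranks both variables (rank 1 for the highest value, tied values sharing the average rank), then evaluates $r_s$ in two ways: with the $\sum d_i^2$ formula and as the PMCC of the ranks. The function names are assumptions chosen for this illustration, and only the standard library is used.

```python
from math import sqrt


def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1         # best rank this value could take
        last = first + ordered.count(v) - 1  # worst rank this value could take
        ranks.append((first + last) / 2)
    return ranks


def spearman_formula(x, y):
    """Shortcut formula 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_pmcc_of_ranks(x, y):
    """PMCC applied to the ranks; remains valid when ties are present."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


scores = [85, 90, 85, 92, 85]  # Alice, Bob, Charlie, David, Eva
hours = [10, 12, 10, 15, 8]
print(round(spearman_formula(scores, hours), 4))
print(round(spearman_pmcc_of_ranks(scores, hours), 4))
```

The two printed values agree exactly only when neither variable has tied ranks; with ties, the PMCC-of-ranks version is the one that matches a calculator's correlation function applied to the ranks.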

3. Interpreting Spearman's Rank Correlation Coefficient

The value of $r_s$ ranges from -1 to 1. A value of:

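  • 1 indicates a perfect positive correlation: the two sets of ranks agree exactly, so whenever one variable increases, the other also increases.
  • -1 indicates a perfect negative correlation: the two sets of ranks are in exactly opposite order, so whenever one variable increases, the other decreases.
  • 0 indicates no correlation between the ranks: there is no tendency for one variable to increase or decrease as the other increases, although a non-monotonic relationship may still exist.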
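
As a quick illustration of the extreme case, suppose four items (an invented example, not taken from the lesson data) are ranked 1, 2, 3, 4 on one variable and 4, 3, 2, 1 on the other. The rank differences are $-3, -1, 1, 3$, so:

$$\sum d_i^2 = 9 + 1 + 1 + 9 = 20, \qquad r_s = 1 - \frac{6 \cdot 20}{4(4^2 - 1)} = 1 - \frac{120}{60} = -1$$

If the two rankings were identical instead, every $d_i$ would be 0 and the formula would give $r_s = 1$.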

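In our example, the value of $r_s$ is positive and fairly close to 1, which indicates a strong positive association between score and hours studied: students who studied for longer tended to achieve higher scores. With only five students, and because correlation alone does not establish cause, this should be read as a pattern in this data set rather than proof that extra study raises scores.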

4. When to Use Spearman's Rank Correlation

Unlike the Pearson product-moment correlation coefficient (PMCC), which measures the strength of linear association and whose significance tests assume the data come from a bivariate normal distribution, Spearman's rank correlation makes no distributional assumptions. Use Spearman's rank when:

  • The data is ordinal (ranked) or does not meet the assumptions required for Pearson's correlation.
  • There are outliers that would unduly influence the PMCC; ranking limits their effect because only the order of the values matters.
  • The relationship between the variables is monotonic but not linear.
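
The difference between the two coefficients on a monotonic but non-linear relationship can be seen in a small Python sketch. The data ($y = x^3$ for $x = 1, \dots, 6$) are invented for illustration, and the helper names `pmcc` and `ranks` are assumptions for this example.

```python
from math import sqrt


def pmcc(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 for the largest value (this simple version assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # increases with x, but not along a straight line

pearson = pmcc(x, y)                  # PMCC of the raw values
spearman = pmcc(ranks(x), ranks(y))   # PMCC of the ranks, i.e. Spearman's r_s
print(round(pearson, 3), round(spearman, 3))
```

Because $y$ always increases when $x$ increases, the two sets of ranks agree exactly and Spearman's coefficient reaches 1, while the PMCC of the raw values falls short of 1 because the points do not lie on a straight line.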

Conclusion

In this lesson, we have learned about Spearman's rank correlation coefficient, including how to rank data consistently, handle tied values, calculate the coefficient using the formula, and interpret the result in context. Students should now feel confident using Spearman's rank correlation in a range of scenarios, especially when dealing with ranked data or relationships that are monotonic but not linear.

Study Notes

  • Spearman's rank correlation coefficient is calculated using ranks of data instead of raw scores.
  • It handles tied ranks by assigning average ranks.
  • The formula for $r_s$, which is exact when there are no tied ranks, is:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

  • The coefficient can range from -1 (perfect negative correlation) to 1 (perfect positive correlation).
  • Spearman's coefficient is preferred over the PMCC when the data is ordinal, when the relationship is monotonic but not linear, or when normality cannot be assumed.
  • A value close to 1 signifies a strong positive association: high ranks on one variable tend to go with high ranks on the other.

Lesson 4.3: Spearman's Rank Correlation Coefficient — A-Level Statistics | A-Warded