Topic 4: Correlation And Regression

Lesson 4.2: Pearson's Product Moment Correlation Coefficient

Official syllabus section covering Lesson 4.2 (Pearson's product moment correlation coefficient) within Topic 4 (Correlation and Regression): the product moment correlation coefficient (PMCC) as a measure of the strength of a linear relationship, taking values between $-1$ and $1$; obtaining the PMCC directly from a calculator and interpreting its value in context.

Introduction

In this lesson, we will explore a fundamental concept in statistics known as Pearson's Product Moment Correlation Coefficient (PMCC). This measure describes the strength and direction of a linear relationship between two variables. By the end of this lesson, you should be able to compute the PMCC, interpret its value in context, and understand the assumption behind hypothesis tests on it.

Objectives

  • Understand the definition and significance of the product moment correlation coefficient (PMCC) in measuring linear relationships.
  • Learn how to obtain the PMCC value directly from a calculator and interpret it in context.
  • Familiarize yourself with the assumption of a bivariate normal distribution when testing the PMCC.
  • Gain skills in interpreting the strength and direction indicated by the PMCC in data analysis.

Understanding Correlation

Correlation refers to the statistical relationship between two variables. If two variables tend to move together, we say they are correlated. For example, as the temperature increases, sales of ice cream might also increase. Here are key concepts regarding correlation:

  1. Positive Correlation: As one variable increases, the other variable tends to increase. Example: Height and weight.
  2. Negative Correlation: As one variable increases, the other variable tends to decrease. Example: The number of hours spent studying and the number of errors made on a test.
  3. No Correlation: There is no predictable relationship between the variables. Example: The amount of rainfall and the number of televisions sold.

Pearson's Product Moment Correlation Coefficient (PMCC)

The PMCC, denoted as $r$, quantifies the degree of linear relationship between two variables, ranging from $-1$ to $1$:

  • $r = 1$ indicates a perfect positive linear correlation.
  • $r = -1$ indicates a perfect negative linear correlation.
  • $r = 0$ indicates no linear correlation.

Formula for PMCC

The formula for the product moment correlation coefficient is given by:

$$r = \frac{n \sum (xy) - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$

Where:

  • $n$ is the number of data pairs.
  • $x$ and $y$ are the paired values of the two variables, and each sum runs over all $n$ pairs.
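
As a rough sketch of how the formula above translates into code, the function below computes each sum and then combines them. The function name `pmcc` and the argument names `xs` and `ys` are illustrative choices, not part of the syllabus, and the sketch assumes two equal-length lists of numbers.

```python
from math import sqrt

def pmcc(xs, ys):
    """Return Pearson's r for paired data using the summation formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when every x (or every y) is identical, so r is undefined.
        raise ValueError("r is undefined when a variable has no spread")
    return numerator / denominator
```

In an exam setting the value of $r$ would normally come straight from a calculator's statistics mode; a function like this is mainly useful for checking your understanding of each term in the formula.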

Example 1: Calculating PMCC

Suppose we have data on the number of hours studied ($x$) and the corresponding scores on a test ($y$):

| Hours Studied ($x$) | Test Score ($y$) |
|---|---|
| 1 | 55 |
| 2 | 65 |
| 3 | 75 |
| 4 | 85 |
| 5 | 95 |

Step 1: Calculate necessary sums

  • $n = 5$
  • $\sum x = 1 + 2 + 3 + 4 + 5 = 15$
  • $\sum y = 55 + 65 + 75 + 85 + 95 = 375$
  • $\sum xy = (1 \cdot 55) + (2 \cdot 65) + (3 \cdot 75) + (4 \cdot 85) + (5 \cdot 95) = 55 + 130 + 225 + 340 + 475 = 1225$
  • $\sum x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$
  • $\sum y^2 = 55^2 + 65^2 + 75^2 + 85^2 + 95^2 = 3025 + 4225 + 5625 + 7225 + 9025 = 29125$

Step 2: Substitute in the formula

$$r = \frac{5 \times 1225 - 15 \times 375}{\sqrt{[5 \times 55 - 15^2][5 \times 29125 - 375^2]}}$$

$$r = \frac{6125 - 5625}{\sqrt{[275 - 225][145625 - 140625]}}$$

$$r = \frac{500}{\sqrt{50 \times 5000}}$$

$$r = \frac{500}{\sqrt{250000}}$$

$$r = \frac{500}{500} = 1$$

The PMCC of $1$ indicates a perfect positive linear correlation between hours studied and test scores.
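
The arithmetic in Example 1 can be checked with a short script. The sketch below recomputes each sum from the table and also confirms why $r$ comes out as exactly $1$: every point lies on the straight line $y = 10x + 45$. The variable names are illustrative only.

```python
from math import sqrt

hours = [1, 2, 3, 4, 5]
scores = [55, 65, 75, 85, 95]

# Step 1: the sums used in the formula.
n = len(hours)
sum_x = sum(hours)                                   # 15
sum_y = sum(scores)                                  # 375
sum_xy = sum(x * y for x, y in zip(hours, scores))   # 1225
sum_x2 = sum(x * x for x in hours)                   # 55
sum_y2 = sum(y * y for y in scores)                  # 29125

# Step 2: substitute into the formula.
numerator = n * sum_xy - sum_x * sum_y               # 500
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(r)  # 1.0

# Every point sits exactly on y = 10x + 45, which is why r is exactly 1.
on_line = all(y == 10 * x + 45 for x, y in zip(hours, scores))
print(on_line)  # True
```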

Interpreting the PMCC

When interpreting the PMCC, consider both the strength and the direction:

  • Strength: The closer the value of $r$ is to $-1$ or $1$, the stronger the linear relationship. A value close to $0$ indicates a weak linear relationship.
  • Direction: A positive $r$ value indicates that the variables tend to increase together, while a negative $r$ value indicates that as one variable increases, the other tends to decrease.
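
The two points above can be turned into a small helper that puts a value of $r$ into words. This is only a sketch: the function name `describe_r` and the cut-off values `0.7` and `0.3` are assumptions chosen for illustration, not thresholds given in this lesson or fixed by any standard, and any description should still be related back to the context of the data.

```python
def describe_r(r, strong=0.7, weak=0.3):
    """Describe the strength and direction of r.

    The cut-offs `strong` and `weak` are illustrative assumptions only.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"

    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} linear correlation"

print(describe_r(0.92))   # strong positive linear correlation
print(describe_r(-0.15))  # weak negative linear correlation
```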

Common Misconceptions

  1. Correlation vs. Causation: A common mistake is assuming that correlation implies causation. Just because two variables are correlated does not mean that one causes the other.
  2. Outliers Affecting Correlation: Outliers can heavily influence the PMCC. A single data point with a significantly different value can distort the value of the PMCC, leading to misleading interpretations.
  3. Linearity Requirement: The PMCC measures linear relationships only. Non-linear relationships may yield a PMCC near zero, despite a strong relationship.
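
The effect of outliers and of non-linearity can be seen with two small made-up data sets. The sketch below is illustrative only: the data are invented for the demonstration, and the helper `pmcc` is simply the summation formula from this lesson written compactly.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's r from the summation formula (illustrative helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Made-up data: an exact but non-linear relationship, y = x^2.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pmcc(curve_x, curve_y)

# Made-up data: a perfect negative trend, then one extreme point added.
base_x, base_y = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
r_base = pmcc(base_x, base_y)
r_outlier = pmcc(base_x + [20], base_y + [20])

print(r_curve)              # 0.0
print(r_base)               # -1.0
print(round(r_outlier, 2))  # 0.92
```

Here $y$ is completely determined by $x$ in the first data set, yet $r = 0$ because the relationship is not linear. In the second, a single extreme point changes $r$ from $-1$ to about $0.92$, which is why a scatter diagram should be inspected before relying on the PMCC.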

Bivariate Normal Distribution Assumption

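Calculating the PMCC does not in itself require any assumption about the distribution of the data, but hypothesis tests on the PMCC assume that the two variables jointly follow a bivariate normal distribution. This means:

  • The joint distribution of the paired data should be bell-shaped, so a scatter diagram shows a roughly elliptical cloud of points.
  • Each variable should follow a normal distribution on its own, although this alone does not guarantee that the pair is bivariate normal.
  • This assumption is critical when performing hypothesis tests on the correlation, such as deciding whether a value of $r$ is significant at a given significance level.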

Conclusion

Pearson's Product Moment Correlation Coefficient is an essential tool in statistics for assessing linear relationships between two variables. By following the steps laid out in this lesson, you can confidently calculate and interpret the PMCC for your datasets. Remember to check the assumptions behind any test on the correlation and to avoid the common misconceptions described above.

Study Notes

  • Correlation measures the relationship between two variables.
  • PMCC, denoted as $r$, ranges from $-1$ (perfect negative) to $1$ (perfect positive).
  • Formula for PMCC:

$$r = \frac{n \sum (xy) - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$

  • The value of $r$ tells us about strength (close to $-1$ or $1$) and direction (positive or negative).
  • Check that the data are plausibly bivariate normal before testing the significance of the PMCC.
