Topic 14: Quantitative Methods for Economists

Lesson 14.3: Correlation, Regression And Using Spreadsheets

Introduction

Welcome to Lesson 14.3, students! In this lesson, we will dive into the fascinating world of data analysis in economics. Our primary focus will be on correlation and regression, two essential tools that help us understand relationships between variables. We will also explore how to use spreadsheets to manage data and draw meaningful conclusions.

Learning Objectives:

By the end of this lesson, you should be able to:

  • Understand scatter diagrams and the concept of correlation: positive, negative, and zero.
  • Recognize the line of best fit and grasp the basics of simple linear regression.
  • Differentiate between correlation and causation.
  • Utilize a spreadsheet to organize data, compute statistics, and produce charts.
  • Draw reasoned, evidence-based conclusions from a data set.

Scatter Diagrams and Correlation

A scatter diagram (or scatter plot) is a graph of two variables plotted against each other, one on each axis, so that any relationship between them can be seen at a glance. Each point on the scatter plot corresponds to one observation from the dataset. Let's explore three types of correlation:

Positive Correlation

When two variables increase together, we have a positive correlation. For example, if we look at the relationship between studying hours ($H$) and exam scores ($S$), we might see that as $H$ increases, $S$ tends to increase as well.

In a scatter plot, points will cluster around an upward-sloping line. The correlation coefficient $r$ will be greater than 0, where $ 0 < r \leq 1 $.

Negative Correlation

In contrast, when one variable tends to decrease as the other increases, we have a negative correlation. Consider the relationship between the amount of time spent on social media ($M$) and exam scores ($S$): as $M$ increases, $S$ may decrease.

The points will cluster along a downward-sloping line, and the correlation coefficient $r$ will be less than 0, where $ -1 \leq r < 0 $.

Zero Correlation

In some cases, there might be no correlation at all, meaning that knowing one variable tells us nothing about whether the other tends to be high or low. An example could be the number of ice creams sold ($I$) and the annual revenue of a tech company ($T$).

Here, you would see points scattered randomly with no upward or downward trend, and the correlation coefficient $r$ would be close to 0, indicating no linear relationship between $I$ and $T$.
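To make the coefficient $r$ concrete, here is a minimal Python sketch that computes it as the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and all of the data are invented for illustration; they do not come from a real study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two variables move together around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical observations for six students (illustration only)
hours = [1, 2, 3, 4, 5, 6]          # studying hours, H
social_media = [6, 5, 4, 3, 2, 1]   # social media hours, M
scores = [52, 61, 63, 72, 74, 83]   # exam scores, S

r_positive = pearson_r(hours, scores)         # close to +1
r_negative = pearson_r(social_media, scores)  # close to -1

# A curved (U-shaped) pattern: the variables are clearly related,
# yet r is 0 because r only measures straight-line association.
x_curve = [1, 2, 3, 4, 5]
y_curve = [4, 1, 0, 1, 4]
r_curved = pearson_r(x_curve, y_curve)

print(round(r_positive, 3), round(r_negative, 3), round(r_curved, 3))
```

The last case is a useful caution: a value of $r$ near 0 rules out a straight-line pattern, not every possible pattern, which is one reason to look at the scatter diagram as well as the number.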

The Line of Best Fit and Simple Linear Regression

Now that we understand correlation, let's introduce the line of best fit. This is the straight line that passes as close as possible to the points in a scatter plot overall, so it summarizes the relationship between the two variables. It is represented by the equation:

$$ Y = mX + b $$

Where:

  • $Y$ is the dependent variable (e.g., exam scores),
  • $X$ is the independent variable (e.g., studying hours),
  • $m$ is the slope of the line, representing the change in $Y$ for each unit change in $X$,
  • $b$ is the y-intercept, where the line crosses the y-axis.
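One common way to pin down $m$ and $b$ is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances between the points and the line. The lesson does not name a fitting method, so treat least squares here as an assumption. The sample means $\bar{X}$ and $\bar{Y}$, the index $i$, and the sample size $n$ are notation introduced for this sketch:

$$ m = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b = \bar{Y} - m\bar{X} $$

The second formula shows that a line fitted this way always passes through the point of means $(\bar{X}, \bar{Y})$.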

Example of Simple Linear Regression

Suppose we conducted a study examining the relationship between studying hours ($H$) and exam scores ($S$). After plotting the points on a scatter plot, we find that the line of best fit has a slope $m = 5$. This means that each additional hour studied is associated with a predicted exam score that is 5 points higher. If the y-intercept $b = 50$, then the line predicts that a student who doesn’t study at all would score around 50 points. The equation in this case would be:

$$ S = 5H + 50 $$
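The sketch below shows how a line like this could be obtained from data and then used for prediction. It assumes the line is fitted by ordinary least squares; the helper `fit_line` and the five data points are hypothetical, chosen so that the fit reproduces the slope and intercept of the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through the means
    return slope, intercept

# Hypothetical data: the points do not lie exactly on a line
hours = [0, 1, 2, 3, 4]
scores = [51, 53, 62, 63, 71]

m, b = fit_line(hours, scores)

# Use the fitted line S = m*H + b to predict a score for 6 hours of study
predicted_score = m * 6 + b

print(f"S = {m:.1f}H + {b:.1f}; predicted score for H = 6: {predicted_score:.1f}")
```

Note that 6 hours lies outside the range of the data (0 to 4 hours), so this prediction is an extrapolation and should be treated with extra caution.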

Correlation vs. Causation

It’s crucial to understand the difference between correlation and causation. Just because two variables are correlated doesn't imply that one causes the other. For instance, in our prior example, a correlation between studying hours and exam scores does not by itself prove that the extra studying caused the higher scores. Other factors, like the quality of study material or prior knowledge, could also play a significant role. Remember the saying: "correlation does not imply causation!" 📉️📈
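A small, artificial illustration may help here. In the sketch below, exam scores are built from prior knowledge only, and studying hours never enter the score calculation, yet hours and scores still come out strongly correlated because both follow prior knowledge. Every number and variable name is hypothetical and chosen purely to make the point.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical third factor: prior knowledge on a 1-6 scale
prior_knowledge = [1, 2, 3, 4, 5, 6]

# In this made-up world, students with more prior knowledge also study more
hours = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]

# Scores depend ONLY on prior knowledge plus a small made-up disturbance;
# studying hours are deliberately left out of the calculation.
disturbance = [-1, 1, 1, -1, -1, 1]
scores = [10 * p + 40 + e for p, e in zip(prior_knowledge, disturbance)]

r_hours_scores = pearson_r(hours, scores)
print(round(r_hours_scores, 3))  # strongly positive despite no causal link
```

In real data we cannot see how the scores were generated, which is why a strong correlation on its own is not enough to claim that one variable causes the other.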

Using Spreadsheets for Data Analysis

Spreadsheets like Microsoft Excel or Google Sheets are powerful tools that allow you to organize and analyze data efficiently. Here are some basic steps to follow when using a spreadsheet for our economics work:

  1. Organize Your Data: Enter your data in rows and columns, using clear headers to label each variable (e.g., Studying Hours, Exam Scores).
  2. Calculate Statistics: Use built-in functions to calculate averages, medians, and standard deviations. For example, to calculate the mean score, you could use:

$$ =AVERAGE(B2:B10) $$

(assuming your scores are in column B from row 2 to 10)

  3. Create Charts: Visualize your data by creating scatter plots or line graphs. Highlight your data, then select the chart type you want from the Insert menu.
  4. Analyze Relationships: Make a scatter plot of two variables, add a linear trendline, and display its equation on the chart to see the line of best fit.
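For comparison, the same summary statistics from step 2 can be reproduced outside a spreadsheet. The sketch below uses Python's standard `statistics` module on hypothetical data laid out like columns A and B, rows 2 to 10; the comments name the spreadsheet function each call corresponds to. Excel and Google Sheets also provide CORREL, SLOPE, and INTERCEPT functions for the correlation coefficient and the line of best fit.

```python
import statistics

# Hypothetical data, as if typed into columns A and B (rows 2 to 10)
studying_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [55, 58, 66, 70, 74, 81, 85, 88, 98]

mean_score = statistics.mean(exam_scores)      # like =AVERAGE(B2:B10)
median_score = statistics.median(exam_scores)  # like =MEDIAN(B2:B10)
stdev_score = statistics.stdev(exam_scores)    # like =STDEV(B2:B10), sample std. dev.

print(f"mean = {mean_score}, median = {median_score}, "
      f"standard deviation = {stdev_score:.2f}")
```

Checking a spreadsheet result against an independent calculation like this is a simple way to catch a mistyped cell range.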

Conclusion

Understanding correlation and regression is vital for economists because sound decisions depend on careful analysis of data. Remember, students, while correlation can indicate a relationship between variables, it does not show that one causes the other. Using spreadsheets will strengthen your data analysis skills and help you present evidence-based conclusions in your reports.

Study Notes

  • Scatter Diagrams: Visual representations of two variables.
  • Correlation Types:
      • Positive correlation: Both variables increase together.
      • Negative correlation: One variable increases as the other decreases.
      • Zero correlation: No linear relationship between variables.
  • Line of Best Fit: The straight line $Y = mX + b$ that best fits the points, with slope $m$ and y-intercept $b$.
  • Correlation vs. Causation: Correlation does not imply causation.
  • Spreadsheets: Useful for organizing data, calculating statistics, and charting.
  • Key Functions: Use functions like AVERAGE and MEDIAN in spreadsheets for data analysis.

