2. Exploring Two-Variable Data

Representing the Relationship Between Two Quantitative Variables

Introduction: Seeing Patterns in Paired Data 📊

When you look at two quantitative variables together, the big question is: How are they related? For example, does more studying tend to go with higher test scores? Does outside temperature relate to ice cream sales? Does the number of hours slept relate to reaction time? These are all examples of bivariate quantitative data, which means each individual has two numerical measurements.

In AP Statistics, representing the relationship between two quantitative variables helps you discover patterns, describe association, and prepare for later ideas like correlation and regression. The main tool for this is the scatterplot. A scatterplot displays paired data as points on a coordinate plane, with one variable on the horizontal axis and the other on the vertical axis.

Learning goals

By the end of this lesson, students should be able to:

  • explain what bivariate quantitative data is,
  • choose and interpret a scatterplot,
  • describe direction, form, strength, and unusual features of association,
  • connect scatterplots to correlation and regression,
  • use real-world evidence to describe relationships between two quantitative variables.

1. What Bivariate Quantitative Data Means

A quantitative variable is a variable whose values are numerical measurements or counts. When a data set includes two quantitative variables measured on the same individuals, we call it bivariate quantitative data.

Examples include:

  • $x =$ hours studied and $y =$ test score,
  • $x =$ outside temperature and $y =$ electricity use,
  • $x =$ arm length and $y =$ height,
  • $x =$ time spent on social media and $y =$ hours of sleep.

Each individual contributes one ordered pair, such as $(3, 88)$, where $3$ might represent hours studied and $88$ might represent a test score. The two values are linked to the same person, object, or event.
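As a minimal sketch of this idea, paired data can be stored as a list of ordered pairs and then split into two aligned lists. The data values and the names `pairs`, `hours`, and `scores` are made up for illustration.

```python
# Hypothetical data: (hours studied, test score) for five students.
pairs = [(1, 65), (2, 70), (3, 88), (4, 85), (5, 93)]

# Split the pairs into the two variables, keeping the order aligned.
hours = [x for x, _ in pairs]
scores = [y for _, y in pairs]

# The i-th entries of both lists always belong to the same student.
for x, y in zip(hours, scores):
    print(f"hours studied = {x}, test score = {y}")
```

Keeping the two lists in the same order is what preserves the link between each student's two measurements.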

It is important to separate quantitative from categorical variables. A categorical variable puts data into groups, such as type of pet or favorite sport. A quantitative variable measures an amount, such as weight or age. In this lesson, both variables are numerical.

2. The Scatterplot: The Main Picture for Two Quantitative Variables

A scatterplot is the standard graph for showing the relationship between two quantitative variables. Each point on the plot represents one pair of values.

Usually:

  • the explanatory variable goes on the horizontal axis,
  • the response variable goes on the vertical axis.

The explanatory variable is the variable that might help explain changes in the response variable. For example, if a student is studying the relationship between hours of practice and piano performance score, hours of practice may be the explanatory variable and performance score may be the response variable.

A scatterplot helps you see whether there is an association. Association means that the values of one variable tend to be related to the values of the other variable.
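To make the construction concrete, here is a rough sketch that draws a scatterplot with text characters. In practice you would use a graphing calculator or plotting software; the function name `text_scatterplot`, the grid size, and the data are all assumptions chosen for illustration, and the sketch assumes each variable takes at least two different values.

```python
def text_scatterplot(xs, ys, width=30, height=10):
    """Return a rough text scatterplot: x runs left to right, y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical data: explanatory variable on the horizontal axis,
# response variable on the vertical axis.
hours_practiced = [1, 2, 3, 4, 5, 6]
performance = [52, 60, 58, 71, 75, 83]

plot = text_scatterplot(hours_practiced, performance)
print(plot)
```

Each `*` is one individual. With this made-up data the marks climb from the lower left to the upper right, which is what a positive association looks like.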

How to read a scatterplot

When you describe a scatterplot, focus on these four features:

  1. Direction — is the association positive, negative, or neither?
  2. Form — is it roughly linear, curved, or something else?
  3. Strength — are the points close together or more spread out?
  4. Outliers or unusual features — are there points that do not fit the overall pattern?

Example

Suppose a scatterplot shows hours studied on the $x$-axis and test score on the $y$-axis. If the points generally rise from left to right, the association is positive: as $x$ increases, $y$ tends to increase too. If the points fall from left to right, the association is negative: as $x$ increases, $y$ tends to decrease.

A pattern that looks like a straight band is linear. A pattern that bends is curved. If the points are tightly clustered around the pattern, the association is strong. If they are widely scattered, the association is weak.

3. Describing the Relationship: Direction, Form, Strength, and Unusual Points

Direction

Direction tells whether the variables move together in the same way or in opposite ways.

  • Positive association: larger $x$ values tend to go with larger $y$ values.
  • Negative association: larger $x$ values tend to go with smaller $y$ values.
  • No clear association: there is no obvious pattern.

For example, as outside temperature rises, ice cream sales often rise too. That is a positive association. As the number of miles driven increases, the amount of gas left in the tank often decreases. That is a negative association.

Form

Form describes the shape of the pattern.

  • Linear means the data points follow a roughly straight-line pattern.
  • Nonlinear or curved means the pattern bends.

This matters because later in AP Statistics, correlation and regression work best when the relationship is roughly linear.

Strength

Strength describes how closely the points follow the overall pattern.

  • Strong association: points are close to the pattern.
  • Moderate association: points show a pattern, but with noticeable scatter.
  • Weak association: points are very spread out.

Strength is not the same as slope. A steep line is not automatically a strong relationship, and a gentle line is not automatically weak. Strength is about how tightly the points cluster around the pattern.
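The sketch below illustrates the difference with two made-up data sets. It uses the least-squares slope as a measure of steepness and the correlation coefficient $r$ (covered in Section 5 of this lesson) as a measure of linear strength. The helper name `slope_and_r` and all data values are assumptions for illustration.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return (least-squares slope, correlation r) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6]
steep_but_scattered = [2, 30, 8, 45, 20, 60]        # rises fast, lots of scatter
gentle_but_tight = [1.0, 1.12, 1.19, 1.31, 1.4, 1.5]  # rises slowly, little scatter

slope_steep, r_steep = slope_and_r(x, steep_but_scattered)
slope_gentle, r_gentle = slope_and_r(x, gentle_but_tight)

print(f"steep data:  slope = {slope_steep:.2f}, r = {r_steep:.3f}")
print(f"gentle data: slope = {slope_gentle:.2f}, r = {r_gentle:.3f}")
```

In this example the steeper data set has the weaker linear association, and the gentler one has the stronger association.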

Unusual features

Look for points that stand apart from the rest.

  • An outlier is a point that falls far from the overall pattern formed by the other points.
  • A high leverage point is a point with an unusual $x$-value; it can strongly affect the direction or slope of a fitted line.
  • A cluster is a group of points that gather together.

Unusual points matter because they can change your interpretation of the relationship. For example, if most students scored between $60$ and $90$, but one student scored $5$, that point may be an outlier and should be examined carefully.
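As a rough illustration of how much one unusual point can matter, the sketch below computes the least-squares slope (the regression line is covered in Section 5 of this lesson) with and without a single high leverage point. The data, the added point $(15, 70)$, and the helper name `ls_slope` are hypothetical.

```python
def ls_slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 77, 82]

# One extra student with an unusually large x-value (a high leverage point).
hours_plus = hours + [15]
scores_plus = scores + [70]

slope_without = ls_slope(hours, scores)
slope_with = ls_slope(hours_plus, scores_plus)

print(f"slope without the unusual point: {slope_without:.2f}")
print(f"slope with the unusual point:    {slope_with:.2f}")
```

With these made-up numbers, a single point pulls the slope from about 4.9 points per hour down to well under 1, which is why unusual points deserve a closer look before you summarize the relationship.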

4. Why Scatterplots Matter in AP Statistics

Scatterplots are more than just pictures. They are a first step in statistical reasoning.

They help answer questions

A scatterplot can show whether one variable seems related to another. For example:

  • Do taller students tend to have longer legs?
  • Does spending more time on homework relate to better grades?
  • Does more rainfall relate to higher crop growth?

These are not just guesses. The scatterplot gives visual evidence.

They help you decide what analysis may be useful

If the points show a roughly linear pattern, then correlation and regression may be appropriate later. If the relationship is curved, a straight-line model may not be a good choice.

They help you notice departures from linearity

A departure from linearity means the points do not follow a straight-line pattern. This is important because a line may not summarize curved data well. If you see a curve in the scatterplot, the relationship may need a different model or a different type of description.

Real-world example

Imagine a scatterplot of age and vocabulary size in children. At younger ages, vocabulary may increase quickly as age increases. Later, the growth may slow down. This could create a curved pattern rather than a straight one. In this case, saying “there is a positive association” is correct, but saying “the relationship is linear” may not be.

5. Comparing Scatterplots with Correlation and Regression

Scatterplots are the visual foundation for later topics.

Correlation

The correlation coefficient, written as $r$, measures the direction and strength of a linear relationship between two quantitative variables. The value of $r$ is always between $-1$ and $1$, inclusive.

  • If $r$ is close to $1$, the linear association is strong and positive.
  • If $r$ is close to $-1$, the linear association is strong and negative.
  • If $r$ is close to $0$, there is little linear association.

However, correlation only describes linear relationships. If the scatterplot is curved, $r$ may be misleading.
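The following sketch shows one way to compute $r$ by hand and why a curved pattern can make it misleading. It uses the formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, which is algebraically equivalent to averaging products of $z$-scores. The function name `correlation` and both data sets are assumptions for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical, roughly linear data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 77, 82]
r_linear = correlation(hours, scores)

# A perfect curved relationship: y = x^2 on x-values symmetric about 0.
x_curved = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [x ** 2 for x in x_curved]
r_curved = correlation(x_curved, y_curved)

print(f"roughly linear data: r = {r_linear:.3f}")
print(f"curved data:         r = {r_curved:.3f}")
```

The curved data follow an exact rule, yet $r$ comes out to 0 because the relationship is not linear. This is why the scatterplot should be checked before interpreting $r$.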

Regression

A regression line is a line that predicts one variable from another. The most common one in AP Statistics is the least-squares regression line, which is the line that minimizes the sum of squared residuals.

A residual is the difference between an observed value and a predicted value:

$$\text{residual} = y - \hat{y}$$

where $y$ is the observed response and $\hat{y}$ is the predicted response.

Scatterplots help you see whether a regression line is a reasonable summary. If the points are roughly linear, a regression line may fit well. If the points are curved, the line may leave large residuals in some regions.
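A minimal sketch of these ideas, using made-up data: it fits a least-squares line $\hat{y} = a + bx$ with the standard formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, then computes each residual as $y - \hat{y}$. The function name `least_squares_line` and the data values are assumptions.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 77, 82]

intercept, slope = least_squares_line(hours, scores)

# Predicted response for each observed x, then residual = observed - predicted.
predicted = [intercept + slope * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

print(f"y-hat = {intercept:.1f} + {slope:.1f}x")
for x, y, e in zip(hours, scores, residuals):
    print(f"x = {x}, y = {y}, residual = {e:+.1f}")
```

For this data the residuals are small and switch between positive and negative with no obvious pattern, which is consistent with a roughly linear relationship. Residuals from a least-squares line always sum to zero (up to rounding).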

6. Example: Interpreting a Scatterplot in Context

Suppose you are given data on study time and test score.

  • Study time is on the horizontal axis.
  • Test score is on the vertical axis.
  • The points rise from left to right.
  • The pattern is roughly straight.
  • The points cluster closely around that pattern.
  • One point is far above the rest.

A correct AP Statistics description might be:

There is a strong positive linear association between study time and test score. In general, students who study more tend to score higher. There is one unusual point that may be an outlier.

Notice that this description uses context, direction, form, strength, and unusual features. It does not claim that studying automatically causes a higher score. Statistics can show association, but association alone does not prove causation.

Conclusion

Representing the relationship between two quantitative variables is a major part of exploring two-variable data in AP Statistics. Remember that the scatterplot is the main graph for bivariate quantitative data. By looking at direction, form, strength, and unusual points, you can describe association clearly and accurately.

This lesson also sets up the rest of the topic. Scatterplots lead naturally to correlation, regression, and residual analysis. If the relationship is linear, a line may be useful. If the data curve or contain unusual points, the model may need more careful interpretation. Understanding these ideas helps students read data in real-world situations and build a strong foundation for AP Statistics. 📈

Study Notes

  • Bivariate quantitative data means two numerical variables measured on the same individuals.
  • A scatterplot displays paired quantitative data as points on a coordinate plane.
  • The explanatory variable is usually on the $x$-axis, and the response variable is usually on the $y$-axis.
  • Association means the two variables show a pattern together.
  • Describe scatterplots by direction, form, strength, and unusual features.
  • Positive association means both variables tend to increase together.
  • Negative association means as one variable increases, the other tends to decrease.
  • Linear patterns are roughly straight; curved patterns are nonlinear.
  • Strength refers to how closely points follow the overall pattern.
  • Outliers are points far from the general pattern.
  • High leverage points have unusual $x$-values and can strongly affect a line.
  • Scatterplots help decide whether correlation and regression are appropriate.
  • Correlation $r$ measures the direction and strength of a linear relationship.
  • Residuals are calculated by $y - \hat{y}$.
  • A residual plot and a scatterplot help show departures from linearity.
  • Association does not prove causation.
