Analyzing Departures from Linearity
Introduction: Why do some scatterplots not form a straight line?
In AP Statistics, a scatterplot helps us study the relationship between two quantitative variables. Sometimes the points follow a straight-line pattern, and sometimes they do not. When a straight line does not fit well, we say there is a departure from linearity. This matters because many statistical tools, including correlation and linear regression, are designed for relationships that are roughly linear.
Objectives for this lesson:
- Explain what it means for a scatterplot to depart from linearity.
- Identify common patterns that show nonlinearity, clusters, and outliers.
- Use residuals to check whether a linear model is appropriate.
- Connect departures from linearity to association, correlation, and regression in AP Statistics.
Think of a car on a road trip. If you graph distance traveled against time, the points follow a straight line only while the car holds a constant speed. If the car speeds up or slows down, the points follow a curve instead. That curve is a real-world example of a non-linear relationship.
What does "linear" really mean?
A relationship is linear if the points in a scatterplot follow an approximate straight-line pattern. This does not mean every point lies exactly on a line. Real data usually has some scatter. The important question is whether a straight line is a good summary of the overall trend.
A linear model can be written as $\hat{y}=a+bx$, where $\hat{y}$ is the predicted value of the response variable, $a$ is the intercept, and $b$ is the slope. In AP Statistics, this model is useful only when the relationship between $x$ and $y$ is roughly linear.
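The slope and intercept of the least-squares line can be computed directly from the data. The sketch below is a minimal Python illustration. It assumes the standard least-squares formulas $b=\frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2}$ and $a=\bar{y}-b\bar{x}$. The function names `fit_line` and `predict` and the data values are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(a, b, x):
    """Predicted response y-hat for a given x."""
    return a + b * x


# Made-up data that lie exactly on y = 2 + 3x
xs = [1, 2, 3, 4, 5]
ys = [5, 8, 11, 14, 17]
a, b = fit_line(xs, ys)
print(a, b)              # 2.0 3.0
print(predict(a, b, 6))  # 20.0
```

Real data would not fall exactly on a line. These values were chosen so that the recovered intercept and slope are easy to check by eye.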
When the pattern bends upward, bends downward, levels off, or changes slope, the data may show a departure from linearity. This means the straight-line model misses an important part of the pattern.
Examples of non-linear patterns include:
- a curved pattern, such as growth that speeds up over time
- a wave-shaped pattern, such as seasonal temperature changes
- a pattern that increases quickly and then levels off
- a pattern that decreases rapidly and then flattens out
A strong association is not the same as a linear association. A relationship can be strong but curved. That is why looking at the scatterplot is always the first step.
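You can check numerically that a relationship can be strong yet curved. In this minimal sketch, the helper name `correlation` is hypothetical and the data are made up. The points follow the exact curve $y=x^2$, yet $r$ comes out at about $0.975$. A high $r$ alone therefore does not show that the form is linear.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfectly curved (quadratic) relationship: y = x^2 for x = 1..10
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
r = correlation(xs, ys)
print(round(r, 3))  # 0.975
```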
How to spot departures from linearity in a scatterplot
To analyze a scatterplot, look for the following features:
- Form: Is the pattern straight, curved, or something else?
- Direction: Does the relationship go up or down as $x$ increases?
- Strength: Are the points close to a pattern or spread out widely?
- Outliers: Are there points far away from the main pattern?
The form is especially important for departures from linearity. If the form is curved, then a linear model may not be appropriate, even if the direction is clear.
For example, suppose you graph hours of study $x$ and test score $y$. At first, scores may increase as study time increases. But after a certain number of hours, the scores may stop improving much. The scatterplot could rise quickly and then flatten. That is not a straight line, so a linear model would not describe it well.
Another example is medicine dosage versus response. A small increase in dosage may cause a large effect at first, and then the effect may level off. This is another non-linear pattern.
Residuals: the key tool for checking linearity
A residual measures how far a data point lies above or below the regression line, measured vertically. It is defined by the formula
$$\text{residual}=y-\hat{y}$$
where $y$ is the observed value and $\hat{y}$ is the predicted value from the line.
Residuals help us see whether the regression line is missing some pattern. If a linear model is a good fit, the residuals should be scattered randomly around $0$. That means the line is doing a good job of capturing the relationship.
If the residuals show a pattern, that is evidence of departure from linearity. For example:
- a U-shaped pattern in the residual plot suggests the original scatterplot is curved
- an inverted U-shaped pattern (residuals that rise and then fall) also suggests curvature that the linear model is not capturing
- residuals that increase in spread as $x$ increases may suggest changing variability, which is another departure from the ideal conditions for regression
A good residual plot has no clear pattern, no curve, and no obvious change in spread. The points should be randomly scattered above and below $0$.
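To see what a patterned residual plot looks like in numbers, the minimal Python sketch below fits a least-squares line to perfectly curved, made-up data ($y=x^2$ for $x=1,\dots,10$). It then lists the residuals in order of $x$. The helper names are hypothetical. The residuals are positive at both ends and negative in the middle, which is the U-shaped pattern listed among the warning signs, even though they still sum to $0$.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Residuals y - y-hat from the least-squares line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Curved data: y = x^2 for x = 1..10 (fitted line is y-hat = -22 + 11x)
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print([round(e) for e in res])
# [12, 4, -2, -6, -8, -8, -6, -2, 4, 12]  -> positive, negative, positive: a U shape
print(sum(res))  # 0.0 (least-squares residuals always sum to 0)
```

The residuals average to $0$, but they are clearly not randomly scattered around $0$. That is why the pattern in the residuals, and not their average, reveals the departure from linearity.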
Example of residual reasoning
Suppose a regression line predicts that a student who studies $4$ hours will score $78$, but the actual score is $83$. The residual is
$$83-78=5$$
This positive residual means the student scored $5$ points higher than the line predicted. If the residuals are consistently positive in some ranges of $x$ and consistently negative in others, the model may be missing curvature.
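The same calculation can be written as a tiny Python function. `residual` is a hypothetical helper name, and the sign check simply restates the interpretation of positive and negative residuals.

```python
def residual(observed, predicted):
    """Residual = observed y minus predicted y-hat."""
    return observed - predicted


# Student who studied 4 hours: predicted score 78, actual score 83
e = residual(83, 78)
print(e)  # 5

# Positive: the point lies above the line (the line underpredicted).
# Negative: the point lies below the line (the line overpredicted).
print("above the line" if e > 0 else "below the line" if e < 0 else "on the line")
```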
Why departures from linearity matter in regression
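Correlation and regression are built around linear relationships. The correlation coefficient, written as $r$, measures the strength and direction of a linear association. If the relationship is curved, $r$ may be misleading because it describes only how closely the points follow a straight line. A strong curved relationship can have a small $r$, and a value of $r$ near $1$ or $-1$ does not by itself show that the form is linear.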
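An extreme case makes the point. When $y$ is completely determined by $x$ through a symmetric U-shaped curve, $r$ can be exactly $0$. The minimal sketch below uses a hypothetical helper, `correlation`, and made-up values. It computes $r$ for $y=x^2$ with $x$ running from $-3$ to $3$.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y is completely determined by x (y = x^2), but the pattern is a U shape
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
print(correlation(xs, ys))  # 0.0
```

An $r$ of $0$ here does not mean there is no relationship. It means there is no linear relationship.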
A curved pattern can hide important information. For example, imagine the relationship between fertilizer amount and plant growth. A little fertilizer may help plants grow faster, but adding much more may not help much further. The scatterplot might rise steeply and then flatten. A single straight line would oversimplify that relationship.
When there is a departure from linearity:
- the least-squares regression line may not fit well
- predictions may be inaccurate, especially at certain $x$ values
- residuals may show a clear pattern
- the correlation $r$ may not fully describe the association
In AP Statistics, this means you should not automatically use a line just because the variables are quantitative. You must first check the scatterplot and residual plot.
Beyond linear: common patterns and what they mean
Here are several common departures from linearity and how to think about them:
1. Curvature
A curved pattern is one of the most common departures. It may be U-shaped, inverted U-shaped, or bending in one direction.
Example: The relationship between speed and fuel efficiency in a car may not be linear at all. Very slow or very fast driving may reduce efficiency, while moderate speeds may be best. That combination would produce an inverted U-shaped pattern.
2. Clusters
Sometimes the points form groups or clusters. This may happen if data come from different populations mixed together.
Example: If a school graphs height and weight for both middle school and high school students together, the scatterplot may show separate clusters. A single line may not summarize both groups well.
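Clusters can also make a single correlation misleading. The minimal sketch below uses made-up data and a hypothetical helper, `correlation`. Neither group shows any linear trend on its own, yet pooling the two groups gives a strong-looking $r$ of about $0.95$. That value comes entirely from the gap between the clusters.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two made-up clusters, each with no linear trend inside the group
xs_a, ys_a = [1, 2, 3], [1, 2, 1]
xs_b, ys_b = [7, 8, 9], [7, 8, 7]

r_a = correlation(xs_a, ys_a)
r_b = correlation(xs_b, ys_b)
r_all = correlation(xs_a + xs_b, ys_a + ys_b)
print(round(r_a, 2), round(r_b, 2), round(r_all, 2))  # 0.0 0.0 0.95
```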
3. Outliers
An outlier is a point that is far from the rest of the data. One outlier may strongly affect the regression line and correlation.
Example: If one student studied almost no time but scored extremely high, that point could pull the line away from the pattern. Always examine unusual points carefully.
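A single outlier can change the regression line a great deal. The minimal sketch below uses made-up scores that rise $5$ points per hour of study, along with a hypothetical helper, `slope`. It then adds one student with essentially no study time ($0$ hours) and a score of $98$. That one point flips the sign of the least-squares slope.

```python
def slope(xs, ys):
    """Least-squares slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: hours studied vs. test score, with a clear upward trend
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]
print(slope(hours, scores))  # 5.0

# Add one unusual student: essentially no study time, very high score
print(round(slope([0] + hours, [98] + scores), 2))  # -1.14
```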
4. Changing spread
Sometimes the vertical spread of the points gets larger or smaller as $x$ changes. This is called non-constant variability.
Example: The spread in house prices may increase as house size increases, because larger homes can vary a lot in price. Even if the trend is roughly upward, the changing spread can make linear modeling less reliable.
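One simple way to quantify changing spread is to compare the typical size of the residuals for small and large values of $x$. The sketch below is an informal check, not a formal test. The data are made up, with a trend of $y=10x$ and scatter that grows with $x$, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mean_abs(values):
    """Average size of the values, ignoring sign."""
    return sum(abs(v) for v in values) / len(values)


# Made-up data: trend y = 10x, with scatter that grows as x grows
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
ys = [11, 9, 22, 18, 33, 27, 44, 36, 55, 45, 66, 54]
a, b = fit_line(xs, ys)
res = [y - (a + b * x) for x, y in zip(xs, ys)]

# Compare typical residual size for small x vs. large x
small = [e for x, e in zip(xs, res) if x <= 3]
large = [e for x, e in zip(xs, res) if x > 3]
print(mean_abs(small), mean_abs(large))  # 2.0 5.0
```

The residuals for larger $x$ are typically more than twice as large as those for smaller $x$. In a residual plot, this would show up as a fan shape that widens to the right.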
How AP Statistics expects you to respond
When analyzing a scatterplot or residual plot, use statistical language carefully. A strong AP-style response should mention the pattern, the direction, and whether a linear model is appropriate.
For example, a good answer might say:
"The scatterplot shows a positive association, but the form is curved rather than linear. Therefore, a linear regression model is not appropriate because there is a departure from linearity."
Or, for a residual plot:
"The residual plot shows a clear U-shaped pattern, so the residuals are not randomly scattered around $0$. This indicates that the linear model does not fit well."
Notice that these answers use evidence from the graph. AP Statistics values evidence-based reasoning, not just labels.
Conclusion: The big idea to remember
Analyzing departures from linearity means checking whether a straight-line model is a good description of the relationship between two quantitative variables. If the scatterplot is curved, clustered, or affected by outliers, a linear model may not be appropriate. Residual plots are one of the best tools for checking this because they reveal whether the errors are random or patterned.
In the broader topic of Exploring Two-Variable Data, this lesson connects directly to scatterplots, correlation, and regression. First, graph the data. Then ask whether the form is linear. If not, describe the departure and explain why a line may not work well. This careful process helps you make accurate conclusions from real data.
Study Notes
- A scatterplot shows the relationship between two quantitative variables.
- A relationship is linear if the points follow an approximate straight-line pattern.
- A departure from linearity means the pattern is not well described by a straight line.
- Common departures include curvature, clusters, outliers, and changing spread.
- The residual formula is $\text{residual}=y-\hat{y}$.
- A good residual plot shows random scatter around $0$ with no clear pattern.
- A curved residual plot suggests the linear model is not appropriate.
- Correlation $r$ describes only linear association, so it can be misleading for curved data.
- Regression works best when the relationship is approximately linear and residuals are random.
- In AP Statistics, always support conclusions with evidence from the graph or residual plot.
