Two-variable data: models and scatterplots
Official Digital SAT skill — Problem-Solving and Data Analysis domain.
What this question tests
This skill tests whether you can use a two-variable model to make a prediction from data, especially when the data are shown in a scatterplot with a line of best fit. On the test, you are not asked to compute a regression by hand; instead, you read the model that is already provided and use it to answer a question such as "What value does the model predict at a given $x$?" The model might be shown as a graphed line, given as an equation, or implied by the overall pattern in the points; in that last case, you need to choose the model that matches the trend. This matters because real data are noisy, so the test checks whether you can separate the general pattern (the model) from individual points (actual observations). You are being assessed on interpreting the model, not on matching individual points: the correct answer is about what the model predicts, not what any one point happens to be.
What to know
- A scatterplot displays paired data $(x, y)$ and is used to see the overall relationship between two variables, such as whether $y$ tends to increase or decrease as $x$ increases.
- A line of best fit is a model that summarizes the trend of the data, and its predicted value $\hat{y}$ at a given $x$ is read from the line (or computed from the line’s equation) rather than taken from a nearby point.
- For a linear model, the equation has the form $\hat{y} = mx + b$, where $m$ is the slope (change in predicted $y$ per 1 unit change in $x$) and $b$ is the $y$-intercept.
- A residual is $\text{residual} = \text{actual} - \text{predicted} = y - \hat{y}$, and it is not the same thing as the predicted value; mixing these up leads to wrong choices.
- Some data patterns are better modeled by an exponential form $\hat{y} = a\,b^x$, which typically shows slow change at first and then faster change as $x$ increases when $b>1$, but you should only use this form if the trend clearly behaves that way.
- When the test asks for a predicted value at a specific $x$, the correct target is the model’s output $\hat{y}$ at that $x$, even if no data point exists exactly there.
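The definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something the test requires: the function names and the sample models ($\hat{y} = 2x + 1$ and $\hat{y} = 3 \cdot 2^x$) are assumptions chosen to keep the numbers easy.

```python
def predict_linear(m, b, x):
    """Predicted value y-hat for a linear model y-hat = m*x + b."""
    return m * x + b


def predict_exponential(a, b, x):
    """Predicted value y-hat for an exponential model y-hat = a * b**x."""
    return a * b ** x


def residual(actual, predicted):
    """Residual = actual - predicted (positive when the point lies above the model)."""
    return actual - predicted


# Hypothetical linear model: y-hat = 2x + 1, evaluated at x = 3
y_hat = predict_linear(2, 1, 3)
print(y_hat)                         # 7

# Hypothetical observed point (3, 8): it lies 1 unit above the line
print(residual(8, y_hat))            # 1

# Hypothetical exponential model: y-hat = 3 * 2**x, evaluated at x = 4
print(predict_exponential(3, 2, 4))  # 48
```

Note that the predicted value (7) and the residual (1) are different quantities, which is exactly the distinction the test likes to probe.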
How to approach it
- First, identify what the question is asking you to produce, which is usually a predicted $\hat{y}$ at a specific $x$, because this tells you to focus on the model rather than individual data points.
- Next, decide what model is being used: if a line is drawn, it is a linear model; if an equation is given, use that equation; if the prompt asks you to choose a model type, match it to the pattern you see because using the wrong model leads to systematic error.
- Then, locate the target $x$ on the horizontal axis and find the corresponding point on the model (the line or curve), since the axes define the meaning of the coordinates and prevent reading the wrong variable.
- After that, estimate the predicted value $\hat{y}$ by reading the model’s $y$-value at that $x$ (or by plugging into $\hat{y} = mx + b$ if the line equation is provided), because the test expects a reasonable approximation, not perfect precision.
- If the exact $x$ is between gridlines or the line crosses between tick marks, bound your estimate between two easy values to keep it realistic and to avoid being pulled toward extreme answer choices.
- Finally, do a quick sanity check: if the model is increasing, your predicted $\hat{y}$ should be larger at larger $x$ than at smaller $x$, and if it is decreasing, it should move the other direction, which catches off-by-one-$x$ mistakes and sign errors.
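The steps above can be sketched as a small check, assuming a hypothetical line $\hat{y} = 3x + 5$ and a target $x = 4.5$ that falls between two tick marks. The helper names are illustrative assumptions; the point is that a sensible estimate must land between the line's values at the neighboring ticks, whether the line is increasing or decreasing.

```python
def predict(m, b, x):
    """Model output y-hat = m*x + b at the given x."""
    return m * x + b


def estimate_is_bounded(estimate, y_at_lower_tick, y_at_upper_tick):
    """True if the estimate lies between the line's values at the two
    neighboring tick marks (works for increasing and decreasing lines)."""
    low, high = sorted((y_at_lower_tick, y_at_upper_tick))
    return low <= estimate <= high


# Hypothetical model and target (assumptions for illustration)
m, b = 3, 5
target_x = 4.5

y_lower = predict(m, b, 4)       # line's value at the tick below the target: 17
y_upper = predict(m, b, 5)       # line's value at the tick above the target: 20
y_hat = predict(m, b, target_x)  # model's prediction at the target

print(y_hat)                                         # 18.5
print(estimate_is_bounded(y_hat, y_lower, y_upper))  # True
print(estimate_is_bounded(21, y_lower, y_upper))     # False: outside the bounds
```

An answer choice such as 21 fails the bound check, so it can be eliminated without any further computation.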
Common traps
- Residual confusion: students sometimes choose a value that looks like the vertical gap between a point and the line, but that gap is a residual, not the model’s predicted $\hat{y}$; avoid this by making sure you are reading the line at the target $x$.
- Off-by-one $x$: a tempting choice may match the line’s value at $x-1$ or $x+1$ because it is easy to misread tick marks; prevent this by explicitly counting to the exact $x$ before reading $y$.
- Using an actual data point instead of the line: a nearby plotted point may be above or below the trend, and choosing its $y$ ignores noise; avoid this by prioritizing the line of best fit when asked for a prediction.
- Wrong model type: picking a linear estimate for a clearly curved exponential trend (or vice versa) can make your answer drift; avoid this by checking whether $y$ changes by a roughly constant amount per unit of $x$ (linear) or by a roughly constant factor over equal steps in $x$ (exponential).
- Scale misread: axes may use different scales or skip values, which can make a reasonable-looking answer wrong; avoid this by reading axis labels and units before estimating.
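The linear-versus-exponential check from the "wrong model type" trap can be sketched as follows. This is a rough heuristic under stated assumptions, not an official method: the function name `classify_trend`, the 5% tolerance, and the sample values are all hypothetical, and the $y$-values are assumed to be nonzero and read at equally spaced $x$-values.

```python
def classify_trend(ys, tol=0.05):
    """Classify y-values taken at equally spaced x-values.

    Returns 'linear' if successive differences are roughly constant,
    'exponential' if successive ratios are roughly constant,
    and 'unclear' otherwise. Assumes all y-values are nonzero.
    """
    if len(ys) < 3:
        raise ValueError("need at least three values to compare steps")

    diffs = [later - earlier for earlier, later in zip(ys, ys[1:])]
    ratios = [later / earlier for earlier, later in zip(ys, ys[1:])]

    def roughly_constant(values):
        mean = sum(values) / len(values)
        return all(abs(v - mean) <= tol * abs(mean) for v in values)

    if roughly_constant(diffs):
        return "linear"
    if roughly_constant(ratios):
        return "exponential"
    return "unclear"


print(classify_trend([3, 5, 7, 9]))    # linear: each step adds 2
print(classify_trend([2, 4, 8, 16]))   # exponential: each step doubles
print(classify_trend([1, 5, 6, 20]))   # unclear: neither pattern holds
```

On the test you would do this by eye: compare the vertical change over a few equal horizontal steps and ask whether you are adding about the same amount or multiplying by about the same factor.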
Tips & shortcuts
- Treat the line of best fit like the “average trend line” and read from it, not from the dots, when the question says “predicted.”
- Use nearby easy $x$ values to anchor the line’s behavior, then interpolate to the target $x$ to keep your estimate consistent.
- If two choices are close, pick the one that matches the direction and approximate steepness of the model around that $x$, not the one that matches a single dot.
- When an exponential pattern is present, think “slow-then-fast” growth (for $b>1$) and avoid forcing a straight-line estimate across the whole range.
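The anchoring tip can be sketched as a simple interpolation, assuming you have read two easy points off the drawn line. The points $(10, 20)$ and $(20, 40)$ and the target $x = 14$ below are hypothetical values chosen for illustration.

```python
def interpolate_on_line(x1, y1, x2, y2, x_target):
    """Estimate the line's y-value at x_target from two points read off the line."""
    slope = (y2 - y1) / (x2 - x1)  # rise over run between the two anchors
    return y1 + slope * (x_target - x1)


# Hypothetical anchors read from gridline crossings: (10, 20) and (20, 40)
print(interpolate_on_line(10, 20, 20, 40, 14))  # 28.0
```

Here the slope between the anchors is 2, so moving 4 units right of $x = 10$ raises the prediction by 8, from 20 to 28.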
Worked example
A lab’s scatterplot of reaction rate $y$ (in units) versus temperature $x$ (in degrees Celsius) is modeled by the line of best fit $\hat{y} = 0.8x + 2.6$. At $x = 18$, the observed reaction rate is $17.4$. What is the residual, defined as observed minus predicted?
- A. $-0.4$
- B. $0.8$
- C. $0.4$ ✓ (correct answer)
- D. $17.0$
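Why: Compute the predicted value from the model: $\hat{y} = 0.8(18) + 2.6 = 14.4 + 2.6 = 17.0$. The residual is observed minus predicted, so $17.4 - 17.0 = 0.4$. Therefore, the correct answer is C. The other choices match common mistakes: A reverses the subtraction, B is the slope, and D is the predicted value rather than the residual.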
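The arithmetic in this example can be double-checked with a short script. The variable and function names are illustrative assumptions; the numbers come from the question, and the results are rounded to two decimals only to hide floating-point noise.

```python
def predict(x, slope=0.8, intercept=2.6):
    """Line of best fit from the example: y-hat = 0.8x + 2.6."""
    return slope * x + intercept


observed = 17.4
predicted = predict(18)
residual = observed - predicted  # observed minus predicted

print(round(predicted, 2))  # 17.0
print(round(residual, 2))   # 0.4
```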
