Residuals and Goodness of Fit

Introduction: checking how well a model works 📈

When we use functions to model real-world data, we are usually trying to answer a question like: “Does this equation describe the pattern well?” In the real world, data is rarely perfect. A model may follow the general trend, but individual points often sit above or below the curve or line. Residuals measure those differences, and goodness of fit helps us judge how well the model represents the data.

In this lesson, you will learn to:

explain the main ideas and terminology behind residuals and goodness of fit,
apply IB Mathematics: Applications and Interpretation HL reasoning to evaluate models,
connect residuals and goodness of fit to functions, graphs, and regression,
summarize how this topic fits into the broader study of functions,
use evidence from data to decide whether a model is useful.

This matters because a model is only helpful if it matches the situation closely enough to make reliable predictions. For example, a school might use a linear model to predict the number of students using the library each day, or a sports analyst might use a quadratic model to estimate the path of a thrown ball. In both cases, residuals tell us how far off the model is at each data point, and goodness of fit tells us how well the model describes the data as a whole.

What is a residual?

A residual is the difference between an observed value and the value predicted by a model. If the actual data point is $y$ and the model predicts $\hat{y}$, then the residual is

$\text{residual} = y - \hat{y}$

The sign of the residual tells us whether the model underestimates or overestimates the real data.

If the residual is positive, then $y > \hat{y}$, so the point lies above the model.
If the residual is negative, then $y < \hat{y}$, so the point lies below the model.
If the residual is $0$, then the model passes exactly through that data point.

Think of residuals like a “miss distance” 🎯. If you throw a basketball and a model predicts the ball will land at a certain point, the residual is how far, and in which direction, the real landing point is from the predicted one.

Example

Suppose a regression line predicts that when $x = 5$, the value should be $\hat{y} = 12$. If the actual observed value is $y = 15$, then

$\text{residual} = 15 - 12 = 3$

This means the model underestimated the actual value by $3$ units.

If another point has $y = 10$ when $\hat{y} = 12$, then

$\text{residual} = 10 - 12 = -2$

This means the model overestimated by $2$ units.
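The arithmetic in this example can be checked with a short Python sketch. The function names `residual` and `describe` are illustrative only, not from any library; the inputs are the two points above, both with prediction $\hat{y} = 12$.

```python
def residual(observed: float, predicted: float) -> float:
    """Return the residual y - y_hat."""
    return observed - predicted


def describe(res: float) -> str:
    """Say where the data point lies relative to the model, based on the sign."""
    if res > 0:
        return "above the model (model underestimates)"
    if res < 0:
        return "below the model (model overestimates)"
    return "on the model"


# The two points from the example, both with prediction y_hat = 12.
for y in (15, 10):
    r = residual(y, 12)
    print(f"y = {y}: residual = {r}, point lies {describe(r)}")
```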

Residuals are useful because they show the error at each point, not just the overall trend.

Residual plots and patterns

A very important tool in IB Maths AI HL is the residual plot. A residual plot graphs the residuals on the vertical axis and the explanatory variable $x$ on the horizontal axis. This helps us see whether the model is appropriate.

If a model is a good fit, the residuals should be scattered randomly around $0$ with no clear pattern. That means the model is capturing the main relationship well.

If the residuals show a pattern, then the model may be inappropriate. Common warning signs include:

a curved pattern, which suggests a linear model may not be suitable,
residuals that spread out as $x$ increases, which suggests the variability of the data changes with $x$,
clusters or runs of points above or below $0$, which may show the model is missing something important.
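To see how a curved residual pattern arises, here is a minimal Python sketch that fits a least-squares line to made-up curved data ($y = x^2$ for $x = 0, 1, 2, 3, 4$) and prints the residuals. The helper names are hypothetical; the line uses the standard least-squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$.

```python
def fit_line(xs, ys):
    """Least-squares line y_hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys, a, b):
    """Residual y - y_hat for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]  # made-up data that lie on a curve
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print(f"fitted line: y_hat = {a} + {b}x")
print("residuals:", res)
print("signs:", ["+" if r > 0 else "-" if r < 0 else "0" for r in res])
```

The residuals come out as $2, -1, -2, -1, 2$: positive at both ends and negative in the middle. That U-shape is the kind of curved pattern that warns a linear model is not suitable.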

Real-world interpretation

Imagine you are modeling the relationship between temperature and ice cream sales. A straight line might work over part of the data, but if the residual plot curves, it may show that the relationship is not truly linear. Perhaps sales rise quickly as temperature increases, but then level off on very hot days. In that case, a different function might fit better.

Residual plots are not just for “looking nice.” They are evidence. In IB, you should use them to justify whether a model is reasonable.

Goodness of fit: how well does the model describe the data?

Goodness of fit is a general phrase that means how closely a model matches the observed data. A model with good fit has small residuals overall and no obvious pattern in the residual plot.

There are several ways to judge goodness of fit, and the method depends on the type of model.

1. Residual size

If the residuals are mostly small, the model is usually better than one with large residuals. However, size alone is not enough. A model can have some small errors but still miss the true shape of the relationship.

2. Residual pattern

A random scatter of residuals around $0$ suggests a better fit than a visible curve or trend.

3. Correlation and $r$

For linear models, the correlation coefficient $r$ helps measure the strength and direction of the linear relationship. Values of $r$ close to $1$ or $-1$ indicate a strong linear relationship, while values close to $0$ suggest a weak linear relationship.

But be careful: a high value of $|r|$ does not automatically mean the model is appropriate. It only measures linear association, not whether a line is actually the best model for the situation.
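A sketch of how $r$ can be computed, using the formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ and the same made-up curved data ($y = x^2$ for $x = 0$ to $4$). It shows the caution above in action: the relationship is clearly not linear, yet $r$ is about $0.96$. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]  # made-up data on a curve, not a line
print("r =", round(pearson_r(xs, ys), 3))
```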

4. Coefficient of determination $r^2$

The value $r^2$ tells us the proportion of variation in the dependent variable explained by the model.

For example, if $r^2 = 0.92$, then about $92\%$ of the variation in the data is explained by the model. That sounds strong, but it does not guarantee the model is appropriate for making predictions outside the data range.
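The proportion of variation explained can be computed as $r^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of $y$ from its mean; for a least-squares line this equals the square of $r$. The sketch below reuses the made-up data $y = x^2$ for $x = 0$ to $4$, with fitted values from the least-squares line $\hat{y} = -2 + 4x$. The function name `r_squared` is illustrative.

```python
def r_squared(ys, fitted):
    """Proportion of variation in y explained: 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)  # total variation about the mean
    return 1 - ss_res / ss_tot


ys = [0, 1, 4, 9, 16]        # made-up data: y = x^2 for x = 0..4
fitted = [-2, 2, 6, 10, 14]  # least-squares line y_hat = -2 + 4x at the same x
print("r^2 =", round(r_squared(ys, fitted), 3))
```

Here $r^2 \approx 0.92$, so about $92\%$ of the variation is explained, yet the residuals $2, -1, -2, -1, 2$ still follow a curve. This is exactly the situation where a high $r^2$ on its own is misleading.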

Example

Suppose two models are used for the same data. Model A has residuals that are randomly scattered, but Model B has a curved residual pattern. Even if both models have a similar $r^2$, Model A is usually better because it matches the structure of the data more appropriately.

Regression, technology, and interpretation on the IB syllabus 💻

In IB Mathematics: Applications and Interpretation HL, technology is often used to create regression models. You may use graphing software or a calculator to find a line, quadratic, exponential, power, or other fitted function.

The process often looks like this:

enter the data,
choose a regression type,
view the fitted function,
examine residuals and statistics,
decide whether the model is suitable.
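The five steps above can be sketched in Python. This is a minimal illustration, not how any particular calculator works internally: the data are made up (they double at each step), and the exponential model is fitted by one common approach, assumed here, of drawing a least-squares line through $(x, \ln y)$. Both models are then judged by their residuals and by $R^2$ computed as $1 - SS_{res}/SS_{tot}$ on the original $y$ values. All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def summarize(xs, ys, predict):
    """Return the residuals and R^2 = 1 - SS_res / SS_tot for a model."""
    res = [y - predict(x) for x, y in zip(xs, ys)]
    y_bar = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in res)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return res, 1 - ss_res / ss_tot


# Step 1: enter the data (made-up values that double each step).
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Step 2: choose regression types to compare.
a_lin, b_lin = fit_line(xs, ys)
ln_a, k = fit_line(xs, [math.log(y) for y in ys])  # ln y = ln a + k x
a_exp = math.exp(ln_a)

models = {
    "linear": lambda x: a_lin + b_lin * x,
    "exponential": lambda x: a_exp * math.exp(k * x),
}

# Step 3: view the fitted functions.
print(f"linear: y = {a_lin:.2f} + {b_lin:.2f}x")
print(f"exponential: y = {a_exp:.2f} * e^({k:.3f}x)")

# Steps 4 and 5: examine residuals and R^2, then decide.
for name, predict in models.items():
    res, r2 = summarize(xs, ys, predict)
    print(name, [round(r, 2) for r in res], "R^2 =", round(r2, 3))
```

The linear model leaves residuals $6, -1.8, -6.6, -5.4, 7.8$ (positive at the ends, negative in the middle) with $R^2 \approx 0.87$, while the exponential model's residuals are essentially zero. The linear model should be rejected because of its residual pattern, not only because of its lower $R^2$.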

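For example, if a set of data about the growth of a plant increases rapidly at first and then more slowly later, a logarithmic model, or a logistic model if the growth levels off toward a limit, may be more suitable than a linear one. An exponential growth model would not suit this shape, because exponential growth gets faster over time. The residual plot can help confirm whether the choice makes sense.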

A short interpretation example

Suppose a model for fuel use in a car gives predictions that are consistently too low for short trips and too high for long trips. The residuals are then positive for short trips and negative for long trips, so the residual plot shows a trend rather than random scatter. This suggests the model is missing something, perhaps the fact that fuel efficiency changes with driving conditions.

In exam-style questions, you may be asked to interpret residuals in context. That means translating the math into clear real-world language. For example, a residual of $-4$ in a height model means the model predicted a value that was $4$ units too high.

Choosing the right model in context

Residuals and goodness of fit are deeply connected to the broader topic of functions because a function is meant to describe a relationship between quantities. The question is not only “Can we write a function?” but also “Is this function suitable for the real situation?”

A model can be mathematically correct and still be a poor representation of the data. For instance:

a linear function may be too simple for curved data,
a quadratic function may fit a short interval but fail outside it,
an exponential function may work for early growth but not after limits are reached.

This is why interpretation matters. Data and context should guide the choice of function, not just the algebra.

Practical caution: extrapolation

Even a model with excellent goodness of fit inside the data range may be unreliable beyond it. Extrapolation means using a model to predict outside the observed range. Residual analysis cannot guarantee that the same pattern will continue forever.

For example, a population model may fit well for several years, but future conditions may change. Always ask whether the function still makes sense in context.
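A minimal sketch of this caution: a prediction helper that also reports whether $x$ lies outside the observed data range. The model $\hat{y} = 120 + 5x$ (population in thousands, $x$ in years) and the observed range $0 \le x \le 10$ are hypothetical, chosen only for illustration. A flag does not make a prediction wrong; it only signals that the residuals give no evidence about it.

```python
def predict_with_check(predict, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for a fitted model.

    is_extrapolation is True when x lies outside the observed range
    [x_min, x_max], where residual analysis says nothing about accuracy.
    """
    y_hat = predict(x)
    return y_hat, not (x_min <= x <= x_max)


def model(year):
    """Hypothetical population model (thousands) fitted to years 0 to 10."""
    return 120 + 5 * year


for year in (5, 25):
    y_hat, outside = predict_with_check(model, year, 0, 10)
    note = "extrapolation, treat with caution" if outside else "within data range"
    print(f"year {year}: predicted {y_hat} thousand ({note})")
```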

Conclusion

Residuals show the difference between observed values and predicted values, using $y - \hat{y}$. Goodness of fit describes how well a model matches data overall. In IB Mathematics: Applications and Interpretation HL, these ideas help you decide whether a regression model is useful, whether the residuals are random, and whether a function is appropriate for a real-world situation.

The key message is simple: a model is not judged only by having an equation. It is judged by how well it matches the data and the context. Residuals and goodness of fit give you the evidence needed to support that judgment.

Study Notes

A residual is defined by $y - \hat{y}$.
A positive residual means the actual value is above the prediction.
A negative residual means the actual value is below the prediction.
A residual of $0$ means the model matches the data point exactly.
A residual plot should show random scatter around $0$ if the model is a good fit.
A visible pattern in residuals suggests the model may not be appropriate.
Goodness of fit means how well a model describes the observed data.
For linear models, $r$ measures direction and strength of linear association.
The value $r^2$ tells how much variation is explained by the model.
A high $r^2$ does not automatically mean the model is the best choice.
Technology is used to fit regression models and inspect residuals.
Good modeling in functions always includes interpreting results in context.
A model may fit the data well within the observed range but still be unreliable for extrapolation.
Residuals and goodness of fit are essential tools for deciding whether a function is suitable for a real-world situation.