14. Least Squares and Applications

Normal Equations

Imagine trying to fit the best straight line through a scatter of data points 📈. The points usually will not land perfectly on one line, so we need a method to find the line that comes as close as possible. That is the big idea behind least squares, and the normal equations are one of the main tools for solving that problem.

In this lesson, you will learn:

  • what normal equations are and why they matter,
  • how they connect to least squares and projection ideas,
  • how to set up and solve them in practice,
  • and how they are used in real applications like data fitting and prediction.

By the end, you should be able to explain the terminology, use the method on simple problems, and describe how normal equations fit into the larger world of linear algebra and applications.

What Problem Are Normal Equations Solving?

Suppose we want to solve a system of equations $A\mathbf{x}=\mathbf{b}$, but there is no exact solution. This happens often in real life because data can be noisy, measurements can be imperfect, or there may be more equations than unknowns. For example, if you collect data about hours studied and test scores, the points will not line up perfectly. Still, you may want a line that predicts scores as accurately as possible.

In least squares, we do not try to make the error exactly zero, because that is impossible in many cases. Instead, we choose $\mathbf{x}$ so that the residual vector

$$\mathbf{r}=\mathbf{b}-A\mathbf{x}$$

has the smallest possible length. The residual is the difference between the actual data and the model prediction. The goal is to minimize $\|\mathbf{r}\|^2$.

This is where the normal equations appear. They give a condition for the best approximation:

$$A^TA\mathbf{x}=A^T\mathbf{b}$$

This equation is called the normal equation. If $A^TA$ is invertible, then the least squares solution is

$$\mathbf{x}=(A^TA)^{-1}A^T\mathbf{b}$$

This formula gives the coefficients that make the error as small as possible in the least squares sense.
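As a concrete illustration, here is a minimal sketch in Python with NumPy (the data is made up, not from this lesson). It forms and solves the normal equations directly; note that solving the linear system is numerically preferable to computing $(A^TA)^{-1}$ explicitly.

```python
import numpy as np

# Minimal sketch with made-up data: 5 equations, 2 unknowns,
# so Ax = b has no exact solution in general.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Form and solve the normal equations A^T A x = A^T b.
# Solving the system is preferred to computing (A^T A)^{-1} explicitly.
AtA = A.T @ A
Atb = A.T @ b
x = np.linalg.solve(AtA, Atb)
print(x)  # least squares solution
```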

Why Are They Called “Normal”?

The word “normal” here means perpendicular, or orthogonal, not ordinary ⭐. In least squares, the best residual vector is orthogonal to the column space of $A$.

That means the residual vector $\mathbf{r}$ is perpendicular to every column of $A$. In symbols,

$$A^T\mathbf{r}=\mathbf{0}$$

Since $\mathbf{r}=\mathbf{b}-A\mathbf{x}$, we substitute and get

$$A^T(\mathbf{b}-A\mathbf{x})=\mathbf{0}$$

which simplifies to

$$A^TA\mathbf{x}=A^T\mathbf{b}$$

This is the normal equation.

Why is orthogonality important? Think of shining a light straight down onto a wall. The shadow is the closest point on the wall to the object. In the same way, the least squares solution is the projection of $\mathbf{b}$ onto the column space of $A$. The error vector from $\mathbf{b}$ to that projection is perpendicular to the space itself.

Geometry of the Least Squares Idea

A lot of the power of normal equations comes from geometry. If the columns of $A$ are viewed as vectors, then all combinations of those columns form a subspace called the column space of $A$.

If $\mathbf{b}$ is not already in that space, we cannot solve $A\mathbf{x}=\mathbf{b}$ exactly. But we can find the point in the column space closest to $\mathbf{b}$. That closest point is called the projection of $\mathbf{b}$ onto the column space of $A$.

Let $\hat{\mathbf{b}}=A\mathbf{x}$ be that projection. Then the error is

$$\mathbf{r}=\mathbf{b}-\hat{\mathbf{b}}$$

For the best approximation, $\mathbf{r}$ must be orthogonal to every column of $A$. This is why the normal equations work: they encode the geometric condition for projection.
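This orthogonality condition is easy to check numerically. The sketch below (hypothetical data, assuming NumPy) solves the normal equations, forms the projection $\hat{\mathbf{b}}$ and the residual $\mathbf{r}$, and confirms that $A^T\mathbf{r}$ is zero up to floating point round-off.

```python
import numpy as np

# Sketch with hypothetical data: the least squares residual is
# orthogonal to every column of A, i.e. A^T r = 0.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x = np.linalg.solve(A.T @ A, A.T @ b)  # least squares solution
b_hat = A @ x                          # projection of b onto Col(A)
r = b - b_hat                          # residual

print(r)        # [ 1. -2.  1.]
print(A.T @ r)  # [0. 0.] up to floating point round-off
```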

This idea is used in many settings:

  • fitting lines or curves to data,
  • estimating values from measurements,
  • building prediction models in science and engineering,
  • and solving systems that do not have exact answers.

A Simple Example with a Line Fit

Suppose we want to fit a line $y\approx m x + c$ to several data points. This can be written in matrix form as

$$A\mathbf{x}\approx \mathbf{b}$$

where

$$A=\begin{bmatrix}x_1 & 1\\ x_2 & 1\\ x_3 & 1\end{bmatrix}, \quad \mathbf{x}=\begin{bmatrix}m\\ c\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix}$$

If the points do not lie exactly on one line, then no exact $m$ and $c$ will satisfy all equations. The least squares solution chooses the line that minimizes the total squared error.

For example, consider the points $(1,2)$, $(2,3)$, and $(3,5)$. Then

$$A=\begin{bmatrix}1 & 1\\ 2 & 1\\ 3 & 1\end{bmatrix}, \quad \mathbf{b}=\begin{bmatrix}2\\ 3\\ 5\end{bmatrix}$$

The normal equations are

$$A^TA\mathbf{x}=A^T\mathbf{b}$$

First compute

$$A^TA=\begin{bmatrix}14 & 6\\ 6 & 3\end{bmatrix}$$

and

$$A^T\mathbf{b}=\begin{bmatrix}23\\ 10\end{bmatrix}$$

So we solve

$$\begin{bmatrix}14 & 6\\ 6 & 3\end{bmatrix}\begin{bmatrix}m\\ c\end{bmatrix}=\begin{bmatrix}23\\ 10\end{bmatrix}$$

Subtracting twice the second equation from the first gives $2m=3$, so $m=\tfrac{3}{2}$; then $6\cdot\tfrac{3}{2}+3c=10$ gives $c=\tfrac{1}{3}$. This gives the line

$$y=\frac{3}{2}x+\frac{1}{3}$$

That line is not perfect for every point, but it is the best fit in the least squares sense.
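You can verify this example numerically. A short check, assuming NumPy:

```python
import numpy as np

# Checking the worked example: points (1,2), (2,3), (3,5).
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([2.0, 3.0, 5.0])

m, c = np.linalg.solve(A.T @ A, A.T @ b)
print(m, c)  # 1.5 and 0.333..., i.e. y = (3/2) x + 1/3
```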

How to Solve Normal Equations

To use normal equations, follow these steps:

  1. Write the problem as $A\mathbf{x}\approx \mathbf{b}$.
  2. Form the matrix $A^TA$ and vector $A^T\mathbf{b}$.
  3. Solve the system $A^TA\mathbf{x}=A^T\mathbf{b}$.
  4. Use the solution $\mathbf{x}$ to build the best approximation $A\mathbf{x}$.

This procedure is especially useful when there are more equations than unknowns, which is called an overdetermined system. In such systems, an exact solution usually does not exist, but a least squares solution always does, although it may not be unique.

A helpful fact is that $A^TA$ is always symmetric because

$$\left(A^TA\right)^T=A^T\left(A^T\right)^T=A^TA$$

If the columns of $A$ are linearly independent, then $A^TA$ is also invertible. That means the least squares solution is unique.

However, in some situations the columns of $A$ are not independent. Then $A^TA$ may not be invertible, and special methods are needed to find a least squares solution.
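One standard fallback in that situation is an SVD-based solver such as NumPy's `np.linalg.lstsq`, which returns a least squares solution even when $A^TA$ is singular. A small sketch with a deliberately dependent column (hypothetical data):

```python
import numpy as np

# Hypothetical data with dependent columns: the second column is
# twice the first, so A^T A is singular and np.linalg.solve fails.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

# lstsq uses the SVD and returns the minimum-norm least squares solution.
x, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
print(rank)  # 1, confirming A^T A is not invertible
print(x)     # a least squares solution (smallest norm among all of them)
```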

Real-World Applications 🌍

Normal equations appear in many practical areas. A few examples are:

  • Economics: fitting a demand trend from sales data.
  • Science: estimating constants in an experiment when measurements vary slightly.
  • Engineering: calibrating sensors so that predicted values match observed values as closely as possible.
  • Computer graphics: approximating shapes or surfaces from sampled points.
  • Machine learning: training simple linear models from data.

For example, if a scientist measures temperature over time and wants a trend line, the data will likely have small errors. Normal equations help find the line that best matches the measurements overall rather than forcing an impossible exact fit.
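As a quick sketch of that workflow (with synthetic, made-up readings), a library routine like NumPy's `np.polyfit` solves exactly this least squares problem under the hood:

```python
import numpy as np

# Synthetic temperature readings (made-up numbers) with small errors.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
temp = np.array([20.1, 20.9, 22.2, 22.8, 24.1])

# polyfit with degree 1 fits a line by least squares, solving the
# same minimization described above.
slope, intercept = np.polyfit(t, temp, 1)
print(slope, intercept)  # trend line: temp ≈ slope * t + intercept
```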

This is valuable because real data is rarely perfect. Least squares gives a principled way to make the best use of imperfect information.

Why Normal Equations Matter in Linear Algebra

Normal equations bring together many important ideas in linear algebra:

  • matrix multiplication,
  • orthogonality,
  • projections,
  • column space,
  • and solving systems of equations.

They also show how algebra and geometry work together. The algebraic condition $A^TA\mathbf{x}=A^T\mathbf{b}$ matches the geometric idea that the error is perpendicular to the space of possible answers.

Even though there are other methods for solving least squares problems, normal equations are important because they are direct, elegant, and deeply connected to the theory.

Conclusion

Normal equations are the key bridge between an impossible exact system and a best possible approximate solution. When $A\mathbf{x}=\mathbf{b}$ has no exact solution, least squares looks for the vector $\mathbf{x}$ that makes the residual as small as possible. The condition for that best fit is

$$A^TA\mathbf{x}=A^T\mathbf{b}$$

This equation comes from the fact that the residual must be orthogonal to the column space of $A$.

If you remember one big idea from this lesson, remember this: normal equations turn the problem of “no exact answer” into “best possible answer”. That is why they are so useful in data fitting, prediction, and many other applications of linear algebra 📊.

Study Notes

  • Normal equations are used in least squares when $A\mathbf{x}=\mathbf{b}$ has no exact solution.
  • The least squares problem minimizes $\|\mathbf{b}-A\mathbf{x}\|^2$.
  • The normal equation is $A^TA\mathbf{x}=A^T\mathbf{b}$.
  • The residual $\mathbf{r}=\mathbf{b}-A\mathbf{x}$ is orthogonal to the column space of $A$.
  • Least squares solutions are projections of $\mathbf{b}$ onto the column space of $A$.
  • If the columns of $A$ are linearly independent, then $A^TA$ is invertible and the least squares solution is unique.
  • Normal equations are widely used in fitting lines, estimating trends, and analyzing real data.
