Least Squares Approximation
Imagine trying to draw one straight line through a cloud of scattered points. The points do not line up perfectly, but you still want a line that gives the best overall fit. That is the big idea behind least squares approximation: when an exact answer does not exist, choose the answer that makes the total error as small as possible.
What least squares approximation means
In Linear Algebra, we often want to solve a system of equations written as $A\mathbf{x}=\mathbf{b}$. But in many real situations, there are more equations than unknowns, so an exact solution may not exist. This is called an overdetermined system.
For example, if several measurements are taken in an experiment, they may not all agree perfectly because of noise, rounding, or small mistakes. Instead of trying to satisfy every equation exactly, least squares approximation finds the vector $\mathbf{x}$ that makes $A\mathbf{x}$ as close as possible to $\mathbf{b}$.
The goal is to minimize the error vector $\mathbf{e}=\mathbf{b}-A\mathbf{x}$. The "best" approximation is the one that makes the length of this error as small as possible. In least squares, we minimize the squared length $\|\mathbf{e}\|^2$.
Why square the error? Squaring removes signs, so positive and negative errors cannot cancel, and it gives a smooth quantity that is easy to minimize. It also weights large errors more heavily, which often matches real-world needs.
A least squares solution is not always the same as an exact solution, because there may be no exact solution at all. Instead, it is the vector $\hat{\mathbf{x}}$ that solves
$$\min_{\mathbf{x}} \|\mathbf{b}-A\mathbf{x}\|^2.$$
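As a concrete illustration, here is a minimal NumPy sketch (the matrix and vector are made-up values, not taken from the text) that computes this minimizer for a small overdetermined system:

```python
import numpy as np

# A made-up overdetermined system: 3 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# np.linalg.lstsq returns the x that minimizes ||b - A x||^2.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

print("least squares solution:", x_hat)
print("closest point A @ x_hat:", A @ x_hat)
print("error vector b - A @ x_hat:", b - A @ x_hat)  # nonzero: no exact solution
```

The printed error vector is not zero, which is exactly the situation least squares is designed for: no choice of $\mathbf{x}$ satisfies all three equations, so we settle for the closest one.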
How projection connects to least squares
The geometric heart of least squares is projection. The columns of $A$ span a subspace called the column space of $A$, written $\text{Col}(A)$. The vector $A\hat{\mathbf{x}}$ is the point in $\text{Col}(A)$ that is closest to $\mathbf{b}$.
That means least squares is really about projecting $\mathbf{b}$ onto $\text{Col}(A)$. The difference $\mathbf{b}-A\hat{\mathbf{x}}$ is perpendicular to the entire column space. In symbols,
$$A^T\left(\mathbf{b}-A\hat{\mathbf{x}}\right)=\mathbf{0}.$$
This is called the normal equation:
$$A^TA\hat{\mathbf{x}}=A^T\mathbf{b}.$$
These equations are very important because they let us compute the least squares solution using algebra. They also explain the geometry: the residual vector $\mathbf{b}-A\hat{\mathbf{x}}$ is orthogonal to every column of $A$.
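To see the orthogonality condition concretely, here is a small NumPy sketch (with made-up numbers) that solves the normal equations and checks that the residual is perpendicular to every column of $A$:

```python
import numpy as np

# Made-up data: b does not lie in the column space of A.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 3.0, 2.0, 5.0])

# Normal equations: (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual should be orthogonal to every column of A (zero up to rounding).
residual = b - A @ x_hat
print("A^T @ residual:", A.T @ residual)
```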
Think of shining a light straight down onto a floor. A shadow is the projection of an object onto a surface. In least squares, the vector $\mathbf{b}$ is like the object and $\text{Col}(A)$ is like the floor. The projected vector $A\hat{\mathbf{x}}$ is the shadow that gets as close as possible to the original object.
Solving a least squares problem
To find a least squares approximation, follow these steps (a short code sketch after the list illustrates them):
- Write the system as $A\mathbf{x}=\mathbf{b}$.
- Form the normal equations $A^TA\mathbf{x}=A^T\mathbf{b}$.
- Solve for $\mathbf{x}$.
- Use $A\mathbf{x}$ as the closest approximation to $\mathbf{b}$ in the column space.
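Here is one way these steps could be packaged in NumPy; the function name `least_squares_via_normal_equations` is a hypothetical label chosen for this sketch, and it assumes the columns of $A$ are linearly independent so that $A^TA$ is invertible:

```python
import numpy as np

def least_squares_via_normal_equations(A, b):
    """Return (x_hat, A @ x_hat), where x_hat minimizes ||b - A x||^2.

    Assumes the columns of A are linearly independent, so A^T A is invertible.
    """
    AtA = A.T @ A                       # form A^T A
    Atb = A.T @ b                       # form A^T b
    x_hat = np.linalg.solve(AtA, Atb)   # solve the normal equations
    return x_hat, A @ x_hat             # x_hat and the closest point in Col(A)
```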
Suppose
$$A=\begin{bmatrix}1\\1\\1\end{bmatrix},\quad \mathbf{b}=\begin{bmatrix}1\\2\\4\end{bmatrix}.$$
This means we want a single number $x$ so that $\begin{bmatrix}x\\x\\x\end{bmatrix}$ is as close as possible to $\mathbf{b}$. The normal equation becomes
$$A^TAx=A^T\mathbf{b}.$$
Here,
$$A^TA=1^2+1^2+1^2=3$$
and
$$A^T\mathbf{b}=1+2+4=7.$$
So the least squares solution is
$$x=\frac{7}{3}.$$
That means the best constant approximation to the values $1$, $2$, and $4$ is the average $\frac{7}{3}$. This is no coincidence: fitting a single constant by least squares always produces the mean of the data.
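A quick numerical check of this example (using NumPy as a calculator) confirms that the least squares solution equals the average of the three values:

```python
import numpy as np

A = np.array([[1.0], [1.0], [1.0]])   # a single column of ones
b = np.array([1.0, 2.0, 4.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)        # [2.3333...]  = 7/3
print(b.mean())     # 2.3333..., the same as the average of 1, 2, and 4
```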
For a more realistic example, imagine a company measuring the growth of a plant over time. The data do not lie exactly on a line, but the company wants a line to predict future height. Least squares finds the line that best fits the data by minimizing total squared vertical error.
Least squares in line fitting
One of the most common applications is fitting a line to data. Suppose we have points $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$ and want a line
$$y=mx+c.$$
For each point, the error is the difference between the observed value $y_i$ and the predicted value $mx_i+c$. We want to minimize
$$\sum_{i=1}^n (y_i-(mx_i+c))^2.$$
This is a least squares problem because it minimizes the sum of squared errors.
To write it in matrix form, use
$$A=\begin{bmatrix}x_1 & 1\\x_2 & 1\\\vdots & \vdots\\x_n & 1\end{bmatrix},\quad \mathbf{x}=\begin{bmatrix}m\\c\end{bmatrix},\quad \mathbf{b}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}.$$
Then the model becomes $A\mathbf{x}\approx \mathbf{b}$. Solving the normal equations gives the slope $m$ and intercept $c$ of the best-fit line.
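As an illustration, here is a short NumPy sketch that builds this matrix for some made-up measurements and solves the normal equations for the slope and intercept:

```python
import numpy as np

# Made-up measurements (x_i, y_i) that do not lie exactly on a line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix: one column for the slope, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations for [m, c].
m, c = np.linalg.solve(A.T @ A, A.T @ y)
print(f"best-fit line: y = {m:.3f} x + {c:.3f}")
```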
This is useful in science, economics, sports statistics, and engineering. For example, a coach may use least squares to estimate how practice time relates to sprint speed. A student may use it to predict study time needed for test scores. A mechanic may use it to model how fuel use changes with speed.
Why least squares matters in Linear Algebra
Least squares shows how Linear Algebra helps solve real problems even when the data are messy. It connects several major ideas:
- Matrix equations: represent many conditions at once.
- Column space: shows what outputs are possible.
- Orthogonality: explains the shortest-error condition.
- Projection: gives the best approximation in a subspace.
- Optimization: chooses the smallest possible squared error.
This topic is also a bridge to other areas of math and applied science. In statistics, least squares is used in regression analysis. In computer graphics, it helps fit shapes and surfaces. In machine learning, it appears in models that learn from data. In signal processing, it helps recover clean signals from noisy measurements.
A helpful way to remember the big picture is this: if exact solving fails, least squares gives the closest answer in the column space of $A$. That makes it one of the most practical ideas in Linear Algebra.
Conclusion
Least squares approximation is the method of finding the vector $\hat{\mathbf{x}}$ that makes $A\hat{\mathbf{x}}$ as close as possible to $\mathbf{b}$. It is especially useful when a system has no exact solution because of extra equations or noisy data. The central tools are the error vector $\mathbf{b}-A\hat{\mathbf{x}}$, the projection onto $\text{Col}(A)$, and the normal equations $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$.
This lesson fits into the broader study of Least Squares and Applications because it explains both the math and the reason the method is so widely used. Whether you are fitting a line, analyzing data, or modeling a real-world process, least squares helps turn imperfect information into a useful approximation.
Study Notes
- Least squares approximation finds the vector $\hat{\mathbf{x}}$ that minimizes $\|\mathbf{b}-A\mathbf{x}\|^2$.
- It is used when $A\mathbf{x}=\mathbf{b}$ has no exact solution, often because the system is overdetermined.
- The best approximation is the projection of $\mathbf{b}$ onto $\text{Col}(A)$.
- The residual vector $\mathbf{b}-A\hat{\mathbf{x}}$ is orthogonal to every column of $A$.
- The normal equations are $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$.
- Least squares is used in line fitting, regression, prediction, and noisy measurement problems.
- For fitting a line $y=mx+c$ to data, least squares minimizes $\sum_{i=1}^n (y_i-(mx_i+c))^2$.
- Least squares connects matrix equations, projection, orthogonality, and optimization in one powerful idea.
- It is one of the most important applications of Linear Algebra in science, engineering, and data analysis.
