Projection onto Subspaces
Imagine shining a flashlight at a wall from an angle. The shadow it makes is not the same as the object itself, but it is the closest version of that object on the wall. In linear algebra, that idea is called a projection ✨. In this lesson, you will learn what it means to project a vector onto a subspace, why this matters in least squares, and how it helps solve real problems like fitting data, reducing error, and making predictions.
Objectives for this lesson
- Explain the main ideas and terminology behind projection onto subspaces
- Apply linear algebra reasoning to compute or interpret projections
- Connect projections to least squares and applications
- Summarize why projection onto subspaces is important in linear algebra
- Use examples and evidence to understand the concept in context
What a projection means
A subspace is a set of vectors that is closed under addition and scalar multiplication. Common examples are a line through the origin, a plane through the origin, or more generally the span of some vectors. If a vector $\mathbf{b}$ does not lie in a subspace $W$, we can still ask: what vector in $W$ is closest to $\mathbf{b}$?
That closest vector is called the projection of $\mathbf{b}$ onto $W$, written as $\operatorname{proj}_W \mathbf{b}$. The idea is simple but powerful: instead of keeping the whole vector, we keep the part that lies in the subspace. This is especially useful when data or equations do not fit perfectly.
A projection should feel familiar from everyday life. When the light source is directly overhead, the shadow of a ball falls on the floor directly beneath it; that shadow is the projection of the ball onto the floor. In geometry, measuring the horizontal extent of a slanted object is also a projection. In linear algebra, the same idea works in higher dimensions with vectors and subspaces.
Projecting onto a line or plane
The easiest case is projection onto a line through the origin. Suppose the line is spanned by a nonzero vector $\mathbf{u}$. Then the projection of $\mathbf{b}$ onto that line is
$$
\operatorname{proj}_{\mathbf{u}} \mathbf{b} = \frac{\mathbf{b} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}.
$$
This formula says to measure how much of $\mathbf{b}$ points in the direction of $\mathbf{u}$ using the dot product, then scale $\mathbf{u}$ accordingly. The result is a vector on the line that is as close as possible to $\mathbf{b}$.
Example: let $\mathbf{b} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ and let the line be the $x$-axis, whose direction vector is $\mathbf{u} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then
$$
\operatorname{proj}_{\mathbf{u}} \mathbf{b} = \frac{3}{1}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}.
$$
So the shadow of $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ on the $x$-axis is $\begin{bmatrix} 3 \\ 0 \end{bmatrix}$. The difference $\mathbf{b} - \operatorname{proj}_{\mathbf{u}} \mathbf{b} = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$ is perpendicular to the line.
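Here is a minimal NumPy sketch, not part of the lesson itself and offered only as an illustration (the function name `project_onto_line` is our own), that reproduces this example and checks that the error is perpendicular to the line.

```python
import numpy as np

def project_onto_line(b, u):
    """Project vector b onto the line spanned by the nonzero vector u."""
    b = np.asarray(b, dtype=float)
    u = np.asarray(u, dtype=float)
    return (np.dot(b, u) / np.dot(u, u)) * u

b = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])            # direction vector of the x-axis

p = project_onto_line(b, u)         # expected: [3, 0]
error = b - p                       # expected: [0, 4]
print(p, error, np.dot(error, u))   # the last value should be 0 (perpendicular)
```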
For a plane, the same idea applies, but the subspace may be spanned by two vectors instead of one. If $W = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2\}$ and the basis vectors are orthonormal, then the projection is
$$
\operatorname{proj}_W \mathbf{b} = (\mathbf{b} \cdot \mathbf{u}_1)\,\mathbf{u}_1 + (\mathbf{b} \cdot \mathbf{u}_2)\,\mathbf{u}_2.
$$
When the basis is not orthonormal, the calculation is more involved, but the central idea is unchanged: find the closest vector in the subspace.
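Assuming the basis vectors are orthonormal, the plane case is just a sum of one-dimensional projections. The small NumPy sketch below (illustrative only, with a helper name of our own choosing) projects a vector in $\mathbb{R}^3$ onto the $xy$-plane:

```python
import numpy as np

def project_onto_orthonormal_span(b, basis):
    """Project b onto the span of a list of orthonormal vectors u_1, ..., u_k."""
    b = np.asarray(b, dtype=float)
    p = np.zeros_like(b)
    for u in basis:
        u = np.asarray(u, dtype=float)
        p += np.dot(b, u) * u       # add the component (b . u_i) u_i
    return p

b  = np.array([1.0, 2.0, 5.0])
u1 = np.array([1.0, 0.0, 0.0])      # orthonormal basis of the xy-plane
u2 = np.array([0.0, 1.0, 0.0])
print(project_onto_orthonormal_span(b, [u1, u2]))   # expected: [1, 2, 0]
```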
Why the error is perpendicular
A key fact about projections is that the error vector is orthogonal to the subspace. If $\mathbf{p} = \operatorname{proj}_W \mathbf{b}$, then the error is $\mathbf{b} - \mathbf{p}$. This error satisfies
$$
\mathbf{b} - \mathbf{p} \perp W.
$$
Why does this matter? Because the closest point in a subspace is the one where the line from the original vector to the subspace meets it at a right angle. This is the geometric reason projection works.
Think of a ramp in a parking garage. When a cart rolls down, it is driven by the component of gravity that points along the ramp; the perpendicular component presses the cart against the ramp but does not move it along the ramp. In the same way, projection isolates the part of a vector that lies inside the subspace.
This orthogonality condition is also the basis for many proofs. If $\mathbf{p}$ is the closest point in $W$, then among all vectors $\mathbf{w}$ in $W$, the distance $\|\mathbf{b} - \mathbf{w}\|$ is smallest when $\mathbf{w} = \mathbf{p}$, and the shortest distance from $\mathbf{b}$ to $W$ is the length of the error vector, as the short calculation below makes precise.
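Here is that calculation. For any $\mathbf{w}$ in $W$, write $\mathbf{b} - \mathbf{w} = (\mathbf{b} - \mathbf{p}) + (\mathbf{p} - \mathbf{w})$. The first piece is orthogonal to $W$ and the second lies in $W$, so the Pythagorean theorem gives

$$
\|\mathbf{b} - \mathbf{w}\|^2 = \|\mathbf{b} - \mathbf{p}\|^2 + \|\mathbf{p} - \mathbf{w}\|^2 \ge \|\mathbf{b} - \mathbf{p}\|^2,
$$

with equality exactly when $\mathbf{w} = \mathbf{p}$.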
Projections and least squares
Projection onto subspaces is one of the main ideas behind least squares. In many applications, you want to solve a system $A\mathbf{x} = \mathbf{b}$, but the system may have no exact solution because the equations conflict or there are too many equations. Then you look for an approximate solution that makes the error as small as possible.
The column space of a matrix $A$ is the set of all vectors of the form $A\mathbf{x}$. If $\mathbf{b}$ is not in the column space, the best approximation is the projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$. The least squares solution $\hat{\mathbf{x}}$ makes $A\hat{\mathbf{x}}$ the projection of $\mathbf{b}$ onto the column space of $A$.
This leads to the normal equations:
$$
A^T A\hat{\mathbf{x}} = A^T \mathbf{b}.
$$
These equations come from the condition that the residual $\mathbf{r} = \mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to the column space of $A$. In other words,
$$
A^T \mathbf{r} = \mathbf{0}.
$$
That means the error is perpendicular to every column of $A$. This is exactly the same orthogonality idea from projection onto a subspace.
Example: suppose a teacher wants to predict test scores from study hours. Real data rarely fall on a perfect line. A least squares line gives the best-fitting model by minimizing the sum of squared vertical errors. In vector form, the predicted scores $A\hat{\mathbf{x}}$ lie in the column space of the design matrix $A$ (one column of ones for the intercept and one column of study hours), so the best-fit prediction is exactly the projection of the data vector onto that column space.
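As a sketch of how this looks in practice (the study-hours numbers below are made up purely for illustration), one can build the design matrix and solve the normal equations with NumPy:

```python
import numpy as np

# Hypothetical study-hours data, invented only to illustrate the idea.
hours  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 60.0, 68.0, 71.0, 80.0])

# Design matrix: a column of ones (intercept) and the hours column.
A = np.column_stack([np.ones_like(hours), hours])

# Normal equations: A^T A x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ scores)

predictions = A @ x_hat             # projection of `scores` onto Col(A)
residual    = scores - predictions
print("intercept, slope:", x_hat)
print("A^T r (should be ~0):", A.T @ residual)
```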
How to compute a projection in practice
The computation depends on the type of subspace.
If $W$ is spanned by one vector $\mathbf{u}$, use
$$
\operatorname{proj}_{\mathbf{u}} \mathbf{b} = \frac{\mathbf{b} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}.
$$
If $W$ is spanned by columns of a matrix $A$ whose columns are linearly independent, then the projection of $\mathbf{b}$ onto $W$ is
$$
\operatorname{proj}_W \mathbf{b} = A\hat{\mathbf{x}},
$$
where $\hat{\mathbf{x}}$ solves
$$
A^T A\hat{\mathbf{x}} = A^T \mathbf{b}.
$$
For example, let
$$
A = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}.
$$
Then the subspace is the line spanned by $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. The projection formula gives
$$
\operatorname{proj}_W \mathbf{b} = \frac{\begin{bmatrix} 2 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\begin{bmatrix} 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{2}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
$$
So the closest point on the line to $\mathbf{b}$ is $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. The error is $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$, which is perpendicular to the line because its dot product with $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is $0$.
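The same numbers drop straight into the normal-equations recipe above; a brief NumPy check (illustrative only) confirms both the projection and the orthogonality of the error:

```python
import numpy as np

A = np.array([[1.0],
              [1.0]])                        # single column: the line spanned by (1, 1)
b = np.array([2.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)    # 1x1 normal equations: 2 * x_hat = 2
p     = A @ x_hat                            # expected: [1, 1]
error = b - p                                # expected: [1, -1]
print(p, error, A.T @ error)                 # A^T error should be 0
```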
Applications in data and technology 📊
Projection onto subspaces is used anywhere we need a best approximation. In computer graphics, projections help create shadows and 3D scenes. In signal processing, a signal can be projected onto a set of basis functions to keep the important parts and remove noise. In statistics and machine learning, least squares regression uses projection ideas to fit models to data.
One common application is sensor calibration. Suppose a sensor collects data that should follow a simple physical rule, but the measurements contain noise. A projection-based model can estimate the closest ideal values. Another application is navigation, where GPS systems combine many measurements and estimate the most likely position using methods related to least squares.
Projections also help simplify high-dimensional data. If the important information lies near a plane or line inside a bigger space, projecting onto that subspace can reduce complexity while preserving the main trend. This is a major reason linear algebra is so useful in modern technology.
Conclusion
Projection onto subspaces is the process of finding the closest vector in a subspace to a given vector. The key ideas are closeness, orthogonality, and approximation. When the error is perpendicular to the subspace, the projected vector is the best possible fit. This same idea powers least squares, where exact solutions may not exist but the best approximation still does. Understanding projection gives you a strong foundation for data fitting, modeling, and many real-world applications of linear algebra.
Study Notes
- A subspace is a set of vectors closed under addition and scalar multiplication.
- The projection of $\mathbf{b}$ onto a subspace $W$ is the closest vector in $W$ to $\mathbf{b}$.
- For a line spanned by $\mathbf{u}$, the projection is
$$
\operatorname{proj}_{\mathbf{u}} \mathbf{b} = \frac{\mathbf{b} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}.
$$
- The error vector $\mathbf{b} - \operatorname{proj}_W \mathbf{b}$ is perpendicular to the subspace $W$.
- Projections are central to least squares, where the goal is to find the best approximate solution when exact solutions do not exist.
- If $A\hat{\mathbf{x}}$ is the projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$, then $\hat{\mathbf{x}}$ satisfies
$$
A^T A\hat{\mathbf{x}} = A^T \mathbf{b}.
$$
- Real-world uses include regression, graphics, signal processing, sensor calibration, and data approximation.
- Projections help turn complicated data into the closest structured version inside a chosen model space.
