12. Diagonalization and Dynamical Systems

Markov Chains

Markov Chains in Diagonalization and Dynamical Systems

Students, imagine predicting the weather, the spread of a rumor, or how often a customer returns to a website 🌦️📱. These are all situations where the next state depends on the current state, not the entire past. That idea is the heart of a Markov chain. In linear algebra, Markov chains are powerful because they can be written using matrices, analyzed with eigenvalues, and connected to the behavior of dynamical systems over time.

What a Markov Chain Is

A Markov chain is a system that moves between different states over time, where the probability of the next state depends only on the current state. This is called the Markov property. In simple words, the system has "memory" of only one step.

For example, suppose a weather model has two states: Sunny and Rainy. If today is sunny, there may be a certain probability that tomorrow is sunny or rainy. If today is rainy, there may be a different set of probabilities for tomorrow. The key point is that tomorrow depends on today, not on the weather from three days ago.

A Markov chain is usually described by:

  • a set of states,
  • probabilities of moving from one state to another,
  • and a starting distribution.

If the states are numbered, then the transitions can be stored in a transition matrix $P$. Each entry $p_{ij}$ gives the probability of moving from state $j$ to state $i$ in one step. This column-based convention is common in linear algebra because the state vector can be multiplied by the matrix to get the next state.

A state vector might look like this:

$$
\mathbf{x}_k = \begin{bmatrix} x_{1,k} \\ x_{2,k} \end{bmatrix},
$$

where $x_{1,k}$ and $x_{2,k}$ represent the probabilities of being in each state at time $k$.

The system evolves by the rule

$$
\mathbf{x}_{k+1} = P\mathbf{x}_k.
$$

This equation is a dynamical system written in matrix form.

Transition Matrices and Probability Rules

A transition matrix has special properties. Every entry must be nonnegative, so $p_{ij} \ge 0$. Also, each column must sum to $1$ if the matrix is written using the convention $\mathbf{x}_{k+1} = P\mathbf{x}_k$.

Why? Because probabilities must stay probabilities. If a state vector contains probabilities, then the total probability should remain $1$ after each step:

$$
\sum_i x_{i,k} = 1.
$$

A valid Markov chain keeps that total unchanged.
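
As a quick check, here is a minimal NumPy sketch (not part of the original text, using an arbitrary two-state matrix chosen for illustration) that verifies both stochasticity conditions and confirms that one update step keeps the total probability equal to $1$:

```python
import numpy as np

# An example column-stochastic transition matrix:
# entries are nonnegative and each column sums to 1.
P = np.array([[0.7, 0.4],
              [0.3, 0.6]])

assert np.all(P >= 0)                      # p_ij >= 0
assert np.allclose(P.sum(axis=0), 1.0)     # each column sums to 1

# A probability vector for the current state.
x = np.array([0.6, 0.4])

# One step of the chain; the entries still sum to 1.
x_next = P @ x
print(x_next, x_next.sum())                # total probability is preserved
```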

Here is a simple example with two states, where the columns represent the current state and the rows represent the next state:

$$
P = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}.
$$

If the current distribution is

$$
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
$$

then the next distribution is

$$
\mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix}.
$$

This means there is an $80\%$ chance of being in state 1 and a $20\%$ chance of being in state 2 after one step.
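
The same one-step calculation can be reproduced numerically. This short sketch (assuming NumPy is available) is only an illustration of the arithmetic above:

```python
import numpy as np

# Transition matrix from the example (columns: current state, rows: next state).
P = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# Start with certainty in state 1.
x0 = np.array([1.0, 0.0])

# One step of the chain.
x1 = P @ x0
print(x1)   # [0.8 0.2] -> 80% chance of state 1, 20% chance of state 2
```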

A real-world example is a student choosing between two study habits: focused study and distraction. If a student is focused today, there may be a high chance they stay focused tomorrow. If distracted today, they may still recover tomorrow with some probability. A transition matrix can model that behavior 📚

Repeated Steps and Dynamical Systems

The true power of Markov chains appears when we apply the transition repeatedly. After two steps,

$$
\mathbf{x}_2 = P\mathbf{x}_1 = P^2\mathbf{x}_0.
$$

After $k$ steps,

$$
\mathbf{x}_k = P^k\mathbf{x}_0.
$$

This is exactly what makes a Markov chain a discrete dynamical system. The matrix $P$ acts like a rule for updating the state again and again.
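
A small sketch of this repeated update, assuming NumPy and the two-state matrix from above, shows that iterating $\mathbf{x}_{k+1} = P\mathbf{x}_k$ agrees with applying the matrix power $P^k$ directly:

```python
import numpy as np

P = np.array([[0.8, 0.3],
              [0.2, 0.7]])
x = np.array([1.0, 0.0])          # x_0

# Apply the update rule repeatedly: x_{k+1} = P x_k.
for k in range(20):
    x = P @ x

# The same result comes from the matrix power P^k acting on x_0.
x_via_power = np.linalg.matrix_power(P, 20) @ np.array([1.0, 0.0])
print(x, x_via_power)             # both are close to [0.6, 0.4] for this chain
```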

Students, this connects directly to the broader topic of Diagonalization and Dynamical Systems because matrix powers are much easier to study when the matrix can be diagonalized. If

$$
P = SDS^{-1},
$$

where $D$ is diagonal, then

$$
P^k = SD^kS^{-1}.
$$

That is extremely useful because diagonal matrices are easy to raise to powers:

$$
D^k = \begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix}
$$

for a $2\times 2$ case.

This helps us understand long-term behavior. If the eigenvalues satisfy $|\lambda|<1$, their powers shrink toward $0$. If one eigenvalue is $1$, it may determine a steady-state pattern. This is one reason eigenvalues are important in Markov chains.
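
The following sketch (again assuming NumPy; not part of the original text) diagonalizes the earlier two-state matrix and confirms that $SD^kS^{-1}$ reproduces $P^k$:

```python
import numpy as np

P = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# Eigendecomposition: columns of S are eigenvectors, D holds the eigenvalues,
# so P = S D S^{-1} (this P is diagonalizable).
eigvals, S = np.linalg.eig(P)
D = np.diag(eigvals)
print(eigvals)   # eigenvalues 1 and 0.5 (order may vary): one is 1, the other shrinks

# Powers via diagonalization: P^k = S D^k S^{-1}, and D^k just powers the diagonal.
k = 10
P_k = S @ np.linalg.matrix_power(D, k) @ np.linalg.inv(S)
print(np.allclose(P_k, np.linalg.matrix_power(P, k)))   # True
```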

Example: A Two-State Weather Model

Suppose the weather follows this model:

  • If today is sunny, tomorrow is sunny with probability $0.9$ and rainy with probability $0.1$.
  • If today is rainy, tomorrow is sunny with probability $0.5$ and rainy with probability $0.5$.

Using the column convention, the transition matrix is

$$
P = \begin{bmatrix} 0.9 & 0.5 \\ 0.1 & 0.5 \end{bmatrix}.
$$

Let the state vector be

$$
\mathbf{x}_k = \begin{bmatrix} s_k \\ r_k \end{bmatrix},
$$

where $s_k$ is the probability of sunny weather and $r_k$ is the probability of rainy weather after $k$ days.

If today is sunny, then

$$
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
$$

Then

$$
\mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} 0.9 \\ 0.1 \end{bmatrix},
$$

which means tomorrow is likely sunny.

To find the long-term behavior, we look for a steady-state vector $\mathbf{x}$ satisfying

$$
P\mathbf{x} = \mathbf{x}.
$$

This means the distribution does not change after one more step. In linear algebra, this is an eigenvector equation for eigenvalue $1$:

$$
P\mathbf{x} = 1\cdot\mathbf{x}.
$$

For this weather model, solving the system gives a steady-state distribution of

$$
\mathbf{x} = \begin{bmatrix} 5/6 \\ 1/6 \end{bmatrix}.
$$

So over a long time, the weather tends to be sunny about $83.3\%$ of the time and rainy about $16.7\%$ of the time, regardless of the starting day 🌤️
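
A short numerical check (a sketch assuming NumPy) confirms that this vector satisfies $P\mathbf{x} = \mathbf{x}$ and that repeated steps from a different starting distribution approach it:

```python
import numpy as np

# Weather model from the text (column convention).
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])

# Claimed steady state: 5/6 sunny, 1/6 rainy.
x = np.array([5/6, 1/6])
print(np.allclose(P @ x, x))   # True: P x = x

# Starting from any distribution, repeated steps approach the same vector.
y = np.array([0.0, 1.0])       # a completely rainy start
for _ in range(50):
    y = P @ y
print(y)                       # approximately [0.8333, 0.1667]
```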

Eigenvalues, Long-Term Behavior, and Diagonalization

Diagonalization helps explain why many Markov chains settle into stable patterns. If a matrix has enough linearly independent eigenvectors, it can be diagonalized. Then the powers $P^k$ are controlled by the powers of its eigenvalues.

For a Markov chain, one important fact is that $1$ is always an eigenvalue of the transition matrix when the matrix is stochastic in the column-sum convention. The reason is that each column sums to $1$, so the all-ones row vector satisfies $\mathbf{1}^T P = \mathbf{1}^T$; since $P$ and $P^T$ have the same eigenvalues, $1$ must be an eigenvalue of $P$. The same column-sum property is what preserves total probability.

Why does this matter? Because repeated multiplication by $P$ can reveal whether the system approaches a steady state, oscillates, or changes in a more complicated way. If all eigenvalues except $1$ have absolute value less than $1$, then the effects of those components fade over time. The system may converge to a fixed probability distribution.

In many courses, this is connected to the idea of dominant eigenvalues. The largest eigenvalue in magnitude often shapes the long-term behavior. In Markov chains, the eigenvalue $1$ is central because it corresponds to equilibrium.

A simple algebraic process is:

  1. Find the eigenvalues of $P$.
  2. Find the eigenvector for eigenvalue $1$.
  3. Normalize that eigenvector so its entries add to $1$.
  4. Interpret the result as a steady-state distribution.

This procedure combines matrix algebra, systems of equations, and probability all at once.
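
Here is one possible way to carry out these four steps numerically. The helper name `steady_state` is hypothetical, introduced only for this sketch, which assumes NumPy:

```python
import numpy as np

def steady_state(P):
    """Hypothetical helper: steps 1-3 of the procedure above."""
    eigvals, eigvecs = np.linalg.eig(P)       # step 1: eigenvalues of P
    i = np.argmin(np.abs(eigvals - 1.0))      # step 2: pick the eigenvalue 1
    v = np.real(eigvecs[:, i])                # its eigenvector
    return v / v.sum()                        # step 3: normalize entries to sum to 1

# Step 4: interpret the result for the weather model.
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])
print(steady_state(P))   # approximately [0.8333, 0.1667]
```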

A Real-World Application: PageRank-Style Thinking

Markov chains are used in search engines, recommendation systems, and network analysis 🌐. Suppose a user moves between web pages by clicking links. Each page can be treated as a state, and each link creates a transition probability. If many users behave according to the same pattern, then the web can be modeled as a Markov chain.

The long-term question becomes: which pages are visited most often? That is a steady-state problem. The answer can often be found by solving

$$
P\mathbf{x} = \mathbf{x}.
$$

The same idea appears in social networks, where people may move between different groups, or in genetics, where states may represent gene types across generations.

This is why Markov chains are important in linear algebra: they turn a real situation into a matrix problem. Once the matrix is built, tools like matrix multiplication, eigenvalues, and diagonalization can be used to study the system.
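
As a toy illustration of this idea (the three-page link structure below is invented for the example, not taken from the text), one can build a small column-stochastic "click" matrix and iterate it to estimate long-run visit frequencies:

```python
import numpy as np

# Hypothetical random-surfer chain over three pages:
# column j lists the probabilities of clicking from page j to each page.
P = np.array([[0.0, 0.5, 0.3],
              [0.7, 0.0, 0.7],
              [0.3, 0.5, 0.0]])

# Columns sum to 1, so this is a valid transition matrix.
assert np.allclose(P.sum(axis=0), 1.0)

# The long-run visit frequencies solve P x = x; iterate to approximate them.
x = np.full(3, 1/3)
for _ in range(200):
    x = P @ x
print(x)   # pages with larger entries are visited more often in the long run
```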

Conclusion

Students, Markov chains show how linear algebra describes change over time. A transition matrix stores the probabilities of moving between states, and the state vector tracks the current probability distribution. Applying the matrix repeatedly gives a discrete dynamical system through $\mathbf{x}_{k+1} = P\mathbf{x}_k$. Diagonalization helps analyze $P^k$ and understand long-term behavior, especially the steady-state vector where $P\mathbf{x} = \mathbf{x}$.

So Markov chains are not just probability models. They are also matrix models, eigenvalue problems, and dynamical systems all at once. That is why they fit naturally into the study of Diagonalization and Dynamical Systems.

Study Notes

  • A Markov chain is a process where the next state depends only on the current state.
  • The transition matrix $P$ stores probabilities of moving from one state to another.
  • In the column-sum convention, each column of $P$ sums to $1$ and all entries satisfy $p_{ij} \ge 0$.
  • The state vector updates by $\mathbf{x}_{k+1} = P\mathbf{x}_k$.
  • After $k$ steps, the distribution is $\mathbf{x}_k = P^k\mathbf{x}_0$.
  • Markov chains are discrete dynamical systems because they model repeated updates over time.
  • A steady-state vector satisfies $P\mathbf{x} = \mathbf{x}$.
  • The steady state is an eigenvector for eigenvalue $1$.
  • Diagonalization helps compute $P^k$ using $P = SDS^{-1}$, so $P^k = SD^kS^{-1}$.
  • Long-term behavior is often determined by eigenvalues, especially the eigenvalue $1$.
  • Real-world uses include weather models, customer behavior, web navigation, and population models 📈
