Markov Chain 2
Irreducible Markov chains and regular Markov chains
Definition. A Markov chain is called an **irreducible chain** if it is possible to go from every state to every state (not necessarily in one move).
Remark. Note that absorbing chains are NOT irreducible chains.
Definition. A Markov chain is called a **regular chain** if some power of the transition matrix has only positive elements.
Remark. Note that regular chains are irreducible chains. But the converse is NOT true as the following example shows.
Example. Let the transition matrix of a Markov chain be defined by $$\begin{equation} P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \end{equation}$$ If we start from state 1, then next step we jump to state 2. If we start from state 2, then we jump to state 1. In other words, we are flipping between two states after each step. So the chain is irreducible. But $$\begin{equation} P^{2n} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad P^{2n+1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \end{equation}$$ So the chain is not regular.
Remark. Any transition matrix that has no zeros determines a regular Markov chain. However, it is possible for a regular Markov chain to have a transition matrix that has zeros.
We shall discuss two important theorems relating to regular chains. But before, we will use the following two lemmas.
Lemma. \label{lemma:8.1} Suppose $X$ is a random variable with finite number of outcomes, with the greatest being $x_{max}$ and the least being $x_{min}$. Then we have $$\begin{equation} \mathbf{E}(X) \leq x_{max}, \quad \mathbf{E}(X) \geq x_{min}. \end{equation}$$
Assume $X$ has $n$ outcomes, and let $f(x_i) = \mathbf{P}(X = x_i)$, then we have \begin{gather*} \mathbf{E}(X) = \sum_{i=1}^n x_i f(x_i) \geq \sum_{i=1}^n x_{min} f(x_i) = x_{min}, \\ \mathbf{E}(X) = \sum_{i=1}^n x_i f(x_i) \leq \sum_{i=1}^n x_{max} f(x_i) = x_{max}. \end{gather*} ◼
Lemma. \label{lemma:8.2} Suppose a transition matrix $P$ only has positive entries and the smallest entry is $d>0$. Consider a vector $y$. Denote its maximum value as $M_0$ and its minimum value as $m_0$. Now consider the product $Py$. Denote its maximum value as $M_1$ and its minimum value as $m_1$. Then we have $$\begin{equation} M_1 - m_1 \leq (1-2d) (M_0 - m_0). \end{equation}$$
Note that $P$ is a transition matrix. So each row of $P$ sums up to $1$. From lemma \ref{lemma:8.1}, we note that each entry in the vector $Py$ is a weighted average of the entries in $y$, therefore, we have $$\begin{equation} \label{eq:8.1} \tag{8-1} M_1 \leq M_0, \quad m_1 \geq m_0. \end{equation}$$ The largest weighted average that could be obtained in the present case would occur if all but one of the entries of $y$ have value $M_0$ and one entry has value $m_0$, and this one small entry is weightedby the smallest possible weight, namely $d$. In this case, the weighted average would equal $m_0 d + M_0(1-d)$. %$$\begin{equation} % m_0 d + M_0(1-d). %\end{equation}$$ Similarly, the smallest possible weighted average equals $M_0 d + m_0(1-d)$. %$$\begin{equation} % M_0 d + m_0(1-d). %\end{equation}$$ Therefore, we have %Since \eqref{eq:8.1} is true for any $y$, we can take some particular form of $y$ to find the best and worst outcome of $Py$. First let $y = \begin{bmatrix} M_0 & \cdots & m_0 & \cdots & M_0 \end{bmatrix}$ such that $y$ only one entry being $m_0$ and such entry captures the minimum entry $d$ of $P$. Then we can show that $$\begin{equation} \label{eq:8.2} \tag{8-2} M_1 \leq m_0 d + M_0(1-d), \end{equation}$$ %Similarly, by letting $y = \begin{bmatrix} m_0 & \cdots & M_0 & \cdots & m_0 \end{bmatrix}$ such that $y$ only one entry being $M_0$ and such entry captures the minimum entry $d$ of $P$, we can obtain $$\begin{equation} \label{eq:8.3} \tag{8-3} m_1 \geq M_0 d + m_0(1-d). \end{equation}$$ \eqref{eq:8.2}-\eqref{eq:8.3} yields $$\begin{equation} M_1 - m_1 \leq (1-2d) (M_0 - m_0). \end{equation}$$ ◼
Now we give two important theorems.
Theorem. [Fundamental limit theorem for regular chains] \label{thm:8.1} Let $P$ be the transition matrix for a regular chain. Then, as $n \to \infty$, the powers $P^n$ approach a limiting matrix $W$ with all rows the **same** vector $w$. The vector $w$ is a strictly positive probability vector (i.e.., the components are all **positive** and they sum to one).
We consider an arbitrary vector $y$, and denote the maximum entry of $P^n y$ as $M_n$ and the minimum entry of $P^n y$ as $m_n$. From lemma 1, we must have $$\begin{equation} M_n \leq M_{n-1}, \quad m_n \geq m_{n+1}. \end{equation}$$ Writing all these inequality from 1 to $n$ gives $$\begin{equation} M_0 \geq M_1 \geq \cdots \geq M_n \geq m_n \geq \cdots \geq m_1 \geq m_0. \end{equation}$$ Therefore $\{M_n\}$ is a decreasing sequence bounded below and $\{m_n\}$ is an increasing sequence bounded above. From sequence theory we know that $\{ M_n \}$ and $\{m_n\}$ must converge. Let $$\begin{equation} M = \lim_{n\to\infty} M_n, \quad m = \lim_{n\to\infty} m_n. \end{equation}$$ From Lemma \ref{lemma:8.2} we have $$$$$$$$$$\begin{equation} \label{eq:8.4} \tag{8-4} M_n - m_n \leq (1-2d)^n (M_0 - m_0). \end{equation}$$$$$$$$$$ On the other hand, since $d$ is the minimum entry of $P$, we must have $0 < d \leq \frac{1}{2}$, which gives $0 \leq 1-2d < 1$. Thus from \eqref{eq:8.4} we have $$$$$$$$$$\begin{equation} \label{eq:8.5} \tag{8-5} \lim_{n\to\infty} (M_n - m_n) = 0 \quad \Rightarrow \quad M=m \quad \forall \ y. \end{equation}$$$$$$$$$$ We choose a particular form of $y$ such that $y = e_j$ with $j$th entry being 1 and others being 0. Then $P^n y$ gives the $j$th column of $P^n$. Using \eqref{eq:8.5}, we have $$\begin{equation} \lim_{n\to\infty} P^n y = \alpha \mathbf{1} \quad \text{for some $\alpha > 0$}, \end{equation}$$ which shows the $j$th column has the same entries. Thus, by letting $y = e_j$, $1\leq j \leq n$, we conclude that each column of $P^n$ has the same entries. This finishes the proof. ◼
The $ij$th entry of $P^n$, $p^{(n)}_{ij}$, is the probability that the process will be in state $s_j$ after $n$ steps if it starts in state $s_i$. If we denote the common row of $W$ by $w$, then Theorem \ref{thm:8.1} states that the probability of being in $s_j$ in the long run is approximately $w_j$, the $j$th entry of $w$, and is independent of the starting state.
Theorem. \label{thm:8.2} Let $P$ be a regular transition matrix, let $$\begin{equation} W = \lim_{n\to\infty} P^n, \end{equation}$$ and let $w$ be the common row of $W$. Then
- $wP = w$, which means $w$ is a left eigenvector of $P$ with eigenvalue 1, and any row vector $v$ such that $vP = v$ is a constant multiple of $w$.
- $P\mathbf{1} = \mathbf{1}$, which means $\mathbf{1}$ is an eigenvector of $P$ with eigenvalue 1, and any column vector $x$ such that $Px = x$ is a multiple of $\mathbf{1}$.
To prove part (a), we note that from Theorem \ref{thm:8.1}, $$\begin{equation} \lim_{n\to\infty} P^n = W. \end{equation}$$ Thus $$\begin{equation} \lim_{n\to\infty} P^{n+1} = P^n P = W P. \end{equation}$$ But $\lim_{n\to\infty} P^{n+1} = W$, and so $W = WP$ and $w = wP$. Let $v$ be any vector with $vP = v$. Then $v = vP^n$, and passing to the limit, $v = vW$. Let $r$ be the sum of the components of $v$, i.e., $r = \sum_{i=1}^n v_i$. Then it is easily checked that $vW = rw$. So, $v = rw$. To prove part (b), assume that $x = Px$. Then $x = P^n x$, and again passing to the limit, $x = Wx$. Since all rows of $W$ are the same, the components of $Wx$ are all equal, so $x$ is a multiple of $\mathbf{1}$. ◼
Note that an immediate consequence of Theorem \ref{thm:8.2} is the fact that there is only one probability vector $v$ such that $vP = v$, which is $v = w$. The probability vector $v$ is the vector such that $\sum_{i=1}^n = 1$.
Remark. Computing $W$ is equivalent to computing the left eigenvector of $P$ with eigenvalue 1. It is equivalent to compute the right eigenvector of $P^T$ with eigenvalue 1.
Example. Let $P$ be a transition matrix such that $$\begin{equation} P = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix}. \end{equation}$$ Compute $W = \lim_{n\to\infty} P^n$. This is equivalent to compute $P^T x = x$. Note that $x$ is a probability vector, so $\mathbf{1}^T x = 1$. $$\begin{equation} P^T x = x \quad \Rightarrow \quad \begin{cases} \frac{1}{2} x_1 + \frac{1}{4}x_2 = x_1 \\ x_1+x_2 = 1 \end{cases} \quad \Rightarrow \quad \begin{cases} x_1 = \frac{1}{3} \\ x_2 = \frac{2}{3} \end{cases} \end{equation}$$ Therefore $w = [\frac{1}{3} \ \frac{2}{3}]$, and $W = \begin{bmatrix} w \\ w \end{bmatrix} = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & \frac{2}{3} \end{bmatrix}$.
Fixed vectors
Definition. A row vector $w$ with the property $wP = w$ is called a **fixed row vector** for $P$. Similarly, a column vector $x$ such that $Px = x$ is called a **fixed column vector** for $P$.
Thus, the common row of $W$ is the unique vector $w$ which is both a fixed row vector for $P$ and a probability vector. Theorem \ref{thm:8.2} shows that any fixed row vector for $P$ is a multiple of $w$ and any fixed column vector for $P$ is a constant vector. One can also state the above definition in terms of eigenvalues and eigenvectors. A fixed row vector is a left eigenvector of the matrix $P$ corresponding to the eigenvalue 1. A similar statement can be made about fixed column vectors.
The following theorem generalizes Theorem \ref{thm:8.1} to the case where the starting state is itself determined by a probability vector.
Theorem 8.3. Let $P$ be the transition matrix for a regular chain and $v$ an arbitrary probability vector. Then $$\begin{equation} \lim_{n\to\infty} vP^n = w, \end{equation}$$ where $w$ is the unique fixed probability vector for $P$.
By Theorem \ref{thm:8.1}, $$\begin{equation} \lim_{n\to\infty} P^n = W. \end{equation}$$ Hence $$\begin{equation} \lim_{n\to\infty} vP^n = vW. \end{equation}$$ But the entries in $v$ sum to 1, and each row of $W$ equals $w$. From these statements, it is easy to check that $$\begin{equation} vW = w. \end{equation}$$ ◼
Remark. If we start a Markov chain with initial probabilities given by $v$, then the probability vector $v P^n$ gives the probabilities of being in the various states after $n$ steps. Theorem \ref{thm:8.3} then establishes the fact that, even in this more general class of processes, the probability of being in $s_j$ approaches $w_j$.
Remark. Theorem 8.3 tells that regular chains always equilibriate to $w$.
Does the result holds for irreducible chains?
NOT true for irreducible chains. The following is a counterexample.
Example. Let $$\begin{equation} P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \end{equation}$$ If we calculate the left eigenvector, we can obtain $$\begin{equation} w = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \end{bmatrix}. \end{equation}$$ But if we start from $(1,0)$ or $(0,1)$, we will never goes to this equilibrium.
Definition. The period $d(i)$ of a state $i$ is defined by $d(i) = \gcd\{n : P_{ii}(n) > 0\}$, the greatest common divisor of the epochs at which return is possible. We call state $i$ periodic if $d(i) > 1$ and aperiodic if $d(i) = 1$.
Remark. Aperiodic + irreducible $\Leftrightarrow$ regular for finite outcome Markov chains.