Irreducible Markov chains and regular Markov chains

Definition. A Markov chain is called an **irreducible chain** if it is possible to go from every state to every state (not necessarily in one move).

Remark. Note that absorbing chains are NOT irreducible chains, since it is impossible to move from an absorbing state to any other state.

Definition. A Markov chain is called a **regular chain** if some power of the transition matrix has only positive elements.

Remark. Note that regular chains are irreducible chains. But the converse is NOT true as the following example shows.

Example. Let the transition matrix of a Markov chain be defined by $$P=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$ If we start from state 1, then at the next step we jump to state 2. If we start from state 2, then we jump to state 1. In other words, we are flipping between the two states after each step. So the chain is irreducible. But $$P^{2n}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad P^{2n+1}=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$ so no power of $P$ has only positive entries, and the chain is not regular.
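To see the alternation concretely, here is a minimal numpy sketch (the loop bound is an arbitrary illustrative choice) that prints successive powers of this $P$:

```python
import numpy as np

# Flip-flop chain from the example above
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

Pn = np.eye(2)
for n in range(1, 5):
    Pn = Pn @ P
    print(f"P^{n} =\n{Pn}")
# P^n alternates between the identity and P itself,
# so no power of P is strictly positive: the chain is not regular.
```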

Remark. Any transition matrix that has no zeros determines a regular Markov chain. However, it is possible for a regular Markov chain to have a transition matrix that has zeros.
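As an illustration of the second point, the two-state matrix below (chosen here purely for illustration, not from the text) has a zero entry, yet its square is strictly positive, so the chain is regular. A short check:

```python
import numpy as np

# A transition matrix with a zero entry...
P = np.array([[0.5, 0.5],
              [1.0, 0.0]])

# ...whose square is strictly positive, so the chain is regular.
P2 = P @ P
print(P2)              # [[0.75 0.25]
                       #  [0.5  0.5 ]]
print(np.all(P2 > 0))  # True
```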

We shall discuss two important theorems relating to regular chains. But first, we need the following two lemmas.

Lemma. \label{lemma:8.1} Suppose $X$ is a random variable with a finite number of outcomes, the greatest being $x_{\max}$ and the least being $x_{\min}$. Then we have $$E(X) \le x_{\max}, \qquad E(X) \ge x_{\min}.$$

Proof ▸

Assume $X$ has $n$ outcomes $x_1,\dots,x_n$, and let $f(x_i)=P(X=x_i)$. Then we have $$E(X)=\sum_{i=1}^n x_i f(x_i) \ge \sum_{i=1}^n x_{\min} f(x_i) = x_{\min}, \qquad E(X)=\sum_{i=1}^n x_i f(x_i) \le \sum_{i=1}^n x_{\max} f(x_i) = x_{\max}. \quad \blacksquare$$

Lemma. \label{lemma:8.2} Suppose a transition matrix $P$ has only positive entries, the smallest entry being $d>0$. Consider a vector $y$; denote its maximum entry by $M_0$ and its minimum entry by $m_0$. Now consider the product $Py$; denote its maximum entry by $M_1$ and its minimum entry by $m_1$. Then we have $$M_1 - m_1 \le (1-2d)(M_0 - m_0).$$

Proof ▸

Note that $P$ is a transition matrix, so each row of $P$ sums to 1. Thus each entry of the vector $Py$ is a weighted average of the entries of $y$, and from Lemma 8.1 we have $$M_1 \le M_0, \qquad m_1 \ge m_0. \tag{8-1}$$ The largest weighted average that could be obtained in the present case would occur if all but one of the entries of $y$ have value $M_0$ and one entry has value $m_0$, and this one small entry is weighted by the smallest possible weight, namely $d$. In this case, the weighted average would equal $m_0 d + M_0(1-d)$. Similarly, the smallest possible weighted average equals $M_0 d + m_0(1-d)$. Therefore, we have $$M_1 \le m_0 d + M_0(1-d), \tag{8-2}$$ $$m_1 \ge M_0 d + m_0(1-d). \tag{8-3}$$ Subtracting (8-3) from (8-2) yields $$M_1 - m_1 \le (1-2d)(M_0 - m_0). \quad \blacksquare$$
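A quick numerical check of the contraction inequality. The positive transition matrix and the test vector below are arbitrary illustrative choices:

```python
import numpy as np

# An arbitrary strictly positive transition matrix (rows sum to 1)
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
d = P.min()                     # smallest entry, here d = 0.2

y = np.array([5.0, -1.0, 2.0])  # any test vector
M0, m0 = y.max(), y.min()

Py = P @ y
M1, m1 = Py.max(), Py.min()

# Lemma 8.2: the spread shrinks by at least a factor (1 - 2d)
print(M1 - m1, "<=", (1 - 2 * d) * (M0 - m0))
assert M1 - m1 <= (1 - 2 * d) * (M0 - m0) + 1e-12
```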

Now we give two important theorems.

Theorem. [Fundamental limit theorem for regular chains] \label{thm:8.1} Let $P$ be the transition matrix for a regular chain. Then, as $n \to \infty$, the powers $P^n$ approach a limiting matrix $W$ with all rows the **same** vector $w$. The vector $w$ is a strictly positive probability vector (i.e., the components are all **positive** and they sum to one).

Proof ▸

We give the argument assuming $P$ itself has only positive entries, with smallest entry $d$; for a general regular chain, the same argument can be applied to a power $P^N$ whose entries are all positive. Consider an arbitrary vector $y$, and denote the maximum entry of $P^n y$ by $M_n$ and the minimum entry of $P^n y$ by $m_n$. Since $P^n y = P(P^{n-1}y)$, Lemma 8.1 gives $$M_n \le M_{n-1}, \qquad m_n \ge m_{n-1}.$$ Writing out these inequalities from 1 to $n$ gives $$M_0 \ge M_1 \ge \cdots \ge M_n \ge m_n \ge \cdots \ge m_1 \ge m_0.$$ Therefore $\{M_n\}$ is a decreasing sequence bounded below and $\{m_n\}$ is an increasing sequence bounded above. From sequence theory we know that $\{M_n\}$ and $\{m_n\}$ must converge. Let $M = \lim_{n\to\infty} M_n$ and $m = \lim_{n\to\infty} m_n$. Applying Lemma 8.2 repeatedly gives $$M_n - m_n \le (1-2d)^n (M_0 - m_0). \tag{8-4}$$ On the other hand, since $d$ is the minimum entry of $P$, we must have $0 < d \le \tfrac12$, which gives $0 \le 1-2d < 1$. Thus from (8-4) we have $$\lim_{n\to\infty}(M_n - m_n) = 0 \quad\Longrightarrow\quad M = m \quad \text{for every } y. \tag{8-5}$$ Now choose the particular vector $y = e_j$, with $j$th entry 1 and all other entries 0. Then $P^n e_j$ is the $j$th column of $P^n$. Using (8-5), we have $$\lim_{n\to\infty} P^n e_j = \alpha_j \mathbf{1} \quad\text{for some } \alpha_j > 0$$ (positivity holds because $m_1 \ge d > 0$ here and $\{m_n\}$ is increasing), which shows that in the limit the $j$th column of $P^n$ has all entries equal. Letting $j$ range over all states, we conclude that $P^n$ converges to a matrix $W$ whose columns are constant, i.e., whose rows are all equal to the same vector $w = (\alpha_1, \alpha_2, \dots)$. Since each row of $P^n$ sums to 1, so does $w$, and hence $w$ is a strictly positive probability vector. This finishes the proof. ◼

The $ij$th entry $p_{ij}^{(n)}$ of $P^n$ is the probability that the process will be in state $s_j$ after $n$ steps if it starts in state $s_i$. If we denote the common row of $W$ by $w$, then Theorem 8.1 states that the probability of being in $s_j$ in the long run is approximately $w_j$, the $j$th entry of $w$, and is independent of the starting state.
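Numerically, the rows of $P^n$ collapse onto the common vector $w$ quite quickly. A sketch with an arbitrarily chosen 3-state positive matrix (not from the text, just for illustration):

```python
import numpy as np

# An arbitrary strictly positive (hence regular) transition matrix
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

for n in [1, 2, 5, 10, 20]:
    Pn = np.linalg.matrix_power(P, n)
    # max - min within each column measures how far rows still differ
    print(f"n={n}, column spreads = {Pn.max(axis=0) - Pn.min(axis=0)}")
# The spreads shrink geometrically (Lemma 8.2), so every row of P^n
# converges to the same vector w.
```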

Theorem. \label{thm:8.2} Let $P$ be a regular transition matrix, let $W = \lim_{n\to\infty} P^n$, and let $w$ be the common row of $W$. Then

  1. $wP = w$; that is, $w$ is a left eigenvector of $P$ with eigenvalue 1. Moreover, any row vector $v$ such that $vP = v$ is a constant multiple of $w$.
  2. $P\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is the column vector of all ones; that is, $\mathbf{1}$ is a right eigenvector of $P$ with eigenvalue 1. Moreover, any column vector $x$ such that $Px = x$ is a multiple of $\mathbf{1}$.

Proof ▸

To prove part 1, we note from Theorem 8.1 that $\lim_{n\to\infty} P^n = W$. Thus $$\lim_{n\to\infty} P^{n+1} = \lim_{n\to\infty} P^n P = WP.$$ But $\lim_{n\to\infty} P^{n+1} = W$, and so $W = WP$, and hence $w = wP$. Now let $v$ be any row vector with $vP = v$. Then $v = vP^n$ for all $n$, and passing to the limit, $v = vW$. Let $r$ be the sum of the components of $v$, i.e., $r = \sum_i v_i$. Then it is easily checked that $vW = rw$. So, $v = rw$. To prove part 2, note that each row of $P$ sums to 1, so $P\mathbf{1} = \mathbf{1}$. Now assume that $x = Px$. Then $x = P^n x$, and again passing to the limit, $x = Wx$. Since all rows of $W$ are the same, the components of $Wx$ are all equal, so $x$ is a multiple of $\mathbf{1}$. ◼
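Both parts can be checked numerically. A sketch, reusing the small matrix from the example further below; `np.linalg.eig` returns right eigenvectors, so we apply it to $P^T$ to get the left eigenvector of $P$:

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.25, 0.75]])

# Part 2: the all-ones vector is a right eigenvector with eigenvalue 1,
# because each row of a transition matrix sums to 1.
print(P @ np.ones(2))            # [1. 1.]

# Part 1: the left eigenvector for eigenvalue 1 (right eigenvector of P.T),
# normalized to sum to 1, is the fixed probability vector w.
vals, vecs = np.linalg.eig(P.T)
w = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
w = w / w.sum()
print(w)                         # [0.3333... 0.6666...]
print(w @ P)                     # equals w again
```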

Note that an immediate consequence of Theorem 8.2 is the fact that there is only one probability vector $v$ such that $vP = v$, namely $v = w$. (Recall that a probability vector $v$ has nonnegative components with $\sum_{i=1}^n v_i = 1$.)

Remark. Computing $W$ therefore reduces to computing the left eigenvector of $P$ with eigenvalue 1, normalized to be a probability vector, which is the same as computing the right eigenvector of $P^T$ with eigenvalue 1.

Example. Let $P$ be a transition matrix such that $$P=\begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac34 \end{bmatrix}.$$ Compute $W = \lim_{n\to\infty} P^n$. This is equivalent to solving $P^T x = x$. Note that $x$ is a probability vector, so $\mathbf{1}^T x = 1$: $$P^T x = x \;\Longleftrightarrow\; \begin{cases} \tfrac12 x_1 + \tfrac14 x_2 = x_1 \\ x_1 + x_2 = 1 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 = \tfrac13 \\ x_2 = \tfrac23 \end{cases}$$ Therefore $w = \begin{bmatrix} \tfrac13 & \tfrac23 \end{bmatrix}$, and $$W = \begin{bmatrix} w \\ w \end{bmatrix} = \begin{bmatrix} \tfrac13 & \tfrac23 \\ \tfrac13 & \tfrac23 \end{bmatrix}.$$
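The same computation in code, mirroring the hand calculation by solving the linear system (with one redundant equation of $P^T x = x$ replaced by the normalization), then sanity-checking $W$ against a high power of $P$:

```python
import numpy as np

P = np.array([[1/2, 1/2],
              [1/4, 3/4]])

# Solve (P^T - I) x = 0 together with x1 + x2 = 1 by replacing
# the (redundant) last equation with the normalization row.
A = P.T - np.eye(2)
A[-1, :] = 1.0
b = np.array([0.0, 1.0])
w = np.linalg.solve(A, b)
print(w)                         # [0.3333... 0.6666...]

W = np.tile(w, (2, 1))           # every row of W equals w
print(np.allclose(np.linalg.matrix_power(P, 50), W))   # True
```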

Fixed vectors

Definition. A row vector $w$ with the property $wP = w$ is called a **fixed row vector** for $P$. Similarly, a column vector $x$ such that $Px = x$ is called a **fixed column vector** for $P$.

Thus, the common row of $W$ is the unique vector $w$ which is both a fixed row vector for $P$ and a probability vector. Theorem 8.2 shows that any fixed row vector for $P$ is a multiple of $w$ and any fixed column vector for $P$ is a constant vector. One can also state the above definition in terms of eigenvalues and eigenvectors: a fixed row vector is a left eigenvector of the matrix $P$ corresponding to the eigenvalue 1, and a similar statement can be made about fixed column vectors.

The following theorem generalizes Theorem 8.1 to the case where the starting state is itself determined by a probability vector.

Theorem 8.3. \label{thm:8.3} Let $P$ be the transition matrix for a regular chain and $v$ an arbitrary probability vector. Then $$\lim_{n\to\infty} vP^n = w,$$ where $w$ is the unique fixed probability vector for $P$.

Proof ▸

By Theorem 8.1, $\lim_{n\to\infty} P^n = W$. Hence $$\lim_{n\to\infty} vP^n = vW.$$ But the entries in $v$ sum to 1, and each row of $W$ equals $w$. From these two facts, it is easy to check that $vW = w$. ◼

Remark. If we start a Markov chain with initial probabilities given by $v$, then the probability vector $vP^n$ gives the probabilities of being in the various states after $n$ steps. Theorem 8.3 then establishes the fact that, even in this more general class of processes, the probability of being in $s_j$ approaches $w_j$.
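A short demonstration of this convergence, again with the example matrix from above; the initial distribution is an arbitrary illustrative choice:

```python
import numpy as np

P = np.array([[1/2, 1/2],
              [1/4, 3/4]])

v = np.array([0.9, 0.1])   # an arbitrary initial probability vector
for n in range(6):
    print(n, v)
    v = v @ P
# The distribution v P^n approaches w = [1/3, 2/3]
# regardless of the starting vector.
```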

Remark. Theorem 8.3 tells us that regular chains always equilibrate to $w$.

Does the result hold for irreducible chains?

No: it is NOT true for irreducible chains in general. The following is a counterexample.

Example. Let $$P=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$ Calculating the left eigenvector for eigenvalue 1, we obtain $w = \begin{bmatrix} \tfrac12 & \tfrac12 \end{bmatrix}$. But if we start from $(1,0)$ or $(0,1)$, the chain never reaches this equilibrium: $vP^n$ alternates between $(1,0)$ and $(0,1)$ forever.
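The oscillation is easy to see numerically:

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

v = np.array([1.0, 0.0])   # start deterministically in state 1
for n in range(5):
    print(n, v)
    v = v @ P
# v alternates between (1, 0) and (0, 1)
# and never approaches (1/2, 1/2).
```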

Definition. The **period** $d(i)$ of a state $i$ is defined by $$d(i) = \gcd\{n \ge 1 : P_{ii}^{(n)} > 0\},$$ the greatest common divisor of the epochs at which return is possible. We call state $i$ **periodic** if $d(i) > 1$ and **aperiodic** if $d(i) = 1$.

Remark. Aperiodic + irreducible $\Longleftrightarrow$ regular, for Markov chains with finitely many states.
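A small sketch that computes the period of a state by brute force; the helper `period` and its cutoff `max_n` are hypothetical conveniences (checking powers only up to a finite bound), good enough for the small chains in this section:

```python
from math import gcd
import numpy as np

def period(P, i, max_n=50):
    """Period of state i: gcd of all n <= max_n with (P^n)[i, i] > 0.

    Truncating at max_n is a heuristic; for small chains the gcd
    stabilizes well before this bound.
    """
    d = 0
    for n in range(1, max_n + 1):
        if np.linalg.matrix_power(P, n)[i, i] > 0:
            d = gcd(d, n)
    return d

# The flip-flop chain: irreducible but periodic, hence not regular.
flip = np.array([[0.0, 1.0], [1.0, 0.0]])
print(period(flip, 0))   # 2

# The earlier example chain: irreducible and aperiodic, hence regular.
lazy = np.array([[0.5, 0.5], [0.25, 0.75]])
print(period(lazy, 0))   # 1
```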