Most of our study of probability has dealt with independent trials processes, in which we have i.i.d. random variables $X_1, X_2, \dots$. This is useful for modeling and sampling situations. But when we study random dynamical systems, independent trials are less useful: the variables $X_1, X_2, \dots$ are no longer independent, because they are all part of the same system and interact with each other. One way to model this is to use conditional probability.
Markov chains: basic definitions
Consider a sequence $X_0, X_1, X_2, \dots$ of random variables for which
- the subscript $n$ represents time, thus we have a discrete-time sequence;
- each $X_n$ has countably many possible outcomes.

In other words, each $X_n$ is a discrete random variable that takes one of $N$ possible values, where $N = |S|$ and $S$ is the outcome space; it may be the case that $N = \infty$.
Definition.
The process $X$ is a Markov chain if it satisfies the Markov condition:
$$P(X_n = s \mid X_0 = x_0,\, X_1 = x_1,\, \dots,\, X_{n-1} = x_{n-1}) = P(X_n = s \mid X_{n-1} = x_{n-1})$$
for all $n \ge 1$ and all $s, x_0, \dots, x_{n-1} \in S$.
Since $S$ is assumed countable, it can be put in one-to-one correspondence with some subset $S'$ of the integers, and without loss of generality we can assume that $S$ is this set of integers. Then the evolution of a chain is described by its transition probabilities $P(X_{n+1} = j \mid X_n = i)$.
Theorem.
The transition matrix $\mathbf{P} = (p_{ij})$ is a **stochastic matrix**, which is to say that:
- $\mathbf{P}$ has non-negative entries, or $p_{ij} \ge 0$ for all $i, j$;
- $\mathbf{P}$ has row sums equal to one, or $\sum_j p_{ij} = 1$ for all $i$.
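To make this concrete, here is a minimal sketch in NumPy; the two-state "weather" chain and its numbers are illustrative assumptions, not taken from the text:

```python
import numpy as np

# A hypothetical two-state chain: state 0 = "sunny", state 1 = "rainy".
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# The two stochastic-matrix conditions from the theorem above:
assert (P >= 0).all()                   # non-negative entries
assert np.allclose(P.sum(axis=1), 1.0)  # every row sums to one
```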
Definition.
The **$n$-step transition probabilities** $p_{ij}(m, m+n)$ are defined as
$$p_{ij}(m, m+n) = P(X_{m+n} = j \mid X_m = i).$$
Definition.
The **transition matrix** $\mathbf{P} = (p_{ij})$ is the $|S| \times |S|$ matrix of **transition probabilities** $p_{ij} = P(X_{n+1} = j \mid X_n = i)$. The **$n$-step transition matrix** $\mathbf{P}(m, m+n) = \left(p_{ij}(m, m+n)\right)$ is the matrix of **$n$-step transition probabilities** $p_{ij}(m, m+n)$.
Next we introduce another assumption:
Definition.
The chain is called **homogeneous** if
$$P(X_{n+1} = j \mid X_n = i) = P(X_1 = j \mid X_0 = i)$$
for all $n$, $i$, $j$.
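A homogeneous chain is driven by a single transition matrix, so its evolution is easy to simulate: sample $X_{n+1}$ from the row of $\mathbf{P}$ indexed by the current state. A minimal sketch, again using the illustrative two-state chain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical homogeneous two-state chain (same P as above).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

x = 0          # X_0: start in state 0
path = [x]
for _ in range(20):
    # Markov + homogeneity: X_{n+1} is drawn from row x of P,
    # regardless of the time n or the earlier history.
    x = rng.choice(2, p=P[x])
    path.append(x)
print(path)
```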
Properties of Markov chains
By the assumption of homogeneity, we know $\mathbf{P}(m, m+1) = \mathbf{P}$ for all $m$. That $\mathbf{P}(m, m+n)$ doesn't depend on $m$ is a consequence of the following important fact.
Theorem. [Chapman-Kolmogorov equations]
$$p_{ij}(m, m+n+r) = \sum_k p_{ik}(m, m+n)\, p_{kj}(m+n, m+n+r).$$
Therefore,
$$\mathbf{P}(m, m+n+r) = \mathbf{P}(m, m+n)\, \mathbf{P}(m+n, m+n+r).$$
Furthermore, if we have homogeneity, then
$$\mathbf{P}(m, m+n) = \mathbf{P}^n,$$
the $n$th power of $\mathbf{P}$.
Proof ▸
We compute
$$\begin{aligned} p_{ij}(m, m+n+r) &= P(X_{m+n+r} = j \mid X_m = i) \\ &= \sum_k P(X_{m+n+r} = j,\, X_{m+n} = k \mid X_m = i) \\ &= \sum_k P(X_{m+n+r} = j \mid X_{m+n} = k,\, X_m = i)\, P(X_{m+n} = k \mid X_m = i) \\ &= \sum_k p_{ik}(m, m+n)\, p_{kj}(m+n, m+n+r). \end{aligned}$$
The marginalization step uses the fact that $P(A \mid B) = \sum_k P(A \cap C_k \mid B)$ for any partition $\{C_k\}$ of the sample space, together with $P(A \cap C \mid B) = P(A \mid C \cap B)\, P(C \mid B)$. The Markov property step uses $P(X_{m+n+r} = j \mid X_{m+n} = k,\, X_m = i) = P(X_{m+n+r} = j \mid X_{m+n} = k)$. The $n$th power is obtained by iteration:
$$\mathbf{P}(m, m+n) = \mathbf{P}(m, m+n-1)\, \mathbf{P}(m+n-1, m+n) = \cdots = \mathbf{P}^n.$$
◼
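As a quick numerical sanity check of the homogeneous case, one can verify $\mathbf{P}^{n+r} = \mathbf{P}^n \mathbf{P}^r$ directly; the chain below is the same illustrative two-state example as before:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

n, r = 3, 4
lhs = np.linalg.matrix_power(P, n + r)                            # P(0, n+r)
rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, r)
assert np.allclose(lhs, rhs)   # Chapman-Kolmogorov, homogeneous case
```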
One consequence of the preceding theorem is that $p_{ij}(m, m+n) = p_{ij}(0, n)$ depends only on $n$, not on $m$. Note that this consequence requires the homogeneity assumption. We write $\mathbf{P}_n$ for $\mathbf{P}(m, m+n) = \mathbf{P}^n$ and $p_{ij}(n)$ for $p_{ij}(m, m+n)$. This theorem relates long-term development to short-term development, and tells us how $X_{m+n}$ depends on the initial variable $X_m$. Let $\mu_i^{(n)} = P(X_n = i)$ be the mass function of $X_n$, and write $\boldsymbol{\mu}^{(n)}$ for the row vector with entries $(\mu_i^{(n)} : i \in S)$. Then we have the following lemma.
Lemma.
$$\boldsymbol{\mu}^{(m+n)} = \boldsymbol{\mu}^{(m)} \mathbf{P}^n, \qquad \text{and hence} \qquad \boldsymbol{\mu}^{(n)} = \boldsymbol{\mu}^{(0)} \mathbf{P}^n.$$
Proof ▸
We have that
$$\mu_j^{(m+n)} = P(X_{m+n} = j) = \sum_i P(X_{m+n} = j \mid X_m = i)\, P(X_m = i) = \sum_i \mu_i^{(m)}\, p_{ij}(n),$$
and the result follows from the preceding theorem.
◼
The random evolution of the chain is determined by the transition matrix $\mathbf{P}$ and the initial mass function $\boldsymbol{\mu}^{(0)}$.
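The lemma translates directly into code: the mass function evolves by right-multiplication with $\mathbf{P}$. A sketch, with an assumed initial distribution on the illustrative two-state chain:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

mu0 = np.array([1.0, 0.0])   # assumed initial mass function: start in state 0

# mu^(n) = mu^(0) P^n, with the mass function as a row vector
mu10 = mu0 @ np.linalg.matrix_power(P, 10)
print(mu10)   # mass function of X_10
```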
Absorbing Markov chains
An absorbing Markov chain is a special type of Markov chain.
Definition.
A state $s_i$ of a Markov chain is called **absorbing** if it is impossible to leave it (i.e., $p_{ii} = 1$). A Markov chain is **absorbing** if it has at least one absorbing state, and if from every state it is possible to go to an absorbing state (not necessarily in one step).
Definition.
In an absorbing Markov chain, a state which is not absorbing is called **transient**.
Consider an arbitrary absorbing Markov chain. Renumber the states so that the transient states come first. If there are $r$ absorbing states and $t$ transient states, the transition matrix will have the following canonical form
$$\mathbf{P} = \begin{pmatrix} Q & R \\ \mathbf{0} & I \end{pmatrix}.$$
Here $I$ is an $r \times r$ identity matrix, $\mathbf{0}$ is an $r \times t$ zero matrix, $R$ is a nonzero $t \times r$ matrix, and $Q$ is a $t \times t$ matrix. The first $t$ states are transient and the last $r$ states are absorbing.
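For concreteness, here is one way to assemble the canonical form in NumPy for a hypothetical absorbing chain: a random walk on $\{0, 1, 2, 3, 4\}$ that is absorbed at the endpoints and otherwise steps left or right with probability $1/2$ (transient states $1, 2, 3$ listed first):

```python
import numpy as np

# Transient states 1, 2, 3 first, then absorbing states 0, 4.
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])     # transient -> transient (t x t)
R = np.array([[0.5, 0.0],
              [0.0, 0.0],
              [0.0, 0.5]])          # transient -> absorbing (t x r)

# Canonical form: transient block on top, absorbing identity below.
P = np.block([[Q, R],
              [np.zeros((2, 3)), np.eye(2)]])
assert np.allclose(P.sum(axis=1), 1.0)   # still a stochastic matrix
```

The same $Q$ and $R$ are reused in the sketches below.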
A standard matrix algebra argument shows that $\mathbf{P}^n$ is of the form
$$\mathbf{P}^n = \begin{pmatrix} Q^n & * \\ \mathbf{0} & I \end{pmatrix},$$
where the asterisk $*$ stands for the $t \times r$ matrix in the upper right-hand corner of $\mathbf{P}^n$.
The entries of $Q^n$ give the probabilities for being in each of the transient states after $n$ steps for each possible transient starting state. The following theorem will show that every entry of $Q^n$ will approach zero as $n$ approaches infinity (i.e., $Q^n \to \mathbf{0}$).
Theorem.
In an absorbing Markov chain, the probability that the process will be absorbed is 1 (i.e., $Q^n \to \mathbf{0}$ as $n \to \infty$).
Proof ▸
From each non-absorbing state $s_j$ it is possible to reach an absorbing state. Let $m_j$ be the minimum number of steps required to reach an absorbing state, starting from $s_j$. Let $p_j$ be the probability that, starting from $s_j$, the process will not reach an absorbing state in $m_j$ steps. Then $p_j < 1$. Let $m$ be the largest of the $m_j$ and let $p$ be the largest of the $p_j$. The probability of not being absorbed in $m$ steps is less than or equal to $p$, in $2m$ steps less than or equal to $p^2$, etc. Since $p < 1$, these probabilities tend to 0. Since the probability of not being absorbed in $n$ steps is monotone decreasing, these probabilities also tend to 0, hence $\lim_{n \to \infty} Q^n = \mathbf{0}$.
◼
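Numerically, the decay of $Q^n$ is easy to observe for the random-walk example above:

```python
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

for n in (1, 10, 50, 100):
    # the largest entry of Q^n shrinks geometrically toward zero
    print(n, np.linalg.matrix_power(Q, n).max())
```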
The fundamental matrix
Definition.
For an absorbing Markov chain $\mathbf{P}$, the matrix $N = (I - Q)^{-1}$ is called the **fundamental matrix** for $\mathbf{P}$. The entry $n_{ij}$ of $N$ gives the expected number of times that the process is in the transient state $s_j$ if it is started in the transient state $s_i$.
Theorem.
For an absorbing Markov chain the matrix $I - Q$ has an inverse $N$ and $N = I + Q + Q^2 + \cdots$. The $ij$-entry $n_{ij}$ of the matrix $N$ is the expected number of times the chain is in state $s_j$, given that it starts in state $s_i$. The initial state is counted if $i = j$.
Proof ▸
Let $(I - Q)\mathbf{x} = \mathbf{0}$; that is, $\mathbf{x} = Q\mathbf{x}$. Then, iterating this we see that $\mathbf{x} = Q^n\mathbf{x}$. Since $Q^n \to \mathbf{0}$, we have $Q^n\mathbf{x} \to \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$. Thus
$$N = (I - Q)^{-1}$$
exists. Note next that
$$(I - Q)(I + Q + Q^2 + \cdots + Q^n) = I - Q^{n+1}.$$
Thus multiplying both sides by $N$ gives
$$I + Q + Q^2 + \cdots + Q^n = N(I - Q^{n+1}).$$
Letting $n$ tend to infinity we have
$$N = I + Q + Q^2 + \cdots.$$
Let $s_i$ and $s_j$ be two transient states, and assume throughout the remainder of the proof that $i$ and $j$ are fixed. Let $X^{(k)}$ be a random variable which equals 1 if the chain is in state $s_j$ after $k$ steps, and equals 0 otherwise. For each $k$, this random variable depends upon both $i$ and $j$; we choose not to explicitly show this dependence in the interest of clarity. We have
$$P(X^{(k)} = 1) = q_{ij}^{(k)}$$
and
$$P(X^{(k)} = 0) = 1 - q_{ij}^{(k)},$$
where $q_{ij}^{(k)}$ is the $ij$th entry of $Q^k$. These equations hold for $k = 0$ since $Q^0 = I$. Therefore, since $X^{(k)}$ is a 0-1 random variable, $E(X^{(k)}) = q_{ij}^{(k)}$.
The expected number of times the chain is in state $s_j$ in the first $n$ steps, given that it starts in state $s_i$, is clearly
$$E\left(X^{(0)} + X^{(1)} + \cdots + X^{(n)}\right) = q_{ij}^{(0)} + q_{ij}^{(1)} + \cdots + q_{ij}^{(n)}.$$
Letting $n$ tend to infinity we have
$$E\left(X^{(0)} + X^{(1)} + \cdots\right) = q_{ij}^{(0)} + q_{ij}^{(1)} + \cdots = n_{ij}.$$
◼
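For the random-walk example, the fundamental matrix comes out as follows; a sketch using the illustrative $Q$ from the canonical-form example:

```python
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

# Fundamental matrix N = (I - Q)^{-1}
N = np.linalg.inv(np.eye(3) - Q)
print(N)
# [[1.5 1.  0.5]
#  [1.  2.  1. ]
#  [0.5 1.  1.5]]
```

For instance, a walk started in the middle state spends an expected 2 steps there (counting the start) before absorption.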
Time to absorption
Given that the chain starts in state $s_i$, what is the expected number of steps before the chain is absorbed? The following theorem gives the answer.
Theorem.
Let $t_i$ be the expected number of steps before the chain is absorbed, given that the chain starts in state $s_i$, and let $\mathbf{t}$ be the **column vector** whose $i$th entry is $t_i$. Then
$$\mathbf{t} = N\mathbf{c},$$
where $\mathbf{c}$ is a column vector all of whose entries are 1.
Proof ▸
If we add all the entries in the $i$th row of $N$, we will have the expected number of times in any of the transient states for a given starting state $s_i$, that is, the expected time required before being absorbed. Thus, $t_i$ is the sum of the entries in the $i$th row of $N$. If we write this statement in matrix form, we obtain the theorem.
◼
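Continuing the same illustrative example, $\mathbf{t} = N\mathbf{c}$ is just the vector of row sums of $N$:

```python
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
N = np.linalg.inv(np.eye(3) - Q)

t = N @ np.ones(3)   # expected number of steps before absorption
print(t)             # [3. 4. 3.]
```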
Absorption probabilities
Theorem.
Let $b_{ij}$ be the probability that an absorbing chain will be absorbed in the absorbing state $s_j$ if it starts in the transient state $s_i$. Let $B$ be the matrix with entries $b_{ij}$. Then $B$ is a $t \times r$ matrix, and
$$B = NR,$$
where $N$ is the fundamental matrix and $R$ is as in the canonical form.
Proof ▸
We have
$$b_{ij} = \sum_n \sum_k q_{ik}^{(n)}\, r_{kj} = \sum_k \sum_n q_{ik}^{(n)}\, r_{kj} = \sum_k n_{ik}\, r_{kj} = (NR)_{ij}.$$
This completes the proof.
◼
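Finally, $B = NR$ for the same illustrative walk recovers the familiar gambler's-ruin probabilities:

```python
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],
              [0.0, 0.0],
              [0.0, 0.5]])

# Absorption probabilities B = N R (columns: absorbed at 0, absorbed at 4)
B = np.linalg.inv(np.eye(3) - Q) @ R
print(B)
# [[0.75 0.25]
#  [0.5  0.5 ]
#  [0.25 0.75]]
```

Each row sums to 1, in agreement with the earlier theorem that absorption is certain.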