Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

3. Discrete-time Markov Chains 3.1 Definitions and Basic Concepts
Section 3.1: Definitions and Basic Concepts
STAT 333 Ch 3. Discrete-time Markov Chains 
(Section 3.1) Spring 2021
3. Discrete-time Markov Chains 3.1 Definitions and Basic Concepts
Stochastic Process
Definition: {X (t), t 2 T } is called a stochastic process if X (t) is a rv (or possibly a random
vector) for any given t 2 T . T is referred to as the index set and is often interpreted in the
context of time. As such, X (t) is often called the state of the process at time t. We note that:
Index set T
8><>:
can be a continuum of values such as T = {t : t 0},
can be a set of discrete points such as T = {t0, t1, t2, . . .}.
Since there is a one-to-one correspondence between the sets T = {t0, t1, t2, . . .} and
N = {0, 1, 2, . . .}, we will use T = N as the general index set for a discrete-time stochastic
process (unless otherwise stated). In other words, {X (n), n 2 N} or {Xn, n 2 N} will represent
a general discrete-time stochastic process.
Discrete-time Stochastic Process
Some examples of discrete-time stochastic processes {Xn, n 2 N} might include:
(1) Xn represents the outcome of the nth toss of a die,
(2) Xn represents the price of a stock at the end of day n trading,
(3) Xn represents the maximum temperature in Waterloo during the nth month,
(4) Xn represents the number of goals scored in game n by the varsity hockey team,
(5) Xn represents the number of STAT 333 students in class for the nth lecture.
Discrete-time Markov Chain
Definition: A stochastic process {Xn, n 2 N} is said to be a discrete-time Markov chain
(DTMC) if the following two conditions hold true:
(1) For n 2 N, Xn is a discrete rv (i.e., the state space S of Xn is of discrete type).
(2) For n 2 N and all states x0, x1, . . . , xn+1 2 S, the Markov property must hold:
P(Xn+1 = xn+1|Xn = xn,Xn1 = xn1, . . . ,X1 = x1,X0 = x0) = P(Xn+1 = xn+1|Xn = xn).
In mathematical terms, this property states that the conditional distribution of any future
state Xn+1 given the past states X0,X1, . . . ,Xn1 and the present state Xn is
independent of the past states.
In a more informal way , the Markov property tells us, for a random process, that if we
know the value taken by the process at a given time, we will not get any additional
information about the future behaviour of the process by gathering more knowledge about
the past.
Discrete-time Markov Chain
Remarks:
(1) Unless otherwise stated, the state space S of a DTMC {Xn, n 2 N} will be assumed to be
N.
(2) In general, the sequence of rvs {Xn}1n=0 are neither independent nor identically distributed.
(3) The Markov property does not require “full” information on the past to ensure
independence. For example, consider the following conditional probability:
P(Xn+1 = xn+1|Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
=
P(Xn+1 = xn+1,Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
P(Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
,
which is “missing” the information for Xn1.
Discrete-time Markov Chain
However, note that
P(Xn+1 = xn+1,Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
=
1X
xn1=0
P(Xn+1 = xn+1,Xn = xn,Xn1 = xn1,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
=
1X
xn1=0
P(Xn+1 = xn+1|Xn = xn,Xn1 = xn1,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
⇥ P(Xn = xn,Xn1 = xn1,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
= P(Xn+1 = xn+1|Xn = xn)

1X
xn1=0
P(Xn = xn,Xn1 = xn1,Xn2 = xn2, . . . ,X1 = x1,X0 = x0) by Markov property
= P(Xn+1 = xn+1|Xn = xn)P(Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0).
Discrete-time Markov Chain
We have:
P(Xn+1 = xn+1|Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
=
P(Xn+1 = xn+1,Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
P(Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
and
P(Xn+1 = xn+1,Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0)
= P(Xn+1 = xn+1|Xn = xn)P(Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0).
Substituting this latter expression into the numerator of the top equation yields
P(Xn+1 = xn+1|Xn = xn,Xn2 = xn2, . . . ,X1 = x1,X0 = x0) = P(Xn+1 = xn+1|Xn = xn).
It is straightforward to extend the above result to any number of past time points from
0, 1, . . . , n 1 with “missing” information. This is the essence of the Markov property.
One-step Transition Probability Matrix
Definition: For any pair of states i and j , the transition probability from state i at time n to
state j at time n + 1 is given by
Pn,i ,j = P(Xn+1 = j |Xn = i), n 2 N.
Let Pn be the associated matrix containing all these transition probabilities, referred to as the
one-step transition probability matrix (TPM) from time n to time n + 1. It looks like
Pn = [Pn,i ,j ] =
26666664
0 1 2 · · · j · · ·
0 Pn,0,0 Pn,0,1 Pn,0,2 · · · Pn,0,j · · ·
1 Pn,1,0 Pn,1,1 Pn,1,2 · · · Pn,1,j · · ·
...
...
...
...
...
i Pn,i ,0 Pn,i ,1 Pn,i ,2 · · · Pn,i ,j · · ·
...
...
...
...
...
37777775,
where, for convenience, the states of the DTMC are labelled along the margins of the matrix.
One-step Transition Probability Matrix
For each pair of states i and j , if Pn,i ,j = Pi ,j 8 n 2 N, then we say that the DTMC is
stationary or homogeneous. In this situation, the one-step TPM becomes:
P = [Pi ,j ] =
26666664
0 1 2 · · · j · · ·
0 P0,0 P0,1 P0,2 · · · P0,j · · ·
1 P1,0 P1,1 P1,2 · · · P1,j · · ·
...
...
...
...
...
i Pi ,0 Pi ,1 Pi ,2 · · · Pi ,j · · ·
...
...
...
...
...
37777775.
Remark: In STAT 333, we only consider stationary DTMCs. Moreover, from the construction
of the TPM P , it is clear that Pi ,j 0 8 i , j 2 N and
P1
j=0 Pi ,j = 1 8 i 2 N (i.e., each row
sum of P must be 1). Such a matrix whose elements are non-negative and whose row sums
are equal to 1 is said to be stochastic.
One-step Transition Probability Matrix
Example 3.1. On a given day, the weather is either clear, overcast, or raining. If the weather
is clear today, then it will be clear, overcast, or raining tomorrow with respective probabilities
0.6, 0.3, and 0.1. If the weather is overcast today, then it will be clear, overcast, or raining
tomorrow with respective probabilities 0.2, 0.5, and 0.3. If the weather is raining today, then it
will be clear, overcast, or raining tomorrow with respective probabilities 0.4, 0.2, and 0.4.
Construct the underlying DTMC and determine its TPM.
n-step Transition Probability Matrix
Definition: For any pair of states i and j , the n-step transition probability is given by
P(n)i ,j = P(Xm+n = j |Xm = i), m, n 2 N.
Due to the stationary assumption, this quantity is actually independent of m (which is why we
do not include m in its notation). Thus, P(n)i ,j = P(Xn = j |X0 = i), n 2 N. Furthermore, it is
evident that
P(0)i ,j = P(X0 = j |X0 = i) =
(
0 , if i 6= j ,
1 , if i = j .
In a similar manner, let P(n) =
h
P(n)i ,j
i
represent the associated n-step TPM. Clearly, when
n = 1, P(1) = P . When n = 0, P(0) = I , where I represents the identity matrix. Just as with
the one-step TPM, it follows that the row sums of P(n) must equal 1 as well.
Chapman-Kolmogorov Equations
For n 2 Z+, let us consider
P(n)i ,j = P(Xn = j |X0 = i)
=
1X
k=0
P(Xn = j |Xn1 = k ,X0 = i)P(Xn1 = k |X0 = i)
=
1X
k=0
P(n1)i ,k P(Xn = j |Xn1 = k ,X0 = i)
=
1X
k=0
P(n1)i ,k P(Xn = j |Xn1 = k) due to the Markov property
=
1X
k=0
P(n1)i ,k Pk,j . (3.1)
Chapman-Kolmogorov Equations
We have: P(n)i ,j =
P1
k=0 P
(n1)
i ,k Pk,j (3.1)
Recall: If A = [ai ,j ], B = [bi ,j ], and C = AB where C = [ci ,j ], then ci ,j =
P
k ai ,kbk,j .
As a result, note that (3.1) implies that P(n) = P(n1)P , n 2 Z+. More generally, P(n)i ,j can be
expressed as
P(n)i ,j =
1X
k=0
P(m)i ,k P
(nm)
k,j 8 i , j 2 N and 0  m  n,
which are referred to as the Chapman-Kolmogorov equations for a DTMC. In matrix form, this
translates to
P(n) = P(m)P(nm), 0  m  n.
n-step Transition Probability Matrix
Coming back to P(n) = P(n1)P , n 2 Z+, let us look at a few values of n:
Take n = 2 =) P(2) = P(1)P = PP = P2,
Take n = 3 =) P(3) = P(2)P = P2P = P3,
Take n = 4 =) P(4) = P(3)P = P3P = P4.
Clearly, we see that
P(n) = Pn,
and so the n-step TPM is simply the one-step TPM multiplied by itself n times.
Marginal pmf of Xn
For n 2 N, let us now introduce a particular row vector, which we will denote as either
↵n = (↵n,0,↵n,1, . . . ,↵n,k , . . .),
or
↵n =

↵n,0 ↵n,1 . . . ↵n,k . . .

,
where ↵n,k = P(Xn = k) 8 k 2 N. In other words, ↵n,k represents the marginal pmf of Xn,
n 2 N. As a consequence, it follows that P1k=0 ↵n,k = 1 8 n 2 N.
If we focus on the case when n = 0, then ↵0 is referred to as the initial probability row vector
of the DTMC, or simply the initial conditions of the DTMC.
Marginal pmf of Xn
For n 2 Z+, note that
↵n,k = P(Xn = k)
=
1X
i=0
P(Xn = k |Xm = i)P(Xm = i) where 0  m  n
=
1X
i=0
↵m,iP(Xnm = k |X0 = i) due to the stationary assumption
=
1X
i=0
↵m,iP
(nm)
i ,k .
In matrix form, the above relation implies that
↵n = ↵mP
(nm) = ↵mP
nm, 0  m  n,
which subsequently leads to
↵n = ↵0P
(n) = ↵0P
n, n 2 N.
Probabilities of Interest
Having knowledge of the initial conditions and the one-step transition probabilities, one can
readily calculate various probabilities of possible interest such as
P(Xn = xn,Xn1 = xn1, . . . ,X1 = x1,X0 = x0)
= P(X0 = x0)P(X1 = x1|X0 = x0)P(X2 = x2|X1 = x1,X0 = x0) · · ·
⇥ P(Xn = xn|Xn1 = xn1,Xn2 = xn2, . . . ,X0 = x0)
= P(X0 = x0)P(X1 = x1|X0 = x0)P(X2 = x2|X1 = x1) · · ·P(Xn = xn|Xn1 = xn1)
= ↵0,x0Px0,x1Px1,x2 · · ·Pxn1,xn ,
where the second last equality follows due to the Markov property.
Probabilities of Interest
Similarly,
P(Xn+m = xn+m,Xn+m1 = xn+m1, . . . ,Xn+1 = xn+1|Xn = xn)
= P(Xn+m=xn+m,Xn+m1=xn+m1,...,Xn+1=xn+1,Xn=xn)P(Xn=xn)
= P(Xn=xn)P(Xn+1=xn+1|Xn=xn)···P(Xn+m=xn+m|Xn+m1=xn+m1,Xn+m2=xn+m2,...,Xn=xn)P(Xn=xn)
= Pxn,xn+1Pxn+1,xn+2 · · ·Pxn+m1,xn+m .
The key observation here is that the DTMC is completely characterized by its one-step TPM
P and the initial conditions ↵0.
Probabilities of Interest
Example 3.2. A particle moves along the states {0, 1, 2} according to a DTMC whose TPM
is given by
P =
24
0 1 2
0 0.7 0.2 0.1
1 0 0.6 0.4
2 0.5 0 0.5
35.
Let Xn denote the position of the particle after the nth move. Suppose that the particle is
equally likely to start in any of the three positions.
Probabilities of Interest
(a) Calculate P(X3 = 1|X0 = 0).
Probabilities of Interest
(b) Calculate P(X4 = 2).
Probabilities of Interest
(c) Calculate P(X6 = 0,X4 = 2).
Solution:...
(d) Calculate P(X9 = 0,X7 = 2|X5 = 1,X2 = 0).
Accessibility and Communication
With these basic results in place, we next consider the classification of states in a DTMC.
Definition: State j is accessible from state i (denoted by i ! j) if 9 n 2 N such that
P(n)i ,j > 0. If states i and j are accessible from each other, then states i and j communicate
(denoted by i $ j). In other words, i $ j i↵ 9 m, n 2 N such that P(n)i ,j > 0 and P(m)j ,i > 0.
Accessibility and Communication
In terms of accessibility, note that the size of the components of P do not matter. All that
matters is which are positive and which are 0. In particular, if state j is not accessible from
state i , then P(n)i ,j = 0 8 n 2 N and
P(DTMC ever visits state j |X0 = i)
= P ([1n=0{Xn = j}|X0 = i)

1X
n=0
P(Xn = j |X0 = i) due to Boole’s inequality (see Exercise 1.1.1)
=
1X
n=0
P(n)i ,j
= 0,
implying that P(DTMC ever visits state j |X0 = i) = 0.
Equivalence Relation
The concept of communication forms what is known as an equivalence relation, satisfying the
following criteria:
(i) Reflexivity: i $ i .
Clearly true since P(0)i ,i = 1 > 0.
(ii) Symmetry: i $ j =) j $ i .
This is obviously true by definition.
(iii) Transitivity: i $ j and j $ k =) i $ k .
To see this holds formally, we know that 9 n 2 N such that P(n)i ,j > 0. Also, 9 m 2 N such
that P(m)j ,k > 0. Using the Chapman-Kolmogorov equations, we have that
P(n+m)i ,k =
1X
l=0
P(n)i ,l P
(m)
l ,k P(n)i ,j P(m)j ,k > 0.
Therefore, P(n+m)i ,k > 0, implying that i ! k . Using precisely the same logic, it is
straightforward to show that k ! i . Thus, by definition, i $ k .