Markov Chains

Understanding Markov chains.
Author: Vannsh Jani

Published: August 27, 2023

What are Markov models?

Markov models are mathematical models used for sequential data, where the current observation depends on past observations. Markov chains are the simplest Markov models: the current observation depends only on the immediately previous observation, and not on any observations before that.

This can be represented by,

\[ P(x_t|x_1,x_2,...x_{t-1})=P(x_t|x_{t-1}) \]

The dependencies in the data can therefore be represented as a chain: \(x_1 \rightarrow x_2 \rightarrow \dots \rightarrow x_n\).

We can calculate the probability of a sequence as follows,

\[ P(x_1,x_2,..x_n)=P(x_1)P(x_2|x_1)P(x_3|x_2)...P(x_n|x_{n-1}) \]

In words: the probability of any observation \(x_i\) depends only on \(x_{i-1}\), for all \(i \in [2,n]\).

\(P(x_1)\) is called the prior probability of the initial state/observation: it is the probability of starting in a particular state. It is denoted by \(\pi_i=P(x_1=i)\), where \(i\) ranges over the possible states. The prior probability is one of the two parameters of the Markov chain model.

The other parameter of the Markov chain model is the transition matrix, denoted by \(A\). If there are \(K\) states, the transition matrix is a \(K \times K\) matrix with \(A_{ij}=P(x_t=j|x_{t-1}=i)\).
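These two parameters, \(\pi\) and \(A\), fully specify the chain. As a minimal sketch in Python (the function name `sequence_probability` is mine, purely for illustration), the chain-rule factorisation above translates directly into code:

```python
import numpy as np

def sequence_probability(pi, A, states):
    """P(x_1,...,x_n) = pi[x_1] * A[x_1,x_2] * ... * A[x_{n-1},x_n]."""
    prob = pi[states[0]]           # prior probability of the initial state
    for prev, curr in zip(states[:-1], states[1:]):
        prob *= A[prev, curr]      # one-step transition probability
    return prob
```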

Let’s take an example with 3 states.

We can use the Markov chain model to predict which city we will travel to next, given the city we are in currently.

Let’s take the 3 cities/states to be Bangalore, Chennai, and Mumbai, with the following transition matrix:

|           | Bangalore | Chennai | Mumbai |
|-----------|-----------|---------|--------|
| Bangalore | 0.3       | 0.4     | 0.3    |
| Chennai   | 0.2       | 0.1     | 0.7    |
| Mumbai    | 0.4       | 0.4     | 0.2    |

Each row of the transition matrix must sum to 1, as it covers all possible next states.
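As a quick sanity check, here is this matrix in NumPy along with the row-sum property (a sketch; the variable names are mine):

```python
import numpy as np

# Transition matrix for the cities example; rows and columns are
# ordered Bangalore, Chennai, Mumbai.
A = np.array([[0.3, 0.4, 0.3],
              [0.2, 0.1, 0.7],
              [0.4, 0.4, 0.2]])

# Each row is a probability distribution over the next city.
assert np.allclose(A.sum(axis=1), 1.0)
```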

[Figure: Markov transition graph showing the probabilities of going from one city to another.]

Markov chain sampling

We can generate a sequence of observations using the following steps:

  1. Sample the initial state \(x_1\) from the prior \(\pi\).
  2. For \(t \in [2,\dots,T]\), sample the state \(x_t\) from row \(x_{t-1}\) of \(A\).

Let’s take the above example and generate a sequence of 6 time steps.

|           | prior probability |
|-----------|-------------------|
| Bangalore | 0.4               |
| Chennai   | 0.2               |
| Mumbai    | 0.4               |
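Here is how the two sampling steps might look in NumPy for this example (a sketch: the seed is arbitrary, so the particular cities drawn are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed for reproducibility

cities = ["Bangalore", "Chennai", "Mumbai"]
pi = np.array([0.4, 0.2, 0.4])       # prior probabilities from the table above
A = np.array([[0.3, 0.4, 0.3],
              [0.2, 0.1, 0.7],
              [0.4, 0.4, 0.2]])

# Step 1: sample the initial state x_1 from the prior pi.
state = rng.choice(3, p=pi)
sequence = [state]

# Step 2: for t = 2..T, sample x_t from row x_{t-1} of A.
T = 6
for _ in range(T - 1):
    state = rng.choice(3, p=A[state])
    sequence.append(state)

print(" -> ".join(cities[s] for s in sequence))
```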

You can experiment with the values of the prior probabilities and the transition matrix at the link below.

https://vannshmarkovchain.streamlit.app/

Markov chain properties

Let’s understand some properties of Markov chains with the following example: a chain with three states, “0”, “1”, and “2”, in which state “0” has only outgoing arrows (no arrow leads back into it) and states “1” and “2” transition between each other.

We know that if there is an arrow from state A to state B, then there is a non-zero probability of transitioning from state A to state B. If we start a random walk from state “0” in this example, we can never come back to state “0”, even after infinitely many steps. A state to which we may never return is called a transient state; hence, “0” is a transient state here. If we look at state “1” or state “2”, we know we are bound to come back to the same state after some number of steps. Such a state is called a recurrent state; states “1” and “2” are examples of recurrent states.

In a Markov chain, if not every state is reachable from every other state, we say that the Markov chain is reducible. In this example, we cannot reach state “0” from state “1” or “2”, so the chain is reducible.

If we add an arrow from state “2” to state “0”, then it becomes possible to come back to state “0”, and “0” is no longer a transient state.

Such a Markov chain, where it is possible to go from every state to every other state (not necessarily in one move), is called an ergodic Markov chain.
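To make the reachability idea concrete, here is a small sketch that tests whether every state can reach every other state using powers of the adjacency matrix. The edge set is assumed from the description above (0 → 1, 1 → 2, 2 → 1, with no arrow leading back into 0), since the original diagram is not reproduced here:

```python
import numpy as np

def is_ergodic(adj):
    """True if every state can reach every other state (possibly in many moves)."""
    n = len(adj)
    reach = np.eye(n, dtype=int) + adj        # paths of length 0 or 1
    for _ in range(n):
        reach = np.minimum(reach @ reach, 1)  # repeated squaring builds the closure
    return bool((reach > 0).all())

# Assumed edges: 0 -> 1, 1 -> 2, 2 -> 1.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 1, 0]])

print(is_ergodic(adj))   # False: state "0" cannot be reached, so the chain is reducible

adj[2, 0] = 1            # add the arrow from state "2" to state "0"
print(is_ergodic(adj))   # True: now every state can reach every other state
```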

Transition in n steps

Let’s consider the cities example again. Suppose we want to calculate the probability of travelling from Bangalore to Chennai in 2 steps; we have to consider all possible intermediate cities.

\[ A=\begin{bmatrix} 0.3 & 0.4 & 0.3 \\ 0.2 & 0.1 & 0.7 \\ 0.4 & 0.4 & 0.2 \end{bmatrix} \]

Let’s denote the probability of going from state \(i\) to state \(j\) in \(n\) steps as \(p_{ij}(n)\). In this example, 0 denotes Bangalore, 1 denotes Chennai, and 2 denotes Mumbai.

\[ p_{01}(2) = p_{02}(1)p_{21}(1)+p_{00}(1)p_{01}(1)+p_{01}(1)p_{11}(1) \]

Note that \(p_{02}(1)\) is just equal to \(A_{02}\). Hence, the above equation can be written as

\[ \begin{equation}\begin{split} p_{01}(2) &= A_{02}A_{21}+A_{00}A_{01}+A_{01}A_{11} \\ p_{01}(2) &= \begin{bmatrix} A_{00} & A_{01} & A_{02} \end{bmatrix} \begin{bmatrix} A_{01} \\ A_{11} \\ A_{21} \end{bmatrix} \end{split} \end{equation} \]

Hence \(p_{01}(2)\) is the dot product of the 0th row of \(A\) and the 1st column of \(A\), which is exactly the entry in the 0th row and 1st column of \(A \times A\), i.e. \(A^2\). Let’s verify it.

\[ \begin{equation}\begin{split} p_{01}(2) &= 0.3\cdot0.4+0.3\cdot0.4+0.4\cdot0.1 \\ p_{01}(2) &= 0.28 \\ A^2 &= \begin{bmatrix} 0.29 & 0.28 & 0.43 \\ 0.36 & 0.37 & 0.27 \\ 0.28 & 0.28 & 0.44 \end{bmatrix}\end{split}\end{equation} \]

Hence, \((A^n)_{ij}\) denotes the probability of moving from state \(i\) to state \(j\) in \(n\) steps. If some power \(A^n\) of the transition matrix \(A\) has all positive entries, then it is possible to move from every state to every other state in exactly \(n\) steps, and hence the chain is ergodic.
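Both claims are easy to check in NumPy (a small sketch using the cities transition matrix):

```python
import numpy as np

A = np.array([[0.3, 0.4, 0.3],
              [0.2, 0.1, 0.7],
              [0.4, 0.4, 0.2]])

# n-step transition probabilities are the entries of A^n.
A2 = np.linalg.matrix_power(A, 2)
print(A2[0, 1])        # ~0.28 = p_01(2): Bangalore -> Chennai in 2 steps

# A itself already has all positive entries, so every state can reach
# every other state in a single step and the chain is ergodic.
print((A > 0).all())   # True
```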