Hidden Markov Model

19MAT117

Aadharsh Aadhithya - CB.EN.U4AIE20001
Anirudh Edpuganti - CB.EN.U4AIE20005
Madhav Kishore - CB.EN.U4AIE20033
Onteddu Chaitanya Reddy - CB.EN.U4AIE20045
Pillalamarri Akshaya - CB.EN.U4AIE20049

Team-1

Markov chains

A Markov chain satisfies the Markov property: the next state depends only on the current state, not on the full history.

P(X_{n+1}=x \mid X_n = x_n)

For example, the chain shown on the slide has a transition such as

P(X_4 = x' \mid X_3 = x) = 0.7

for a particular pair of states x, x' (drawn as icons in the original diagram).
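To make the dynamics concrete, here is a minimal NumPy sketch that simulates a two-state Markov chain; the transition probabilities are borrowed from the R/S worked example later in this deck, and the state names are placeholders.

import numpy as np

# Two-state chain with states R and S; transition probabilities taken
# from the worked example later in this deck.
A = np.array([[0.5, 0.5],   # P(R->R), P(R->S)
              [0.3, 0.7]])  # P(S->R), P(S->S)
states = ['R', 'S']

rng = np.random.default_rng(0)
x = 0  # start in state R
chain = [states[x]]
for _ in range(10):
    x = rng.choice(2, p=A[x])  # next state depends only on the current one
    chain.append(states[x])
print(chain)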

Hidden Markov Model

\text{States are hidden}

In a hidden Markov model the states themselves are never observed; we only see the symbols they emit. The model is specified by a transition matrix, an emission matrix, and an initial distribution.

A = \begin{bmatrix} 0.5 & 0.3 & 0.2 \\ 0.4 & 0.2 & 0.4 \\ 0.0 & 0.3 & 0.7 \end{bmatrix} \qquad \text{Transition matrix: } a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)

B = \begin{bmatrix} 0.9 & 0.1 \\ 0.6 & 0.4 \\ 0.2 & 0.8 \end{bmatrix} \qquad \text{Emission matrix: } b_i(k) = P(O_t = O_k \mid q_t = S_i)
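A minimal NumPy sketch of these two matrices. The row labels were icons on the original slide, so which state each row belongs to is not recoverable here; what we can check is that each row is a probability distribution.

import numpy as np

# Transition matrix A (3 hidden states) and emission matrix B (2 symbols),
# copied from the slide above.
A = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.2, 0.4],
              [0.0, 0.3, 0.7]])
B = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.2, 0.8]])

# Sanity check: every row of a stochastic matrix sums to 1
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)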

Hidden Markov Model

Initial\;distribution: \quad \pi = [0.3\;\;0.2\;\;0.5]

Stationary\;distribution: \quad \pi^* = \lim_{n \rightarrow \infty} \pi A^n, \qquad \text{so that } \pi^* = \pi^* A
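A short sketch of this limit using the A and π above: repeatedly applying A drives the distribution toward the stationary distribution π*.

import numpy as np

A = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.2, 0.4],
              [0.0, 0.3, 0.7]])
pi = np.array([0.3, 0.2, 0.5])

# pi A^n settles down as n grows: the limit is the stationary distribution
for n in (1, 5, 50):
    print(n, pi @ np.linalg.matrix_power(A, n))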

Hidden Markov Model

Problems

1.\,\,\text{Given a hidden Markov model and an observation sequence,} \\ \text{what is the probability of the model producing that particular sequence?}
2.\,\,\text{Given a hidden Markov model and an observation sequence,} \\ \text{which state sequence maximizes the probability of the given observation sequence?}
3.\,\,\text{Given an observation sequence, how do we train the model, i.e. choose} \\ \text{the parameters of the HMM, to maximize the probability of the observation sequence?}


Problem 1

\text{Given a hidden Markov model } \lambda = (A, B, \pi) \text{ and an observation sequence } O\text{,} \\ \text{what is the probability of the model producing that sequence?}

P(O \mid \lambda) = \;?

Forward Algorithm

[Trellis of hidden states R and S over three time steps]

\pi = [0.375\;\;\;\;0.625]
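For reference, the general forward recursion that the worked numbers below instantiate (standard HMM notation, consistent with the α used on these slides):

\begin{aligned}
\alpha_1(i) &= \pi_i \, b_i(O_1) \\
\alpha_{t+1}(j) &= \Big[ \sum_{i} \alpha_t(i)\, a_{ij} \Big] b_j(O_{t+1}) \\
P(O \mid \lambda) &= \sum_{i} \alpha_T(i)
\end{aligned}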

\alpha_1(R) = 0.375\times0.8 = 0.3
\alpha_1(S) = 0.625\times0.4 = 0.25

\alpha_2(R) = \alpha_1(R)\times0.5\times0.8 + \alpha_1(S)\times0.3\times0.8 = 0.18
\alpha_2(S) = \alpha_1(R)\times0.5\times0.4 + \alpha_1(S)\times0.7\times0.4 = 0.13

\alpha_3(R) = \alpha_2(R)\times0.5\times0.2 + \alpha_2(S)\times0.3\times0.2 = 0.0258
\alpha_3(S) = \alpha_2(R)\times0.5\times0.6 + \alpha_2(S)\times0.7\times0.6 = 0.1086

P(O \mid \lambda) = \alpha_3(R)+\alpha_3(S) = 0.1344, \qquad \lambda = (A, B, \pi)
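A minimal NumPy version of this computation. The slides give only the trellis arithmetic, so the A and B below are inferred from it (an assumption about how the example model was set up), and it reproduces the α values above.

import numpy as np

# Two-state example from the slides: states R and S.
A = np.array([[0.5, 0.5],    # R -> R, R -> S
              [0.3, 0.7]])   # S -> R, S -> S
B = np.array([[0.8, 0.2],    # emissions from R
              [0.4, 0.6]])   # emissions from S
pi = np.array([0.375, 0.625])

obs = [0, 0, 1]  # observation indices matching the slide's three steps

alpha = pi * B[:, obs[0]]            # alpha_1
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]    # alpha_{t+1}(j) = sum_i alpha_t(i) a_ij b_j(o)
print(alpha, alpha.sum())            # -> [0.0258 0.1086] 0.1344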

Problem 2

\text{Given a hidden Markov model and an observation sequence,} \\ \text{which state sequence maximizes the probability of the given observation sequence?}

\begin{aligned} & \underset{\text{state seq}}{\text{maximize}} & & P(\text{state seq} \mid \text{obs. seq}, \lambda) \end{aligned}

P(\,? \mid O, \lambda)

Solution: Viterbi Algorithm

[Trellis of hidden states R and S over three time steps, same model as before]

\pi = [0.375\;\;\;\;0.625]
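The Viterbi recursion is the forward recursion with the sum replaced by a max (we write δ rather than α to keep the two apart); ψ records the best predecessor of each state for backtracking:

\begin{aligned}
\delta_1(i) &= \pi_i \, b_i(O_1) \\
\delta_{t+1}(j) &= \Big[ \max_{i} \delta_t(i)\, a_{ij} \Big] b_j(O_{t+1}), \qquad
\psi_{t+1}(j) = \arg\max_{i}\; \delta_t(i)\, a_{ij}
\end{aligned}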

\delta_1(R) = 0.375\times0.8 = 0.3
\delta_1(S) = 0.625\times0.4 = 0.25

\delta_2(R) = \max(\delta_1(R)\times0.5,\; \delta_1(S)\times0.3)\times0.8 = 0.12
\delta_2(S) = \max(\delta_1(R)\times0.5,\; \delta_1(S)\times0.7)\times0.4 = 0.07

\delta_3(R) = \max(\delta_2(R)\times0.5,\; \delta_2(S)\times0.3)\times0.2 = 0.012
\delta_3(S) = \max(\delta_2(R)\times0.5,\; \delta_2(S)\times0.7)\times0.6 = 0.036

The best final value is \delta_3(S) = 0.036; backtracking through the surviving predecessors gives the most likely state sequence:

P(R,R,S \mid O, \lambda) = 0.036
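A minimal NumPy Viterbi sketch for the same example (with the A and B inferred from the slides, as in the forward sketch); it reproduces the path R, R, S and the value 0.036.

import numpy as np

A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.4, 0.6]])
pi = np.array([0.375, 0.625])
states = ['R', 'S']
obs = [0, 0, 1]

delta = pi * B[:, obs[0]]
back = []
for o in obs[1:]:
    trans = delta[:, None] * A          # trans[i, j] = delta_t(i) * a_ij
    back.append(trans.argmax(axis=0))   # best predecessor of each state
    delta = trans.max(axis=0) * B[:, o]

# Backtrack from the best final state
path = [int(delta.argmax())]
for bp in reversed(back):
    path.insert(0, int(bp[path[0]]))
print([states[i] for i in path], delta.max())  # -> ['R', 'R', 'S'] 0.036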

Problem 3

\text{Given an observation sequence } O_1, O_2, \cdots, O_i, \cdots, O_T\text{, find the parameters} \\ A^*, B^*, \pi^* \text{ that maximize the probability of that sequence.}
[Trellis over the observations O_1, O_2, \cdots, O_t, \cdots, O_T]

\alpha_t(i)=P\left( O_1 , O_2, \cdots, O_t ,\; q_t = S_i \mid \lambda \right)

(the probability of the observations up to time t, ending in state S_i)
\beta_t(i)=P\left( O_{t+1} , O_{t+2} , \cdots, O_T \mid q_t = S_i ,\; \lambda \right)

(the probability of the remaining observations after time t, given state S_i at time t)
\gamma_t(i)=P\left( q_t = S_i \mid O, \lambda \right) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}

(the probability of being in state S_i at time t, given the whole observation sequence)
\xi_t(i,j)=P\left( q_t = S_i ,\; q_{t+1} = S_j \mid O, \lambda \right) = \frac{\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{P(O \mid \lambda)}

(the probability of transitioning from S_i at time t to S_j at time t+1: \alpha_t(i) accounts for the past, a_{ij}\,b_j(O_{t+1}) for the transition and emission, and \beta_{t+1}(j) for the future)

Summing over time gives the expected number of transitions from state S_i to state S_j:

\sum_{t=1}^{T-1} \xi_t(i,j) = E\left[ \#\text{ transitions } S_i \rightarrow S_j \right]
Similarly, summing \gamma_t(i) over time gives the expected number of transitions out of state S_i (to any state):

\sum_{t=1}^{T-1} \gamma_t(i) = E\left[ \#\text{ transitions from } S_i \right]

Re-estimation of A^*, B^*, \pi^*:

a_{ij} = \frac{E\left[ S_i \rightarrow S_j \right]}{E\left[ S_i \rightarrow \right]} = \frac{\sum_{t} \xi_t(i,j)}{\sum_t \gamma_t(i)}

b_i(k) = \frac{E\left[ S_i \cap O_k \right]}{E\left[ S_i \right]} = \frac{\sum_{t:\,O_t = O_k} \gamma_t(i)}{\sum_t \gamma_t(i)}

\pi_i^* = \gamma_1(i)

Iteratively Re-estimate A and B

Each pass of these updates (an E-step computing \gamma and \xi, then an M-step applying the formulas above) increases P(O \mid \lambda) and converges to a local maximum.

The Baum-Welch algorithm is an instance of Expectation-Maximization (EM). A sketch of one iteration follows.
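A minimal NumPy sketch of one Baum-Welch iteration for discrete observations. It is a direct transcription of the formulas above, without the numerical rescaling needed for long sequences, so treat it as an illustration rather than a production implementation.

import numpy as np

def baum_welch_step(A, B, pi, obs):
    # One EM (Baum-Welch) update; obs is a list of symbol indices.
    T, N = len(obs), A.shape[0]

    # E-step: forward pass (alpha) and backward pass (beta)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]

    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])

    p_obs = alpha[-1].sum()               # P(O | lambda)
    gamma = alpha * beta / p_obs          # gamma[t, i]
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t+1]] * beta[t+1])[None, :] / p_obs

    # M-step: the re-estimation formulas from the slides
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

# Usage on the deck's two-state example:
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.4, 0.6]])
pi = np.array([0.375, 0.625])
A, B, pi = baum_welch_step(A, B, pi, [0, 0, 1])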

Credit Card Fraud Detection

Transactions: ₹100, ₹200, ₹500, ₹1000, ₹7000

Each amount is quantized into an observation symbol: L, L, M, M, H (Low / Medium / High).

Observation sequence: O = (L, L, M, M, H)

Counting symbols in this sequence gives the empirical frequencies:

P(L \mid O) = \frac{2}{5}, \qquad P(M \mid O) = \frac{2}{5}, \qquad P(H \mid O) = \frac{1}{5}

Naively, then, given this observation sequence the probability of a high-value transaction is low.

What if we can learn from history?

The process can be modeled as a Markov process. Further, since we are not sure about the states causing the observations, it should be modeled as a hidden Markov model.

But how do we learn the model? 🤔

O_1, O_2, \cdots, O_i, \cdots, O_T \;\longrightarrow\; A^*, B^*, \pi^* = \,?

The Baum-Welch algorithm comes to the rescue: it learns A^*, B^*, \pi^* from the observation history.

After learning the parameters of the HMM, we can find the probability of a sequence of observations given the model, which is exactly our forward algorithm.

Credit Card Fraud Detection

Transactions so far: ₹100, ₹200, ₹500, ₹1000, ₹7000. Now a new transaction arrives: ₹20000.

If the learned model assigns the observation sequence ending in this transaction a very low probability, the transaction can be flagged as potentially fraudulent, as in the sketch below.
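A sketch of this scoring step. The parameters here are hypothetical stand-ins for what Baum-Welch would learn from the cardholder's history, and the thresholding rule is one simple choice of acceptance test, not something the slides specify.

import numpy as np

# Hypothetical learned parameters (in practice from Baum-Welch on the
# cardholder's history); observation symbols: 0 = L, 1 = M, 2 = H.
A = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.3, 0.2])

def forward_prob(obs):
    # P(O | lambda) via the forward algorithm
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

history = [0, 0, 1, 1, 2]          # L, L, M, M, H
candidate = history[1:] + [2]      # sliding window ending in the new symbol

# Flag the new transaction if it makes the window much less likely
if forward_prob(candidate) < 0.5 * forward_prob(history):
    print("potential fraud")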

Applications and Future Learning Directions

Applications

  • Sequence alignment in biology
  • Widely used in NLP
  • Inference from time series
  • Molecular evolutionary models
  • Phylogenetics


Future Learning Directions

  • Generalizations of HMMs: Bayesian networks
  • Continuous-time Markov models
  • Other Expectation-Maximization methods for learning


Thank you, Ma'am

MIS-3
By Incredible Us