by Nikolai Shokhirev

Algorithms |

Hidden Markov models are widely used in science, engineering and many other areas (speech recognition, optical character recognition, machine translation, bioinformatics, computer vision, finance and economics, and in social science).

** Definition:**
The Hidden Markov Model (HMM) is a variant of a

*Formal Definition:*

Hidden states * Q* = {

Transition probabilities
*
A* = {

Observations (symbols) ** O** = {

Emission probabilities * B* = {

Initial
state probabilities ** Π** = {

HMM

The model is characterized by the complete set of parameters:
* Λ* =

**Canonical problems**

There are 3 canonical problems to solve with **HMM**s:

- Given the model parameters, compute the probability of a particular output sequence. This problem is solved by the Forward and Backward algorithms (described below).
- Given the model parameters, find the most likely sequence of (hidden) states which could have generated a given output sequence. Solved by the Viterbi algorithm and Posterior decoding.
- Given an output sequence, find the most likely set of state transition and output probabilities. Solved by the Baum-Welch algorithm

Let *α*_{t}(*i*) be the probability of the partial observation sequence
*O _{t}* = {

*α*_{t}(*i*) = *P*(*o*(1), *o*(2), ... , *o*(*t*) * _{ }*|

Then the unconditional probability of the
partial observation sequence is the sum of *α*_{t}(*i*) over all
* N* states.

Observed and hidden sequences

The Forward Algorithm is a recursive algorithm for calculating *α*_{t}(*i*)
for the observation
sequence of increasing length *t* . First, the probabilities
for the single-symbol sequence are calculated as a product of initial *i*-th state probability and
emission probability of the given symbol *o*(1) in the *i*-th state. Then the
recursive formula is applied. Assume we have calculated *α*_{t}(*i*)
for some *t*. To calculate *α*_{t+1}(*j*), we multiply every *α*_{t}(*i*)
by the corresponding transition probability
from the *i*-th state to the *j*-th state, sum the products over all states, and then multiply the
result by the emission probability of the symbol *o*(*t*+1). Iterating the process, we can eventually
calculate *α** _{T}*(

*Formal Definition*

**Initialization:**

** ** *α*_{1}(*i*)
= *p _{i} *

*Recursion:*

here *i* =1, ... , *N *, *t* =1, ... , *T*
- 1

*Termination:*

* β*_{t}(*i*) = *P*(*o*(*t*+1), *o*(*t*+2), ... , *o*(*T*) * _{ }*|

The Backward Algorithm calculates recursively backward variables going backward along the observation sequence. The Forward Algorithm is typically used for calculating the probability of an observation sequence to be emitted by an HMM, but, as we shall see later, both procedures are heavily used for finding the optimal state sequence and estimating the HMM parameters.

*Formal Definition*

*Initialization:*

** ** *β** _{T }*(

According to the above definition, ** ** *β*_{T}(*i*)
does not exist. This is a formal extension of the below recursion to *t* = *T*.

*Recursion:*

here *i* =1, ... , *N *, *t* = *T*
- 1, *T*
- 2 , . . . , 1

*Termination:*

Obviously both Forward and Backward algorithms must give the same results for
total probabilities *P*(*O*) = *P*(*o*(1), *o*(2), ... , *o*(*T*)
).

There are several possible criteria for finding the most likely sequence of hidden states. One is to choose states that are individually most likely at the time when a symbol is emitted. This approach is called posterior decoding.

Let * λ*_{ t}(*i*) be the probability of the model to emit
the symbol *o*(*t*) being in the *i*-th state for the
given observation sequence *O*.

* λ*_{ t}(*i*) = *P*( *q*(*t*) = * q*_{i}
| *O* ).

It is easy to derive that

* λ*_{ t}(*i*) = *α*_{t}(*i*)
* β*_{t}(*i*) / *P*( *O* ) , *i* =1, ... , *N *, *t* =1, ... , *T*

Then at each time we can select the state *q*(*t*) that
maximizes * λ*_{ t}(*i*).

*q*(*t*) = arg max {*λ*_{ t}(*i*)}

Posterior decoding works fine in the case when HMM is ergodic, i.e. there is transition
from any state to any other state. If applied to an HMM of another architecture, this approach
could give a sequence that may not be a legitimate path because some transitions are not
permitted.

The Viterbi algorithm chooses the best state sequence that maximizes the likelihood of the state sequence for the given observation sequence.

Let * δ*_{ t}(*i*) be the maximal probability of state sequences of the length
*t* that end in state * i* and
produce the *t* first observations for the given model.

* δ*_{ t}(*i*) = max{*P*(*q*(1),
*q*(2), ..., *q*(*t*-1) ; *o*(1), *o*(2), ... , *o*(*t*)
| *q*(*t*) = * q*_{i}
).}

The Viterbi algorithm is a dynamic programming algorithm that uses the same schema as the Forward algorithm except for two differences:

- It uses maximization in place of summation at the recursion and termination steps.
- It keeps track of the arguments that maximize
*δ*_{ t}(*i*) for each*t*and*i*, storing them in the*N*by*T*matrix. This matrix is used to retrieve the optimal state sequence at the backtracking step.*ψ*

*Initialization:*

δ_{1}(i)
= p(_{i} b_{i}o(1)) |

ψ_{1}(i) = 0 , i =1,
... , N |

According to the above definition, *β*_{T}(*i*)
does not exist. This is a formal extension of the below recursion to .

*Recursion:*

δ_{t }( j)
= max [_{ i }δ_{t - 1}(i) a_{ij}]
b(_{ j }o(t)) |

ψ_{t}( j) = arg max [_{ i }δ_{t - 1}(i) a_{ij}] |

*Termination:*

p = max^{*} [_{ i }δ(
_{T}i )] |

q^{*} = arg max_{T} [_{ i }δ(
_{T}i )] |

*Path (state sequence) backtracking:*

*q*^{*}* _{t}* =

Let us define ξ_{ t}(*i*, *j*), the joint
probability of being in state *q _{ i}* at time

ξ_{ t}(*i*, *j*) = *P*(*q*(*t*) =
*q _{ i}*,

The figure below illustrates the calculation of ξ_{ t}(*i*, *j*).

Joint probability paths.

Therefore we get

The probability of output sequence can be expressed as

The probability of being in state *q _{ i}* at time

Initial probabilities:

Transition probabilities:

Emission probabilities:

In the above equation Σ^{ *} denotes the sum over *t* so
that *o*(*t*) = *o _{k}*.

In progress ...

Forward abd Backward Algorithms , Viterbi Algorithm ,
Posterior decoding, and Baum-Welch Algorithm is available **here **(Delphi code
- uHMM.pas)
or **here** Forward Backward and Viterbi
Algorithm , Posterior decoding (C++ code - HMM.h and HMM.cpp). The code is not optimized
(for educational purpose only).

Kamen Kelchev ( kkelchev@bulsyst.com ) refactored uHMM.pas and made a testing program for the Baum-Welch Algorithm. The code is available in the download section below.

- Hidden Markov model, http://www.nist.gov/dads/HTML/hiddenMarkovModel.html
- Hidden Markov model, http://en.wikipedia.org/wiki/Hidden_Markov_model
- Lawrence R. Rabiner,
A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition.
*Proceedings of the IEEE*, 77 (2), p. 257286, February 1989. - V. Petrushin. Hidden Markov Models: Fundamentals and Applications. Part 2: Discrete and Continuous Hidden Markov Models, Online Symposium for Electronics Engineers. 2000
- Narada Warakagoda, Baum-Welch Algorithm. http://jedlik.phy.bme.hu/~gerjanos/HMM/node11.html

**Delphi 7 source code**(including two demos) - HMM.zip ~ 9 KB. It requires PasMatLib.**TestHMM.zip**Demo exe: Forward, Backward and Viterbi Algorithms, Posterior decoding ~ 200 KB.**C++ source code**(including demo for C++ Builder 6) - HMMcpp.zip ~ 8KB. It requires CppMatLib.

Algorithms |

ŠNikolai Shokhirev, 2001-2010