Forward algorithm

Not to be confused with Forward-backward algorithm.

The forward algorithm, in the context of a hidden Markov model, is used to calculate a 'belief state': the probability of a state at a certain time, given the history of evidence. The process is also known as filtering. The forward algorithm is closely related to, but distinct from, the Viterbi algorithm.

For an HMM such as this one:

[Figure: temporal evolution of a hidden Markov model]

this probability is written as P(x_t | y_{1:t}). Here x(t), abbreviated x_t, is the hidden state at time t, and y_{1:t} denotes the observations from time 1 to t. A belief state can be calculated at each time step, but doing so does not, in a strict sense, produce the most likely state sequence; rather, it produces the most likely state at each individual time step, given the history of observations up to that step.
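
The algorithm described below computes the joint probability p(x_t,y_{1:t}); the belief state itself is then obtained by normalizing over the possible states at time t:

P(x_t|y_{1:t}) = p(x_t,y_{1:t}) / \sum_{x'_t}p(x'_t,y_{1:t}).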

Algorithm[edit]

The goal of the forward algorithm is to compute the joint probability p(x_t,y_{1:t}), where for notational convenience we have abbreviated x(t) as x_t and (y(1), y(2), ..., y(t)) as y_{1:t}. Computing p(x_t,y_{1:t}) directly would require marginalizing over all possible state sequences \{x_{1:t-1}\}, the number of which grows exponentially with t. Instead, the forward algorithm takes advantage of the conditional independence rules of the hidden Markov model (HMM) to perform the calculation recursively.
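
Concretely, the joint distribution of a hidden Markov model factorizes as

p(x_{1:t},y_{1:t}) = p(x_1)p(y_1|x_1)\prod_{s=2}^{t}p(x_s|x_{s-1})p(y_s|x_s),

so obtaining p(x_t,y_{1:t}) by summing this expression over every possible sequence x_{1:t-1} involves on the order of |S|^{t-1} terms for a model with |S| states, which is why the direct computation is intractable for long observation sequences.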

To demonstrate the recursion, let

\alpha_t(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t}).

Using the chain rule to expand p(x_t,x_{t-1},y_{1:t}), we can then write

\alpha_t(x_t) = \sum_{x_{t-1}}p(y_t|x_t,x_{t-1},y_{1:t-1})p(x_t|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1}).

Because y_t is conditionally independent of everything but x_t, and x_t is conditionally independent of everything but x_{t-1}, this simplifies to

\alpha_t(x_t) = p(y_t|x_t)\sum_{x_{t-1}}p(x_t|x_{t-1})\alpha_{t-1}(x_{t-1}).

Thus, since p(y_t|x_t) and p(x_t|x_{t-1}) are given by the model's emission distributions and transition probabilities, one can quickly calculate \alpha_t(x_t) from \alpha_{t-1}(x_{t-1}) and avoid incurring exponential computation time.
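
For illustration, the recursion can be written as a short Python sketch for a discrete HMM. The variable names (init, trans, emit, obs) are illustrative and not tied to any particular library: init[j] = p(x_1 = j), trans[i][j] = p(x_t = j | x_{t-1} = i), and emit[j][k] = p(y_t = k | x_t = j).

def forward(obs, init, trans, emit):
    """Return alpha, where alpha[t][j] = p(x_t = j, y_{1:t})."""
    n_states = len(init)
    alpha = [[0.0] * n_states for _ in obs]
    # Base case: alpha_1(x_1) = p(x_1) p(y_1 | x_1)
    for j in range(n_states):
        alpha[0][j] = init[j] * emit[j][obs[0]]
    # Recursion: alpha_t(x_t) = p(y_t | x_t) * sum over x_{t-1} of p(x_t | x_{t-1}) alpha_{t-1}(x_{t-1})
    for t in range(1, len(obs)):
        for j in range(n_states):
            alpha[t][j] = emit[j][obs[t]] * sum(
                trans[i][j] * alpha[t - 1][i] for i in range(n_states)
            )
    return alpha

Each step costs on the order of |S|^2 operations, so the whole pass is linear in the length of the observation sequence; in practice the alpha values are usually rescaled at each step to avoid numerical underflow.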

The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the Markov jump linear system.

Smoothing[edit]

To take future observations into account (i.e., to improve the estimates for past times), one can run the backward algorithm, which complements the forward algorithm. This is called smoothing. The combined forward–backward algorithm computes P(x_k | y_{1:t}) for 1 \le k \le t, so using the full forward/backward algorithm takes all of the evidence into account.
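
Continuing the sketch above (with the same illustrative init, trans, emit and obs), the backward pass and the smoothed marginals could look as follows, where beta[k][i] = p(y_{k+1:t} | x_k = i) and the normalized product of alpha and beta gives P(x_k | y_{1:t}).

def backward(obs, trans, emit, n_states):
    """Return beta, where beta[k][i] = p(y_{k+1:t} | x_k = i)."""
    T = len(obs)
    beta = [[1.0] * n_states for _ in range(T)]  # base case: beta at time t is 1
    for k in range(T - 2, -1, -1):
        for i in range(n_states):
            beta[k][i] = sum(
                trans[i][j] * emit[j][obs[k + 1]] * beta[k + 1][j]
                for j in range(n_states)
            )
    return beta

def smooth(alpha, beta):
    """Return gamma, where gamma[k][i] = P(x_k = i | y_{1:t})."""
    gamma = []
    for a_k, b_k in zip(alpha, beta):
        unnorm = [a * b for a, b in zip(a_k, b_k)]
        z = sum(unnorm)
        gamma.append([u / z for u in unnorm])
    return gamma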

Decoding[edit]

To obtain the most likely state sequence, the Viterbi algorithm is required. It computes the most likely sequence of states given the history of observations, that is, the state sequence that maximizes P(x_{1:t}|y_{1:t}).

The difference between the state sequence produced by the Viterbi algorithm and the sequence of states produced by the forward algorithm is that the Viterbi algorithm recalculates the entire sequence as each new observation arrives, whereas the forward algorithm only appends the newest filtered estimate to the sequence computed so far.
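
For comparison with the forward sketch above, a minimal Viterbi decoder under the same illustrative conventions (init, trans, emit, obs) replaces the sum in the recursion with a maximization and keeps back-pointers so the best sequence can be read off at the end:

def viterbi(obs, init, trans, emit):
    """Return the most likely state sequence given the observations."""
    n_states = len(init)
    T = len(obs)
    # delta[t][j]: probability of the best path that ends in state j at time t
    delta = [[0.0] * n_states for _ in range(T)]
    # psi[t][j]: predecessor state of j on that best path
    psi = [[0] * n_states for _ in range(T)]
    for j in range(n_states):
        delta[0][j] = init[j] * emit[j][obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[t - 1][i] * trans[i][j])
            delta[t][j] = delta[t - 1][best_i] * trans[best_i][j] * emit[j][obs[t]]
            psi[t][j] = best_i
    # Backtrack from the most probable final state.
    path = [max(range(n_states), key=lambda j: delta[T - 1][j])]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path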

Further reading[edit]

  • Russell and Norvig's Artificial Intelligence: A Modern Approach, starting on page 541 of the 2003 edition, provides a succinct exposition of this and related topics