# Forward algorithm

Not to be confused with Forward-backward algorithm.

The forward algorithm, in the context of a hidden Markov model, is used to calculate a 'belief state': the probability of a state at a certain time, given the history of evidence. The process is also known as filtering. The forward algorithm is closely related to, but distinct from, the Viterbi algorithm.

For an HMM, this probability is written as $P(x_t | y_{1:t})$. Here $x_t$ abbreviates the hidden state $x(t)$, and $y_{1:t}$ denotes the observations $1$ to $t$. A belief state can be calculated at each time step, but in a strict sense this does not produce the most likely state sequence; it produces the most likely state at each time step, given the history up to that step.

## Algorithm

The goal of the forward algorithm is to compute the joint probability $p(x_t,y_{1:t})$, where for notational convenience we have abbreviated $x(t)$ as $x_t$ and $(y(1), y(2), ..., y(t))$ as $y_{1:t}$. Computing $p(x_t,y_{1:t})$ directly would require marginalizing over all possible state sequences $\{x_{1:t-1}\}$, the number of which grows exponentially with $t$. Instead, the forward algorithm takes advantage of the conditional independence rules of the hidden Markov model (HMM) to perform the calculation recursively.
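To make the cost concrete, here is a minimal brute-force sketch in Python that computes $p(x_t, y_{1:t})$ by explicitly summing over every earlier state sequence $x_{1:t-1}$. It assumes a hypothetical two-state model with two possible observations; the names `prior`, `trans`, `emit` and `joint_brute_force` are illustrative, and the parameter values mirror the umbrella example in Russell and Norvig (state 0 = rain, observation 0 = umbrella seen). The outer loop runs $N^{t-1}$ times for $N$ states, which is exactly the exponential growth the recursion avoids.

```python
from itertools import product

# Hypothetical two-state HMM with two possible observations (illustrative values).
prior = [0.5, 0.5]                # prior[i] = p(x_1 = i)
trans = [[0.7, 0.3],              # trans[i][j] = p(x_t = j | x_{t-1} = i)
         [0.3, 0.7]]
emit = [[0.9, 0.1],               # emit[i][k] = p(y_t = k | x_t = i)
        [0.2, 0.8]]


def joint_brute_force(x_t, ys):
    """Compute p(x_t, y_{1:t}) by enumerating every earlier state sequence x_{1:t-1}."""
    n, t = len(prior), len(ys)
    total = 0.0
    # There are n ** (t - 1) earlier sequences: exponential in t.
    for earlier in product(range(n), repeat=t - 1):
        xs = list(earlier) + [x_t]
        # Joint probability of this full state path and all observations.
        p = prior[xs[0]] * emit[xs[0]][ys[0]]
        for s in range(1, t):
            p *= trans[xs[s - 1]][xs[s]] * emit[xs[s]][ys[s]]
        total += p
    return total


print(joint_brute_force(0, [0, 0]))  # p(x_2 = 0, y_1 = 0, y_2 = 0)
```

For two observations this is cheap, but each additional time step doubles the number of paths in this two-state model.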

To demonstrate the recursion, let

$\alpha_t(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t})$.

Using the chain rule to expand $p(x_t,x_{t-1},y_{1:t})$, we can then write

$\alpha_t(x_t) = \sum_{x_{t-1}}p(y_t|x_t,x_{t-1},y_{1:t-1})p(x_t|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1})$.

Because $y_t$ is conditionally independent of all other variables given $x_t$, and $x_t$ is conditionally independent of $y_{1:t-1}$ and of all earlier states given $x_{t-1}$, this simplifies to

$\alpha_t(x_t) = p(y_t|x_t)\sum_{x_{t-1}}p(x_t|x_{t-1})\alpha_{t-1}(x_{t-1})$.
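The recursion translates almost line for line into code. The following Python sketch reuses the same hypothetical two-state parameters (`prior`, `trans`, `emit` and `forward` are illustrative names, not from any library) and assumes the recursion starts from $\alpha_1(x_1) = p(x_1)\,p(y_1|x_1)$. Each step touches every pair of previous and current states, so it costs $O(N^2)$ for $N$ states.

```python
# Hypothetical two-state HMM (illustrative values).
prior = [0.5, 0.5]                    # prior[i] = p(x_1 = i)
trans = [[0.7, 0.3], [0.3, 0.7]]      # trans[i][j] = p(x_t = j | x_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]       # emit[i][k] = p(y_t = k | x_t = i)


def forward(ys):
    """Return [alpha_1, ..., alpha_T], where alpha_t[j] = p(x_t = j, y_{1:t})."""
    n = len(prior)
    # Initialization: alpha_1(x_1) = p(x_1) p(y_1 | x_1)
    alpha = [prior[j] * emit[j][ys[0]] for j in range(n)]
    alphas = [alpha]
    for y in ys[1:]:
        # Recursion: alpha_t(j) = p(y_t | j) * sum_i p(j | i) * alpha_{t-1}(i)
        alpha = [emit[j][y] * sum(trans[i][j] * alpha[i] for i in range(n))
                 for j in range(n)]
        alphas.append(alpha)
    return alphas


alphas = forward([0, 0])
print(alphas[-1], sum(alphas[-1]))  # final alpha and p(y_{1:2})
```

Summing the final $\alpha$ over states gives the likelihood of the whole observation sequence, and for short sequences the values can be checked against brute-force enumeration over all state paths.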

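Thus, since $p(y_t|x_t)$ and $p(x_t|x_{t-1})$ are given by the model's emission distributions and transition probabilities, $\alpha_t(x_t)$ can be calculated quickly from $\alpha_{t-1}(x_{t-1})$, starting from $\alpha_1(x_1) = p(x_1)\,p(y_1|x_1)$. With $N$ hidden states each step costs $O(N^2)$ operations, so processing $t$ observations takes $O(N^2 t)$ time instead of time exponential in $t$. The belief state then follows by normalization: $p(x_t|y_{1:t}) = \alpha_t(x_t) / \sum_{x_t'} \alpha_t(x_t')$.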
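In practice the belief state is often computed directly by normalizing after every step, since $p(x_t|y_{1:t})$ is $\alpha_t(x_t)$ divided by its sum over states; this also keeps the numbers from underflowing on long sequences. The sketch below assumes the same hypothetical two-state model (names are illustrative) and, as a by-product, accumulates $\log p(y_{1:t})$ from the per-step normalizers $p(y_t|y_{1:t-1})$.

```python
import math

# Hypothetical two-state HMM (illustrative values).
prior = [0.5, 0.5]                    # prior[i] = p(x_1 = i)
trans = [[0.7, 0.3], [0.3, 0.7]]      # trans[i][j] = p(x_t = j | x_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]       # emit[i][k] = p(y_t = k | x_t = i)


def filter_beliefs(ys):
    """Return the belief states p(x_t | y_{1:t}) for each t and log p(y_{1:T})."""
    n = len(prior)
    beliefs, log_likelihood = [], 0.0
    predicted = prior[:]                      # p(x_1) before seeing y_1
    for y in ys:
        # Update: weight the predicted distribution by the emission probability.
        unnorm = [emit[j][y] * predicted[j] for j in range(n)]
        z = sum(unnorm)                       # z = p(y_t | y_{1:t-1})
        log_likelihood += math.log(z)
        belief = [u / z for u in unnorm]      # normalized: p(x_t | y_{1:t})
        beliefs.append(belief)
        # Predict: push the belief through the transition model for the next step.
        predicted = [sum(trans[i][j] * belief[i] for i in range(n)) for j in range(n)]
    return beliefs, log_likelihood


beliefs, ll = filter_beliefs([0, 0])
print(beliefs, math.exp(ll))
```

With the umbrella observed on two consecutive days (`[0, 0]`), the belief in state 0 rises from about 0.818 to about 0.883, matching the figures in Russell and Norvig's umbrella example.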

The forward algorithm can also be adapted to variants of the hidden Markov model, such as the Markov jump linear system.

## Smoothing

To take future evidence into account (for example, to improve the estimate of the state at a past time), one can run the backward algorithm, which complements the forward algorithm. This is called smoothing. The forward/backward algorithm computes $P(x_k | y_{1:t})$ for $1 \le k \le t$, so the full forward/backward algorithm takes into account all of the evidence.
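A minimal forward/backward sketch in Python, again assuming the hypothetical two-state model (all names are illustrative). The backward pass computes $\beta_k(x_k) = p(y_{k+1:T} | x_k)$ with $\beta_T = 1$, where `T` is the number of observations; the smoothed estimate is proportional to $\alpha_k(x_k)\,\beta_k(x_k)$.

```python
# Hypothetical two-state HMM (illustrative values).
prior = [0.5, 0.5]                    # prior[i] = p(x_1 = i)
trans = [[0.7, 0.3], [0.3, 0.7]]      # trans[i][j] = p(x_t = j | x_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]       # emit[i][k] = p(y_t = k | x_t = i)


def smooth(ys):
    """Return p(x_k | y_{1:T}) for k = 1..T via a forward and a backward pass."""
    n, T = len(prior), len(ys)
    # Forward pass: alpha[k][j] = p(x_k = j, y_{1:k}).
    alpha = [[prior[j] * emit[j][ys[0]] for j in range(n)]]
    for y in ys[1:]:
        prev = alpha[-1]
        alpha.append([emit[j][y] * sum(trans[i][j] * prev[i] for i in range(n))
                      for j in range(n)])
    # Backward pass: beta[k][i] = p(y_{k+1:T} | x_k = i); the last beta is all ones.
    beta = [[1.0] * n for _ in range(T)]
    for k in range(T - 2, -1, -1):
        beta[k] = [sum(trans[i][j] * emit[j][ys[k + 1]] * beta[k + 1][j]
                       for j in range(n))
                   for i in range(n)]
    # Combine: p(x_k | y_{1:T}) is proportional to alpha_k(x_k) * beta_k(x_k).
    result = []
    for a, b in zip(alpha, beta):
        prod = [ai * bi for ai, bi in zip(a, b)]
        z = sum(prod)
        result.append([p / z for p in prod])
    return result


print(smooth([0, 0]))
```

With the umbrella observed on both days, the smoothed estimate for day 1 rises from the filtered value of about 0.818 to about 0.883, because the day-2 observation is now taken into account.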

## Decoding

To obtain the most likely state sequence, the Viterbi algorithm is required. Given the history of observations, it computes the state sequence that maximizes $P(x_{1:t}|y_{1:t})$.

The state sequence produced by the Viterbi algorithm differs from the one produced by the forward algorithm in how it is updated: the Viterbi algorithm recalculates the entire sequence with each new data point, whereas the forward algorithm only appends the most likely current state to the previously computed sequence.
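For comparison, a minimal Viterbi sketch on the same hypothetical two-state model (names are illustrative). It replaces the forward algorithm's sum over predecessor states with a max and stores back-pointers so that the single best path can be recovered at the end. Probabilities are multiplied directly, which is fine for short sequences; long sequences are usually handled in log space to avoid underflow.

```python
# Hypothetical two-state HMM (illustrative values).
prior = [0.5, 0.5]                    # prior[i] = p(x_1 = i)
trans = [[0.7, 0.3], [0.3, 0.7]]      # trans[i][j] = p(x_t = j | x_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]       # emit[i][k] = p(y_t = k | x_t = i)


def viterbi(ys):
    """Return the state sequence x_{1:T} maximizing p(x_{1:T} | y_{1:T})."""
    n = len(prior)
    # delta[j] = probability of the best path ending in state j at the current step.
    delta = [prior[j] * emit[j][ys[0]] for j in range(n)]
    backptrs = []
    for y in ys[1:]:
        ptr, new = [], []
        for j in range(n):
            # Max (instead of sum) over predecessor states.
            best_i = max(range(n), key=lambda i: delta[i] * trans[i][j])
            ptr.append(best_i)
            new.append(emit[j][y] * delta[best_i] * trans[best_i][j])
        backptrs.append(ptr)
        delta = new
    # Trace the back-pointers from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi([0, 0, 1, 0, 0]))
```

For the observations `[0, 0, 1, 0, 0]` (umbrella seen on every day except the third) the sketch returns `[0, 0, 1, 0, 0]`. Calling `viterbi` on successive prefixes of the observations recomputes the whole path each time, so earlier states in the returned path can change when a new observation arrives.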

## Further reading

• Russell and Norvig's *Artificial Intelligence: A Modern Approach*, starting on page 541 of the 2003 edition, provides a succinct exposition of this and related topics.