Jump to content

Talk:Viterbi algorithm

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 24.202.146.187 (talk) at 02:03, 27 February 2016 (Pseudo-Code seems wrong: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

WikiProject iconRobotics C‑class Mid‑importance
WikiProject iconThis article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
CThis article has been rated as C-class on Wikipedia's content assessment scale.
MidThis article has been rated as Mid-importance on the project's importance scale.
Note icon
This article has been marked as needing immediate attention.

Extensions: undo weight?

The extensions are more like domain specific implementations, with only two methods suggested. Since many machine learning related grad programs require students to hack together Viterbi program for a sample problem, the selection of just a few papers doesn't seem justfied. Furthermore, judging from the number of citations, these are not seminal papers on machine learning. Disclaimer: I am not an author or any Viterbi paper, I declare no competing interests. — Preceding unsigned comment added by 24.5.84.124 (talk) 05:11, 29 December 2012 (UTC)[reply]

Trellis Diagram

The Trellis Diagram doesn't make sense to me, as there are multiple ways of reaching the second and third columns of states. Thus, each of the different pathways to reaching those states would have a different possibility, meaning that there should be more than one probability (as shown in the current diagram) associated with the second and third columns of states.

For example, for the second column of states, the following paths are possible to get to the 'Sunny' state:

Rain -> Sunny = 0.06 * 0.3 * 0.3 = 0.0054

Sunny -> Sunny = 0.24 * 0.6 * 0.3 = 0.0432

But only the last value is shown in the article's diagram. Do I misunderstand the Trellis diagram or is it missing a lot of the values for possible paths that a person requires before knowing which total path would have the best probability? If I'm not mistaken then either the diagram requires more details, or the problem requires multiple diagrams to fully show all possibilities. 146.232.75.208 (talk) 13:11, 19 November 2010 (UTC)[reply]

Well... or the probabilities shown on the nodes/states that aren't bold should not be displayed. 146.232.75.208 (talk) 13:17, 19 November 2010 (UTC)[reply]

The result of the Java code is not what it is described in the article

I executed the code, and I get:

0.033612005
Sunny,Rainy,Rainy,Rainy
0.009408001

Why 4 days and not 3 ? The numbers are also different.

Worked through by hand

I worked the example shown through by hand to help me understand how the algorithm works. I also did a brute force evaluation of the forward probability to verify everything was working correctly. Perhaps the notes maybe of use to someone:

http://sgenomics.org/mediawiki/index.php/HMM_Code

78.33.152.88 (talk) 12:34, 31 May 2009 (UTC)[reply]

Mathematical description

This page should have a description of the algorithm using mathematical formulas, as well. Already I have found two different descriptions of Viterbi and would like Wikipedia to clear things out.

Also Python may be a nice language, but using variable names like "p" or "v_prob" doesn't really explain the algorithm!

Strongly agree! I entered this page to find you have stated what I wanted to say.--Puekai 09:29, 6 March 2007 (UTC)[reply]

Yes I can describe the algorithm in simplified terms

I'm haveing a hard time to figure out where to put the description. There is nothing here that is not correct but it is a little confusing because of the usage of "hidden" and "events" without explaining. The algorithm itself is quite simple but I have a little problem with how it is described. I don't want to exorcise a lot of basically correct text, only to clarify in a more simple manner. I was thinking of adding a section just under the table of contents as an introduction, but being a newbie, I would need some lessons on how to do this.

I can also describe Trellis modulation. --User:jlpayton 9-sept-2005

.

Can someone show the algorithm here? Kieff | Talk 07:33, Oct 15, 2004 (UTC)

I'll post an example illustrating the forward algorithm and the Viterbi algorithm (those are really the same algorithm). --MarkSweep 21:33, 15 Oct 2004 (UTC)

Algorithm Please

Python maybe a formally defined language but it isn't a clear language. This page should clearly provide an example of the trellis, dynamic programming, the recursion, the loop invariant. Now this dynamically typed tripe. contributed by 24.69.52.124 a.k.a. S01060060676712c2.gv.shawcable.net

On the contrary, Python is a very clear and concise language. But let's not get into a holy war about programming language preferences. You can cut and past the Python code and run it directly. That should give you a good idea of what's going on. Thank you for your suggestion! When you feel an article needs improvement, please feel free to make whatever changes you feel are needed. Wikipedia is a wiki, so anyone can edit almost any article by simply following the Edit this page link at the top. You don't even need to log in! (Although there are some reasons why you might like to…) The Wikipedia community encourages you to be bold. Don't worry too much about making honest mistakes—they're likely to be found and corrected quickly. If you're not sure how editing works, check out how to edit a page, or use the sandbox to try out your editing skills. New contributors are always welcome. --MarkSweep 19:15, 21 Apr 2005 (UTC)
Sorry, I must admit that I agree with the original poster. I know the Viterbi algorithm, but was completely lost at the example given and then code. My suggestion would be: Drop the "rainy" and "sunny" story (it's utterly confusing), concentrate on some easy system instead (say, something with a transfer function of 1 + 0.5z^-1), and show a graphical description of the trellis and survivor paths. An encyclopedia shouldn't require people to cut-and-paste code to understand what's going on. 80.202.213.120
Is this a joke? I found the rainy and sunny story to be very easy to follow and a great example. Although the page probably could do with a proper mathematical definition to complement the easy explanation I found this to be very refreshing compared to other comp.sci./math articles around here. —Preceding unsigned comment added by 84.166.155.142 (talk) 21:22, 23 November 2007 (UTC)[reply]
+1 agree completely -- easy to follow and a great example. - Francis Tyers · 08:47, 6 October 2008 (UTC)[reply]
It is definitely not a joke. -Sesse (talk) 02:57, 29 January 2008 (UTC)[reply]
I agree the code isn't as clear as equations would be. I've added a small section with the recurrence relations. Hopefully someone can expand it further. 128.237.249.50 (talk) 22:23, 2 May 2010 (UTC)[reply]


Does the python code have a missing line? It looks like there's a hanging "print" with no output in the print_dptable function. I personally found the python code quite helpful.

Rlrogers (talk) 22:47, 12 October 2011 (UTC)[reply]

What exactly is the relation between Trellis modulation and the Viterbi algorithm ? --DavidCary 30 June 2005 19:32 (UTC)

Can't totally remember clearly but you should investigate Factor Graphs and see how the viterbi can be done with a factor graph --ReptileLawyer 18:13, 24 April 2006 (UTC)[reply]

Complexity?

Anyone know what the complexity of Viterbi is? I skimmed over it and it looks like it's O(s*n^2), where s is the size of the sequence it's given, and n is the number of possible states. Any thoughts? --aciel 20:41, 22 November 2005 (UTC)[reply]

That's correct. Depending on the data structures used to represent the underlying HMM, the Viterbi/Forward algorithm can be made to run in time, where s is the length of the sequence, n is the number of hidden states, and d is the average out-degree of each state. This is because you only need to explore pairs of states that are actually connected by an edge. In a fully connected HMM, the running time is . --MarkSweep (call me collect) 22:11, 23 November 2005 (UTC)[reply]

Is the python correct?

No, the algorithm is not correct.

The computation of the probability of the output(x) given the optimal Viterbi path (pi), P(x|pi) algorithm consists of

 P(x|pi) = 1 starting probability * n emission probabilities * (n-1) transition probabilities

The displayed algorithm computes

 P(x|pi) = 1 starting probability * n emission probabilities * n transition probabilities.

There is an obvious off by 1 error.


—Preceding unsigned comment added by 199.94.29.210 (talk) 20:00, 8 May 2009 (UTC)[reply]



The python contains the line p = ep[state][output] * tp[state][next_state]

which in my understanding should be p = ep[next_state][output] * tp[state][next_state]


I assume the first state returned by the example is the initial state and the second state is the state that matches the first observation

The example returns ['Sunny', 'Rainy', 'Rainy', 'Rainy'] and after modification it returns ['Sunny', 'Sunny', 'Rainy', 'Rainy']

which would seam better given the example (observation 1 = walk)

Am I mistaken? if so a little help on the 4 returned states is welcome.

The code is correct as is. It's written the way it is to simplify the main loop and to allow an empty sequence of observations. For example, here's what happens when you run the code on an empty observation sequence:
 >>> forward_viterbi([], ['q','r'], {'q':0.8, 'r':0.2}, {}, {})
 (1.0, ['q'], 0.80000000000000004)
 >>> 
This return value is the triple (1.0, ['q'], 0.8), which indicates the following: The probability of the observation sequence conditional on its length is 1.0 (that's obviously correct, since there is exactly one sequence with length zero); the Viterbi path consists of a single edge, leading from the (unnamed) start state to the state 'q'; and the probability of the Viterbi path is 0.8. The Viterbi path as returned by this code contains one more state, namely the state that is reached after leaving the state corresponding to the last observation (in this case, the state reached from the start state with probability 0.8). That's because emissions are associated with outgoing edges, rather than incoming edges (you may have seen a presentation of this algorithm that assumes the latter). Both versions of HMMs exist, in addition to a third one where emissions are properly associated with edges, so there are at least three common variants of the forward algorithm and the Viterbi algorithm. --MarkSweep (call me collect) 17:23, 15 December 2005 (UTC)[reply]
I (a different user) still don't understand. I haven't tested the change on the Python, but on my haskell version, both ways seem to work for the no-observation case. --196.210.100.237 (talk) 20:27, 11 February 2008 (UTC)[reply]
Running the algorithm by hand seems to indicate that the proposed correction to the algorithm is correct....I don't think this algorithm is correct.--151.201.108.130 (talk) 18:40, 6 May 2008 (UTC)[reply]
Really? I've never seen emissions associated with edges at all, only with nodes. That is certainly the case in the Hidden Markov model page here and in, e.g. the Rabiner reference used on that page, or in the Viterbi paper itself. If the Viterbi path is the path of states corresponding to the sequence of emissions, then there should be one state for each emission, not an extra state at the end.24.91.117.221 (talk) 14:32, 7 July 2008 (UTC)[reply]
The algorithm is incorrect indeed for another interpretation of what we call 'Viterbi path', following its direct definition as 'most likely explanation for an observation'. If we don't care about 'post-observation' state, then it must end in a state, for which we get last observation. The empty example result should be (1.0, [], 1.0) then. In that case the Viterbi path (without 'post-observation' state) cannot be obtained by current algorithm implementation by cutting last state (as soon we may loose Viterbi pathes to the penultimate states)! 85.141.70.130 (talk) 10:19, 4 April 2010 (UTC)[reply]

It seems that the correction above has been incorporated into the main page, but the output after modification ['Sunny', 'Sunny', 'Rainy', 'Rainy'] has not. It seems that either version of the algorithm should work. In the original version (using emission probability of the source state) ['Sunny', 'Sunny'] would be the first 2 states in the Viterbi path for the sunny state after one iteration of the "for output in obs" loop. The first three states of the vp for the rainy state would be ['Sunny', 'Rainy', 'Rainy'] after iteration 2, as the greater correlation between shopping and rain, combined with the higher transition probability of rain to rain vs. sun to sun would move this triple ahead of ['Sunny', 'Sunny', 'Sunny'] (note only the first two observations are known). ['Sunny', 'Rainy', 'Rainy', 'Rainy'] should be the correct output. In the second case (emission probability of the next state is used), the first iteration's Viterbi path for the sunny state would be ['Sunny', 'Sunny']. The second iteration's vp for sunny would then be ['Sunny', 'Sunny', 'Sunny'], and the final vp would be ['Sunny', 'Sunny', 'Rainy', 'Rainy']. However, I am not sure why it is necessary to keep these extra start or end states, as it seems that the length of the hidden state sequence should be the same as that of the observation sequence. Perhaps by using the revised (second) algorithm, we could obtain ['Sunny', 'Rainy', 'Rainy']. If we initialize prob to 1, v_path to [] and v_prob to 1 in T; and, if we are on the first "output" in the obs array, ignore trans_p in the p computation, it seems that we should obtain the correct three states. (update) Correction to above: it seems that: 1. the above suggestion to ignore tran_p in the case of the first observation works only in the case where emit_p is for the next_state. 2. The T array should be initialized to (start_probability, [], start_probability), it seems, so that the initial probabilities are a factor. 3. algorithm would be basically:

 //T initialized to (start_prob, [], start_prob) for rainy and
 //sunny above
 count = 0
 for output in obs:
   U = {}
   for nextstate in states:
     total = 0 
     argmax = none
     valmax = 0
     if (count==0)
        (prob, v_path, v_prob) = T[next_state]
         p = emit_p[next_state][output]
         prob*=p
         v_prob*=p
         argmax = v_path + next_state
         valmax = v_prob
      else
         for source_state in states:
              . . .
      //note that 
     //p=emit_p[next_state][output]*trans_p[source_state][next_state] 
      }   
      U[next_state]=(total, argmax, valmax)
   T=U
   count++
 //apply sum/max to final states as before below.

4. Note that a similar modification could occur if emit_p is calculated for the source_state, but in this case the special condition would be when the count is at size(obs)-1 (ie the last observation). 76.241.101.61 (talk) 18:42, 27 September 2008 (UTC) (75.10.146.25 (talk) 04:28, 28 August 2008 (UTC))[reply]

I've replaced the code with a more correct and much simpler implementation. There's no forward algorithm as I felt it overcomplicated the code and also there's now a separate page for that algorithm. 128.237.249.50 (talk) 20:32, 2 May 2010 (UTC)[reply]

SOVA

I intend to add a section on soft output version of the VA....give me some time and let the link stay!!!

Pizzadeliveryboy 02:47, 19 January 2006 (UTC)[reply]

Algorithm seems wrong

I tried to apply the forward algorithm on the example given in the article, but I do not find the same result. Briefly, here is the forward algo I applied (from rabiner paper): Let say an HMM system with the transitional matrix, the emission matrix, an observation sequence and is the porbability of observing and having as a state for the HMM at time so we have:

The algo is like:

1- Initialization:

with the start probability of state i

2- Deduction:

3-Terminaison

Applying this algo with this code in python:

def compute_alpha(pi, A, B, O, alpha, t):
  if t == 0:
    for state in pi:
      alpha[0][state] = pi[state] * B[state][O[0]]
  else:
    compute_alpha(pi, A, B, O, alpha, t - 1)
    for state_j in pi:
      temp = 0
      for state_i in pi:
        temp += alpha[t - 1][state_i] * A[state_i][state_j]
      alpha[t][state_j] = temp * B[state_j][O[t]]

def evaluate(pi, A, B, O):
  """Evaluate a suite of states regarding a MM model"""
  alpha = dict()
  alpha[0] = dict()
  for i, state in enumerate(pi):
    alpha[i+1] = dict()
  T = len(O) - 1
  compute_alpha(pi, A, B, O, alpha, T)
  max = 0
  for state in alpha[T]:
    if max <= alpha[T][state]:
      max = alpha[T][state]
  print alpha
  return max

I got 0.02904 as a proba for the observation {'walk', 'shop', 'clean'} given in the example. So which algo is wrong? I can give you the step of calculation, they are quite simple for such a small example.JeDi 05:14, 27 July 2006 (UTC)[reply]

Your algorithm failed to select the survivor path for each new symbol. I suggest using

def compute_alpha(pi, A, B, O, alpha, t):
 if t == 0:
   for state in pi:
     alpha[0][state] = pi[state] * B[state][O[0]]
 else:
   compute_alpha(pi, A, B, O, alpha, t - 1)
   for state_j in pi:
     temp = 0
     maxval = 0
     for state_i in pi:
     if alpha[t - 1][state_i] * A[state_i][state_j] > maxval:
         maxval = alpha[t - 1][state_i] * A[state_i][state_j]
     alpha[t][state_j] = maxval* B[state_j][O[t]]

Using this modified code suggests that the Viterbi path is Sunny, Rainy, Rainy, with probability 0.01344. This is the same value I get when doing all of the computations by hand (following Viterbi's paper, not your math or the article here). It corresponds to the product 0.4*0.6*0.4*0.4*0.7*0.5. This is the product of the probabilities of beginning in Sunny, emitting walk, transitioning to Rainy, emitting shop, transitioning to Rainy, finally emitting clean. If this is correct, the article will need to be changed. (Though your algorithm is misleading in that it prints, on each step, the currently most likely state, which will not necessarily end up on the final survivor path.)24.91.117.221 (talk) 14:46, 7 July 2008 (UTC)[reply]

A concrete example is partially wrong

From the transition matrix one can observe initial state probabilities, namely for given transition matrix probability of rainy day is 4/7, while probability of sunny day is 3/7. It appears that values in the example are rounded. 08:20, 28 December 2005 (UTC)

logs, comments

I realize it would make the code more complicated, but the implementation might be misleading because you really want to use logs of the probabilities to avoid underflows. I think the names of variables could be improved.

I think the output should have the same number of states as the number of observations. The observations are usually associated with states, not transitions between states.

The code (as is), works fine for the given example. However, for anything but trivial examples it is likely to fail. By using logs, this can be avoided (just a hint really).

Markov vs. probabilities

How does the Markov assumption of non-dependence on history exist together with the state transition probabilities being essential for the algorithm? Can that aspect be made clearer? 130.126.180.235 04:01, 30 May 2007 (UTC)BillBusen[reply]

The Markov assumption simply means non-dependence on history BEYOND one state back, so state transition probabilities which encompass a one-state history are permissible. I hope that clears up any confusion. --Tekhnofiend (talk) 17:43, 27 April 2008 (UTC)[reply]

concrete example, trellis diagrams

I'm somewhat disappointed by the section called "concrete example".

Sorely missing from this article are pictures of trellis diagrams, and an in-depth comparison to the space/time performance of the forward algorithm, and in particular to the forward-backward algorithm. It's critical (I think) to note that the Viterbi algo drops many of the links that the other algorithms keep, in exchange for an O(N) run time (as opposed to O(N^2) or whatever), and only a minor loss of entropy.

Certainly, I was utterly unable to understand Viterbi, until I saw it layed out in comparison to the other algos, at which point, things clicked together. linas (talk) 21:06, 10 June 2008 (UTC)[reply]

numerical example

I thought this might be useful. Using the data given in the article, here is a suggested numerical example of the algorithm. I'm rather new to the algorithm, so I'm posting it here to get comments before it goes in the main article. It is possible that I have misunderstood the algorithm and that this is incorrect. It is also possible that the algorithm as now used and as described in the article is different than that of the Viterbi paper on error bounds for convolutional codes.

What is the most probable path (and its probability) corresponding to the sequence of observations ['walk', 'shop', 'clean']? We first find the state most likely to have emitted walk, taking into account the probability of the system being in that state at all. In this case, the Rainy state is initially occupied with probability 0.6 and Rainy emits 'walk' with probability 0.1, so the probability of the actual sequence beginning with Rainy is 0.6*0.1 = 0.06. Similarly, the probability of the sequence beginning with Sunny is 0.4*0.6 = 0.24, because there is a probability of 0.4 of beginning in Sunny and Sunny emits 'walk' with probability 0.6. As there is only one path ending in Sunny and one path ending in Rainy, each becomes the survivor path (one state long) of this first step.

So it is expected the system is in the state Rainy with probability 0.06. For each state that can follow Rainy, what is the probability of that state emitting the next observation in the sequence: 'shop'? It is the probability that the system was in Rainy at all, 0.06, times the probability of Rainy transitioning to Rainy, 0.7, times the probability of Rainy emitting 'shop', 0.4. The probability that the first two states in the sequence are Rainy and Rainy is 0.06*0.7*0.4 = 0.0168. In the same way, we find the probability of the first two states being Sunny then Rainy is 0.0384. These are the only two paths of two states that can end in Rainy. Only the more likely one becomes the survivor path to be used in the next step. In this case, it the path of Sunny then Rainy. Similarly, we look at the two paths that end in Sunny, one beginning with Rainy and one with Sunny. Multiplying the appropriate probabilities shows that Sunny then Sunny is the most likely path that ends with Sunny, with probability 0.0432.

Specifically, the calculations are

Rainy->Rainy: 0.06*0.7*0.4 = 0.0168
Sunny->Rainy: 0.24*0.4*0.4 = 0.0384 "survivor"
Rainy->Sunny: 0.06*0.3*0.3 = 0.0054
Sunny->Sunny: 0.24*0.6*0.3 = 0.0432 "survivor"

That step is repeated on the two survivor paths: Sunny then Rainy and Sunny then Sunny. The four new calculations are

Sunny->Rainy->Rainy: 0.384*0.4*0.6 = 0.01344
Sunny->Sunny->Rainy: 0.0432*0.4*0.5 = 0.00864
Sunny->Rainy->Sunny: 0.384*0.4*0.6 = 0.001152
Sunny->Sunny->Sunny: 0.0432*0.6*0.1 = 0.002592

Of these, the most probable path that would emit the specified observations is Sunny, Rainy, Rainy, with probability 0.01344. —Preceding unsigned comment added by 24.91.117.221 (talk) 15:13, 7 July 2008 (UTC)[reply]

Simplified Algorithm

Hi, I understand python, but I feel pseudo code is more applicable. Something which shows the idea, not the physical implementation is more useful in a pedagogical sense. I just read the article, and I suggest (hope someone can improve this basic pseudocode)

Algorithm: Viterbi algorithm

 inputs: 
   probability of starting states: pStart[state], 
   probability of transition between states: pTrans[source][destination], 
   probability of emission given the current state: pEmit[source][symbol],
   observed emissions at each interval: emission[1...n]
 outputs:
   most likely series of events: likelyEvents[1...n]
   probability of the observed emissions: p
 method:
   let probability of being in state s up to time t be: p[s][t]
   let most likely path that ends in state s up to time t be: path[s][t]
   let probability of having taken path[s][t] up to time t: pPath[s][t]
   initialize p[s][0] = pStart[s]
   initialize path[s][0] = (s)
   initialize pPath[s][0] = 1
   at each time period t 
     let s* = state s' for which p[s'][t-1]*pTrans[s'][s]*pEmit[s'][emission[t-1]] is maximized
     p[s][t] = sum over all states s' of p[s'][t-1]*pTrans[s'][s]*pEmit[s'][emission[t-1]]
     path[s][t] = append s to path[s*][t-1]
     pPath[s][t] = pPath[s*][t-1]*p[s'][t-1]*pTrans[s'][s]*pEmit[s'][emission[t-1]] is maximized
   let s* = state s' for which pPath[s'][n] is the biggest
 return:
   likelyEvents = path[s'][n]
   p = sum over all states s' of p[s'][n]

Optimize this for the layman

I think the above proposal about pseudocode is on the wrong track: Remove the code altogether, and replace it with a description, in English (not a programming language or pseudocode), of the thinking Alice has to do to figure out the weather where Bob is. Most people have never taken a programming class, or even a logic class; we should be putting this in terms that anybody can understand.--Aervanath (talk) 04:54, 2 February 2009 (UTC)[reply]

A possibly helpful version of the Viterbi algorithm in Java (placed here to avoid poluting the main article with tons of code)

After a fair amount of effort I produced this Java version of the Viterbi algorithm based on the version shown in this article. I've attempted to rename the variables so they will be easy for the novice to understand. I've tested it and it appears to work, but if any bugs are discovered it would be great to have them noted. As pointed out by another user, the state path produced is one state too long; the final state is bogus. I've elected to leave it intact here, but this issue can be resolved by removing the last state just before returning (however the state must be used during computations or invalid results are obtained). Note that the SparseMatrix class is not included, but it can be trivially replaced with a two-dimensional array of doubles. Using the SparseMatrix implementation decreases memory usage in one of my real world "part of speech" tests by a factor of 75.

   protected class StatePathOdds {
       public double  m_odds;
       public ArrayList<Integer>  m_path;
       public double  m_pathOdds;
       //
       public StatePathOdds (
               double              odds,
               ArrayList<Integer>  path,
               double              pathOdds) {
           m_odds = odds;
           m_path = new ArrayList<Integer> (path);
           m_pathOdds = pathOdds;
       }// end StatePathOdds
       //
       public StatePathOdds (
               StatePathOdds       statePathOdds) {
           this (statePathOdds.m_odds, statePathOdds.m_path, statePathOdds.m_pathOdds);
       }// end StatePathOdds
   }// end class StatePathOdds


   /** Determine the most likely hidden model states for the list of outputs passed in using the Viterbi algorithm.
    @param outputs An array of Strings that should be analyzed to find the best path of states.
    @param states The different states (represented as int ids that can be mapped to Strings externally).
    @param initialStateOdds The probability of each state occurring.
    @param transitionOdds Probability of transitioning from one state to another (should be states.length
    by states.length in size).
    @param stateToOutputOdds Probability of the state machine generating a particular output when in a certain
    state (should be states.length by outputs.length in size).
    @param outputToIdNumberMap A map from output Strings to id numbers used to reference them.  The mapping should match that
    used to create the stateToOutputOdds matrix.
    @return An ArrayList of Integers that indicates the state path for each output in the outputs array.  The
    Integers indicate an index in the states array for each output in the outputs array. */
   public ArrayList<Integer> findBestStatePathViaViterbi (
           String[]            outputs,
           String[]            states,
           double[]            initialStateOdds,
           double[][]          transitionOdds,
           SparseMatrix<Double> stateToOutputOdds,
           HashMap<String, Integer> outputToIdNumberMap) {
       StatePathOdds[]  statePathOdds = new StatePathOdds[states.length];
       for (int stateIndex = 0; stateIndex < states.length; stateIndex++) {
           statePathOdds[stateIndex] = new StatePathOdds (initialStateOdds[stateIndex],
                   new ArrayList<Integer> (Arrays.asList (new Integer [] { stateIndex })), initialStateOdds[stateIndex]);
       }//endfor
       for (String output : outputs) {
           StatePathOdds[]  nextStatePathOdds = new StatePathOdds[states.length];
           for (int nextStateIndex = 0; nextStateIndex < states.length; nextStateIndex++) {
               double  totalOutputOdds = 0;
               ArrayList<Integer>  pathOfBestOdds = new ArrayList<Integer> ();
               double  bestPathOdds = 0;
               for (int stateIndex = 0; stateIndex < states.length; stateIndex++) {
                   StatePathOdds  currentStatePathOdds = new StatePathOdds (statePathOdds[stateIndex]);
                   int  outputId = outputToIdNumberMap.get (output);
                   double  outputOdds = stateToOutputOdds.get (stateIndex, outputId) * transitionOdds[stateIndex][nextStateIndex];
                   currentStatePathOdds.m_odds *= outputOdds;
                   currentStatePathOdds.m_pathOdds *= outputOdds;
                   totalOutputOdds += currentStatePathOdds.m_odds;
                   if (currentStatePathOdds.m_pathOdds > bestPathOdds) {
                       pathOfBestOdds = new ArrayList<Integer> (currentStatePathOdds.m_path);
                       pathOfBestOdds.add (nextStateIndex);
                       bestPathOdds = currentStatePathOdds.m_pathOdds;
                   }//endif
               }//endfor
               nextStatePathOdds[nextStateIndex] = new StatePathOdds (totalOutputOdds, pathOfBestOdds, bestPathOdds);
           }//endfor
           statePathOdds = nextStatePathOdds;
       }//endfor
       int  indexOfBestPath = 0;
       for (int statePathOddsIndex = 0; statePathOddsIndex < statePathOdds.length; statePathOddsIndex++) {
           if (statePathOdds[statePathOddsIndex].m_pathOdds > statePathOdds[indexOfBestPath].m_pathOdds) {
               indexOfBestPath = statePathOddsIndex;
           }//endif
       }//endfor
       return statePathOdds[indexOfBestPath].m_path;
   }// end findBestStatePathViaViterbi

Twikir (talk) 14:01, 14 June 2009 (UTC)[reply]

Viterbi vs the Forward Algorithm

I feel compelled to point out (or rather, shout) that the Viterbi algorithm and the Forward algorithm are NOT the same algorithms (Why does Forward algorithm point here??). They do not have the same calculations, and they absolutely do not produce the same results. They are related algorithms, but not the same. In a nutshell, Viterbi computes the most likely sequence of states, whereas the Forward algorithm computes the single most likely state at a given time. Using the Forward algorithm to produce the most likely state at each time step does not necessarily produce the same state sequence that Viterbi does. Indeed, running full filtering/smoothing with the Forward/Backward algorithm does not necessarily produce the same state sequence as Viterbi.

This may seem like quibbling, but mathematically, they are very different things.

I already detailed some of this in the talk page on the Talk:Forward algorithm... because I was sad. Sad that Wikipedia was so confused. Avidelego (talk) 19:18, 14 April 2010 (UTC)[reply]

It seems someone listened as there's now a separate page for it. I've replaced the code with something I think is better, but I still think the state of this page is abysmal. 128.237.249.50 (talk) 20:35, 2 May 2010 (UTC)[reply]

Too much code

The main page has a lot of code, and I think this really dilutes the content. As an article reader, I don't want to read through code, no matter how well written it is. Also, the weather/activity example is good, but it leaves it to code to explain the probabilities -- this should be in a table (or, maybe the diagram is sufficient). The trellis diagram is worth a thousand lines of code -- nice job on this. Can't we put the code on the talk page or something? Lavaka (talk) 18:11, 18 November 2010 (UTC)[reply]

I join in the chant, the Viterbi algorithm is a masterpiece yet the article can't stand up its fame: to the unexperienced reader, the code is not the only thing of interest and working out the example a little more (i.e. showing the score calculations and the core idea of what path should be picked in the graph) would be great. --Reim (talk) 17:03, 14 April 2011 (UTC)[reply]
I think that some algorithmic walk-through is pretty helpful, which to me means the python example should stay - it's quite readable. Not clear why an encyclopedia article should have a second implementation (i.e. the java one) pasted in the middle of it too... I'd like to see Wikipedia or Wikimedia more generally making this kind of basic code available, but is there a better place it cna be moved? A wikibook? --mcld (talk) 12:32, 2 December 2011 (UTC)[reply]

OH GOD NOW THERE'S FAR TOO MUCH CODE. Someone just added an F# implementation, as well as the existing Java, Clojure, C#, oh help me please. These programming languages are each lovely but there's just no way a reader wants to scroll past many different implementations in an ENCYCLOPEDIA article. I'm going to Be Bold and delete some of it. --mcld (talk) 20:48, 3 May 2012 (UTC)[reply]

Seems like it was the right change :-) When I read the article as it is now, I thought it very well written and understandable, with just the right amount of code (one Python implementation). The "Pseudocode" section however seemed very dense (and possibly superfluous, though perhaps non-coder mathematicians find it useful), while the "Extensions" section seemed a bit short. Unhammer (talk) 08:48, 12 November 2014 (UTC)[reply]

Numbers again

Like so many before, I looked for a simple as possible explanation of the algorithm. But my eye was caught by the figure

  • .0134

This is clearly not the probability of the most likely set of weather conditions given Walk-Shop-Clean, since there are only 8 conditions the most likely needs to be > 0.125.

I guessed (correctly as it turned out) that this is the probability ab initio of Sunny-rainy-rainy(SRR) AND Walk-Shop-Clean(WSC) - and not something that we need an algorithm for.

The probability of SRR given WSC is actually 18.91%, computable by Bayes theorem. What the Viterbi algorithm does, presumably is allow us to derive the most probable state without computing the probabilities for all states, otherwise it would be just brute force. That is what should be made clear in the article. Rich Farmbrough, 14:30, 8 February 2011 (UTC).[reply]


Assumption

"Third, computing the most likely hidden sequence up to a certain point t must depend only on the observed event at point t, and the most likely sequence at point t − 1."

This does not appear a valid assumption.

Example:

Initial probability

πX= 0.99, πA= 0.01.

Transitions

state transition P emits
A A .9999999999 1
A B .000000000099999999999 0
A C .000000000000000000001 17
B A 1 1
C A 1 0
X Y 1 0
Y X 1 1

Now we see emitted 0,1,0,1,0,1 the most likely path is X,Y,X,Y,X,Y,X. However if the next value emitted is 17, the most likely (and only) path is ABABABAC.

Rich Farmbrough, 16:03, 8 February 2011 (UTC).[reply]
Your example is correct. At step 6, the Viterbi algorithm would remember the path XYXYXY as the most likely way to get to Y, and ABABAB as the most likely way to get to B. If you ask it for the "viterbi path" at that point you'll get the former. At step 7 the paths are updated. If you then ask for the "viterbi path" you get one based on the latter. Maybe the phrase "and the most likely sequence at point t − 1" should be "and the most likely sequence leading to each possible state at point t − 1"? Bit clunky? --mcld (talk) 12:41, 2 December 2011 (UTC)[reply]

Corresponds to time

I don't understand the following sentence: "This sequence often corresponds to time." Is it a function of time, or only time-dependent, or something else? JMK (talk) 14:53, 30 March 2011 (UTC)[reply]

The sequence is often strictly monotonic in time, i.e. time-ordered. Is that clearer? I don't think "monotonic" is a good word to put in the intro... --mcld (talk) 12:35, 2 December 2011 (UTC)[reply]

Robotics project attention needed

  • MoS checks
  • Refs - enough?
  • Content - all topics covered?
  • Reassess

Chaosdruid (talk) 11:16, 24 March 2012 (UTC)[reply]

Incorrect Result for Health Clinic Example

For the health clinic example, I get that the most likely sequence of states given observations { 'normal', 'cold', 'dizzy' } is { "Healthy', 'Healthy', 'Fever' }, not { "Healthy', 'Fever', 'Fever' } as the article presently reads. See table below for brute force calculatation.

States Prior 1 Trans 1-->2 Trans 2-->3 Obs 1 Obs 2 Obs 3 Total Likelihood
HHH 0.600000 0.700000 0.700000 0.500000 0.400000 0.100000 0.005880
HHF 0.600000 0.700000 0.300000 0.500000 0.400000 0.600000 0.015120
HFH 0.600000 0.300000 0.400000 0.500000 0.300000 0.100000 0.001080
HFF 0.600000 0.300000 0.600000 0.500000 0.300000 0.600000 0.009720
FHH 0.400000 0.400000 0.700000 0.100000 0.400000 0.100000 0.000448
FHF 0.400000 0.400000 0.300000 0.100000 0.400000 0.600000 0.001152
FFH 0.400000 0.600000 0.400000 0.100000 0.300000 0.100000 0.000288
FFF 0.400000 0.600000 0.600000 0.100000 0.300000 0.600000 0.002592

I believe I see how this error crept in. In its previous form, when the example was about predicting whether the weather was SUNNY or RAINY from Bob's activities, the answer was correct. The author that rewrote the example to be about determining whether a patient was HEALTHY or had a FEVER thought they were doing a simple label replacement so all the math would come out the same. However they actually swapped the observation distributions, changing the mathematical model and likewise the maximum likelihood state trajectory given the example observations. Ooops! Note that the "trellis" diagram is now wrong.

Elvenhack (talk) 06:48, 24 May 2012 (UTC)[reply]

Algorithm definition: State variable name changes from "i" to "k"?

At the beginning the state is defined with the variable "i" but then this variable doesn't occur in the formulas, only "k". Is this an error? 2001:638:902:200B:68B5:F304:56D6:2AC8 (talk) 12:53, 12 September 2012 (UTC)[reply]

I think that this might be intentional to keep readers from mixing up the two; where one part is the definition of the variables, the other, the definition of the algorithm. The way I see it "i" and "j" (neither of which appear again) refer to arbitrary states, simply for evaluating transition probabilities; whereas "k" denotes a state in a broader context, for which you are asking the probability of that particular state being the final emitting state after a certain number of time steps, given the observations. PrincessKhorde (talk) 09:01, 19 November 2012 (UTC)[reply]

Python code

I decided to remove the Python code because (a) this is an encyclopedia, not a source code repository, and (b) it was completely broken. Since software maintenance on Wikipedia is practically impossible (there are broken code examples in many articles), I suggest we leave it out and improve the pseudocode instead. Jurafsky and Martin, 2nd ed., have a nice pseudocode version on page 220. QVVERTYVS (hm?) 15:03, 17 July 2013 (UTC)[reply]

Please do not simply remove portions from an encyclopedic article simply because you feel like it. First of all, the Python implementation does work exactly as expected. It's possible that you have an older version of Python installed which fails to run it properly. Secondly, the Python implementation is illustrative of the algorithm and contributes to this article. Removal of the Python code is a step backwards. Please feel free to make contributions to Wikipedia articles, but do not impetuously remove content. --Alex Rosenberg (talk) 02:32, 21 July 2013 (UTC)[reply]

I fixed a couple of minor bugs in the Python code. Documented in the change description. Insignificant1 (talk) 23:53, 31 October 2013 (UTC)[reply]

The Python function does not seem make use of the `start_p` initial probabilities at all.Jpo742 (talk) 01:30, 12 February 2016 (UTC)[reply]

Pseudo-Code seems wrong

If you follow the logic of the pseudo-code it does not match the health clinic example.

From the state transition from state t=1 to t=2.

There should be path probabilities as follows:

H-H = 0.3*0.7*0.4 = 0.084 H-T = 0.3*0.3*0.3 = 0.027 T-H = 0.04*0.4*0.4 = 0.0064 T-T = 0.04*0.6*0.3 = 0.0072

The pseudo code states that:

T_1[j,i] <- max_k (T_1[k, i-1]*A_{k,j} * B_{j,yi})

I think it should be

T_1[j,1] <- max_k (T_1[j, i-1] * A_{k,j} * B_{j, yi})