Talk:Baum–Welch algorithm

From Wikipedia, the free encyclopedia
Jump to: navigation, search


Could Template:HMM example be put to use on this page? (Granted we might have to modify it a bit to make it more general) -- Kowey 14:16, 14 Nov 2004 (UTC)

Content Discussion[edit]

There is a document written by Welch on IEEE Information Theory Society Newslectter, December 2003. See

I scrutinized the description, and I also think it is wrong.

We have: gamma_t(i): the probability that an HMM is in state s_i at time t given the observations O and the parameters Lambda. (1)

gamma_t(i)=P(s_i=q_t | O, Lambda) (2)


          --------------------          (3)

(3) is the Forward-Backward probability of state s_t(i), the probability one is in state s_i at t given time t. So this is the role the Foward-Backward algorithm plays in B W.

Now, let's look at the probability of going through a TRANSITION from s_i at time t to s_j at time t+1. Let's call it "chi'

Chi(i,j) is the probability of being in state s_i at time t, and being in state s_j at time t+1 given the observations and the parameters. (4)

Chi(i,j)=P(s_i=q_t, s_j=q_t+1 | O, Lambda) (5)


          P(O|Lambda)      (6)

SIGMA Chi(i,j) is the expected number of

       transitions from s_i to s_j


SIGMA gamma_t(i) is the expected number of

       transitions from s_i

              SIGMA   Chi(i,j)

Updated a_i_j =------------------------

              SIGMA   gamma_t_(i)

It is the last step I am not getting. Certainly the enumerator is not constant, as the current article claims, nor is it equal to the probability of the entire string. I cannot see why updated a_i_j is greater than the old value.


Cutting, D., and Kupiec, J., Pedersen, and P Sibun "A Practical Part-of-Speech Tagger", Xerox Palo Alto Research Center.

ChengXiang Zhai, A Brief Note On The Hidden Markov Models, Lecture Notes, Dept. of CS, University of Illinois at Urbana-Champaign.


Koos van der Wilt Koos van der Wilt 00:52, 13 May 2007 (UTC)

Were you discussing the article or the Welch presentation? Which one do you feel wrong? I think the latter, please confirm. I've not read your contribution, but I remember several headaches when trying to make sense of this algorithm in Pattern Classification (2ed) by Richard Duda et al. (ISBN 0 471 05669 3); I was convinced the description was wrong, and I rememeber I thought it was difficult to fix it.

--Blaisorblade (talk) 03:09, 24 June 2008 (UTC)

Confusion over Forward-backward algorithm[edit]

I think the link is confusing. The current text of Forward-backward algorithm seems to describe a different algorithm, which uses a forward and a backward pass but combines them for other purposes (at least, it appears so to me). Also the Baum-Welch algorithm does such a forward and backward pass, so it is called forward-backward algorithm somewhere (for instance in Pattern Classification (2ed) by Richard Duda et al. (ISBN 0 471 05669 3)). --Blaisorblade (talk) 03:09, 24 June 2008 (UTC) —Preceding unsigned comment added by (talk) 03:22, 25 February 2010 (UTC)

Ambiguous indices[edit]

In the formula for the epsilon variables, i and j are both free indices, and used for summation. 2001:6B0:1:1DF0:C185:CDB5:7AF7:29BA (talk) 17:58, 19 October 2013 (UTC)

Confusing chicken and egg example[edit]

There is one chicken and we observe whether it lays eggs or not at a day. Why are the observations then like NN, NE, EE, EN, that is, why are there two observations each day? I don't get it. HenningThielemann (talk) 17:33, 25 January 2015 (UTC)

Working out the probability all day works up an appetite, wherefore the chicken is fried at suppertime. (Probability of laying an egg drops to zero.)


The description ends with "These steps are now repeated iteratively until a desired level of convergence." What steps? Although it would be nice to specify how convergence is measured/detected, I'll leave that aside for now. Isn't the algorithm meant to be evaluated for T=1, then T=2, T=3, and so on until a desired level of convergence is attained? If true, the text should reflect that (it currently does not, since the "steps" are not linked with T). Urhixidur (talk) 15:54, 24 October 2017 (UTC)

Another thing that is sadly missing from the article is the measure of fitness of any given solution. Training an HMM over various data sets (of the same phenomenon) with the same parameters (number of hidden states, basically), how does one pick the "best" one? Urhixidur (talk) 17:34, 24 October 2017 (UTC)