Baum-Welch algorithm
December 13, 2020 at 16:11 #13619
I understand that Viterbi training makes a hard alignment between observations and states, whereas Baum-Welch uses a soft, or probabilistic, alignment.
However, I still don’t get why Baum-Welch is needed when the parameters in a state could still be updated by the Viterbi algorithm, even though that would come at a high computational cost. Is it for efficiency?
December 13, 2020 at 16:17 #13621
Baum-Welch does the correct computation. This is true “by definition”: because the model has a hidden state sequence, the correct thing to do is to integrate out that random variable, which means summing over all the values it can take.
Baum-Welch provides a better estimate of the model parameters than Viterbi, in both a theoretical and empirical sense.
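To make the distinction concrete, here is a minimal sketch in Python/NumPy (a toy two-state HMM with made-up parameters, not anything from HTK) comparing the quantity Viterbi training is based on, the probability of the single best state sequence, with the quantity Baum-Welch is based on, the probability summed over all state sequences:

```python
# Toy illustration (made-up numbers): sum over all state sequences (Baum-Welch)
# versus max over state sequences (Viterbi).
import numpy as np

pi = np.array([0.6, 0.4])                 # initial state probabilities
A  = np.array([[0.7, 0.3],                # transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],                # emission probabilities B[state, symbol]
               [0.2, 0.8]])
obs = [0, 1, 1]                           # a short observation sequence

# Forward algorithm: sums over ALL state sequences (integrates out the hidden state)
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
p_sum = alpha.sum()                       # P(O | model)

# Viterbi: keeps only the single best state sequence (hard alignment)
delta = pi * B[:, obs[0]]
for o in obs[1:]:
    delta = (delta[:, None] * A).max(axis=0) * B[:, o]
p_max = delta.max()                       # P(O, best state sequence | model)

print(p_sum, p_max)                       # p_sum >= p_max: the sum includes every path
```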
December 13, 2020 at 16:27 #13622
So Baum-Welch considers all possible state sequences through the model and weights each by its joint probability with the observations (in effect counting, fractionally, how often each state is aligned with each observation), and thus returns a full estimate based on the observations being produced by any state sequence the model can produce?
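A minimal sketch of those fractional “soft counts” (again a toy model with made-up numbers in Python/NumPy, not the HTK implementation): the forward-backward computation gives, for every observation, the probability of occupying each state, and those occupancies are what Baum-Welch uses in place of the hard counts of Viterbi training.

```python
# Toy illustration: state occupation probabilities (gamma) from forward-backward.
import numpy as np

pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
obs = [0, 1, 1]
T, N = len(obs), len(pi)

# Forward pass
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Posterior state occupancies: P(state at time t | whole observation sequence)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# Each row sums to 1: every frame contributes fractionally to every state,
# instead of being assigned to exactly one state as in Viterbi training.
print(gamma)
```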
December 16, 2020 at 15:51 #13701
Then why are we using the Viterbi algorithm in the training stage, when we could just go straight to Baum-Welch after uniform segmentation?
Is it because HInit is designed to run both uniform segmentation and Viterbi automatically, when we could actually skip Viterbi?
December 16, 2020 at 18:02 #13704
We could go from uniform segmentation to Baum-Welch, skipping Viterbi training. In fact, we could even go from the prototype model directly to Baum-Welch.
Baum-Welch is an iterative algorithm that gradually changes the model parameters to maximise the likelihood of the training data. The only proof we have for this algorithm is that each iteration increases (in fact, does not decrease) the likelihood of the training data. There is no way to know whether the final model parameters are globally-optimal.
This type of algorithm is sensitive to the initial model parameters. The final model we get could be different, depending on where we start from. This is also true for Viterbi training.
So, we use uniform segmentation to get a much better model than the prototype (which has zero mean and unit variance for all Gaussians in all states). Starting from this model should give a better final model than starting from the prototype.
The model from uniform segmentation is used as the initial model for Viterbi training, which in turn provides the initial model for Baum-Welch.
Another reason to perform training in these phases is that Viterbi training is faster than Baum-Welch and will get us quite close to the final model. This reduces the number of iterations of Baum-Welch that are needed.
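A rough sketch of the uniform-segmentation step mentioned above (illustrative Python with hypothetical function names and made-up feature data, not the actual HInit code): each training sequence is divided evenly across the emitting states, and the per-state sample mean and variance become the initial Gaussian parameters.

```python
# Illustrative sketch only: uniform segmentation as a way to initialise
# one Gaussian per state before Viterbi training / Baum-Welch.
import numpy as np

def uniform_segmentation_init(sequences, n_states):
    """sequences: list of (T_i, dim) arrays of feature vectors (toy stand-in for MFCCs)."""
    per_state_frames = [[] for _ in range(n_states)]
    for seq in sequences:
        # split the frames of this sequence into n_states roughly equal chunks
        for state, chunk in enumerate(np.array_split(seq, n_states)):
            per_state_frames[state].append(chunk)
    means, variances = [], []
    for frames in per_state_frames:
        frames = np.concatenate(frames, axis=0)
        means.append(frames.mean(axis=0))
        variances.append(frames.var(axis=0))
    return np.array(means), np.array(variances)

# Made-up data: three "utterances" of 2-dimensional features, 5-state model
rng = np.random.default_rng(0)
data = [rng.normal(size=(40, 2)), rng.normal(size=(55, 2)), rng.normal(size=(48, 2))]
means, variances = uniform_segmentation_init(data, n_states=5)
print(means.shape, variances.shape)   # (5, 2) each: one Gaussian per state
```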