Which Gaussian to Which HMM State?
November 17, 2015 at 01:51 #625
Using as an example a 5-state HMM, which has 3 emitting states: how does HTK (or any similarly built speech recognizer) know which Gaussian generative model goes in which state? J&M allude to a ‘beginning, middle, and end’ for a 5-state HMM, which intuitively makes sense. But doesn’t some choice have to be made somewhere about which multivariate (39-dimensional) Gaussian will go in each of the 3 emitting states?
Your video on Token Passing does a good job of explaining recognition: the whole observation sequence of the given ‘test’ audio is scored against each HMM (each trained on its own set of audio segments, in our case words), which spits out a final probability; those probabilities are then compared across all the word models in the training data/dictionary, the model with the highest probability of the whole observation sequence is declared the ‘winner’, and the ‘test’ audio is classified accordingly. Viterbi/dynamic programming is the key, and some pruning can speed up the process. But in your example, you show that each state (in each HMM) contains a specific Gaussian that generates the successive observations in the sequence.
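To check that I picture the scoring step correctly, here is a minimal sketch, assuming one diagonal-covariance Gaussian per emitting state and entry through the first emitting state; the names and shapes are illustrative, not HTK’s actual code:

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    # log N(x; mean, diag(var)) for a single 39-dimensional frame
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def viterbi_log_likelihood(obs, means, variances, log_trans):
    # obs: (T, D) observation frames; means/variances: (S, D), one Gaussian per
    # emitting state; log_trans: (S, S) log transition probabilities.
    T, S = obs.shape[0], means.shape[0]
    delta = np.full((T, S), -np.inf)
    delta[0, 0] = log_gauss_diag(obs[0], means[0], variances[0])  # start in state 1
    for t in range(1, T):
        for j in range(S):
            best_prev = np.max(delta[t - 1] + log_trans[:, j])
            delta[t, j] = best_prev + log_gauss_diag(obs[t], means[j], variances[j])
    return delta[-1].max()  # best-path log likelihood for this word model
```

The same observation sequence would be scored like this by every word model, and the model with the highest score wins.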
Going back to the training of the HMMs, I semi-understand how Baum-Welch can iteratively home in on the best transition probabilities. But it’s still not clear where in the whole process a given Gaussian gets ‘placed’ inside each of the emitting states of the HMM. Does Baum-Welch figure this out somehow? Or is it something simpler, based on the number of emitting states in the HMM being used and the length of a given word in a given class of training data?
Can you elucidate? And if I’ve mis-stated something in the above paragraphs, please correct that as well.
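(To frame the question a bit more: my current guess is that, once the forward/backward pass has produced state occupation probabilities, each state’s Gaussian is simply an occupancy-weighted average of the training frames. A rough sketch of just that re-estimation step, with illustrative names and the occupation probabilities gamma taken as given:

```python
import numpy as np

def reestimate_gaussians(obs, gamma, var_floor=1e-3):
    # obs: (T, D) training frames; gamma: (T, S) state occupation probabilities
    # from the forward-backward pass. Returns per-state means and diagonal
    # variances as occupancy-weighted averages of the frames.
    occ = gamma.sum(axis=0)                          # total occupancy per state
    means = (gamma.T @ obs) / occ[:, None]           # weighted mean of the frames
    second = (gamma.T @ (obs ** 2)) / occ[:, None]   # weighted mean of x^2
    variances = np.maximum(second - means ** 2, var_floor)
    return means, variances
```

Is that roughly right, or is there more to the ‘placement’ than this?)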
November 17, 2015 at 06:40 #626
Upon further reading of the HTK manual, I would simplify and rephrase the previous post as: can you please explain the model training process of going from Uniform Segmentation to Viterbi Segmentation (then iterating) during HInit, and then the further refinement using Forward/Backward (and iterating) during HRest? It’s clear from looking at my own data that both HInit and HRest calculate and revise both the transition probabilities between the states AND the Gaussians at each state. There also appears to be a relationship between the transition probabilities and the Gaussian probabilities, but that exact relationship is unclear to me. Could you show a worked example, from start to finish, of a single piece of training data as it moves through HInit and HRest, and how the HMM model evolves? It could be a hypothetical word of length 100 ms, with 10 frames.
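To anchor the request, here is how I currently picture the very first Uniform Segmentation pass on that hypothetical 100 ms, 10-frame token; this is a simplified sketch based on my reading of the manual, with made-up numbers, not HTK’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 39))    # hypothetical token: 10 frames of 39-dim features
n_states = 3                          # emitting states of the 5-state model

# Uniform segmentation: cut the token into (roughly) equal chunks, one per
# emitting state (here frames 0-3 -> state 1, 4-6 -> state 2, 7-9 -> state 3),
# and set each state's Gaussian to the mean/variance of its own chunk.
chunks = np.array_split(frames, n_states)
means = np.array([c.mean(axis=0) for c in chunks])
variances = np.array([c.var(axis=0) + 1e-3 for c in chunks])  # floored variances

# My understanding of the later passes: Viterbi segmentation (HInit) and then
# Forward/Backward (HRest) re-assign frames to states using these Gaussians plus
# the transition probabilities, recompute the Gaussians from the new assignment,
# and iterate until the alignment stops changing much.
```

Does that match what HInit and HRest are actually doing under the hood?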
November 17, 2015 at 09:59 #629
This is coming up in lectures – but we need to understand recognition first.
November 17, 2015 at 15:16 #642
Thank you. By looking ahead at the slides of the upcoming lectures, and pairing that with last year’s video lectures, I was able to expand my understanding of the process.