Gaussians and HMM states

This topic has 1 reply, 2 voices, and was last updated 9 years, 3 months ago by Simon.

- November 26, 2015 at 11:01 #870
  Lee T
  Student
  I’m still a bit confused about the language model HMM. I understand that the token that “wins” is the one that has the highest probability of generating the observation sequence given the word model. However, I don’t have an intuitive idea about what a single state represents. Why is there more than one Gaussian for a single word? I would think that we can just make a single Gaussian for slightly varied pronunciations of the word “one”, for example. What is the effect of changing the number of emitting states? We might go over this when we talk more about training the model, but I’m confused as of now.
- November 26, 2015 at 12:00 #871
  Simon
  Professor
  The language model on its own is not an HMM – it’s just a finite state machine.
  
  Your question is mainly about what the word models look like. Yes, we could indeed use a single emitting state per word. Then, we would be modelling just the average spectral envelope across the entire word duration. That may well be enough to distinguish words in a very small vocabulary system (e.g., isolated digits), but is a rather naive model.
  
  Using more states per word gives us a finer temporal granularity. For example, 3 emitting states per word allows the modelling of (roughly speaking) the beginning, middle and end sounds of that word. Such a model is probably a better generative model of that word, and conversely a worse generative model of other words, so should lead to more accurate recognition.
  
  In larger systems, we use one model per sub-word unit (e.g., phone-sized units). Each word model is then built by concatenating the models of its sub-word units, so of course it will have multiple states.
  
  Try it for yourself by experiment – it’s easy to vary the number of emitting states in a whole-word system. You’ll probably want to do such an experiment on a reasonably large multi-speaker dataset in order to get reliable results.
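
The effect of the number of emitting states can be illustrated with a toy experiment. The following is only a minimal sketch under stated assumptions, not the course's toolkit or recipe: the features are made-up 1-D values, the two "words" A and B are hypothetical and contain the same sounds in opposite order, each state has a single Gaussian, training uses a simple uniform segmentation, and all transition probabilities are fixed at 0.5. All function names (`make_word`, `train`, `viterbi_loglik`, `accuracy`) are invented for this illustration.

```python
import numpy as np

def make_word(rng, means, frames_per_segment=10, noise=0.5):
    """Toy 1-D feature sequence: one run of noisy frames per target mean."""
    return np.concatenate(
        [m + noise * rng.standard_normal(frames_per_segment) for m in means]
    )

def train(seqs, n_states):
    """Uniform segmentation (an assumption, not full HMM training): split each
    sequence into n_states equal parts and fit one Gaussian per state."""
    parts = [[] for _ in range(n_states)]
    for seq in seqs:
        for j, chunk in enumerate(np.array_split(seq, n_states)):
            parts[j].append(chunk)
    means = np.array([np.concatenate(p).mean() for p in parts])
    variances = np.array([np.concatenate(p).var() for p in parts]) + 1e-3  # floor
    return means, variances

def viterbi_loglik(seq, model):
    """Log probability of the best path through a left-to-right HMM that
    starts in the first emitting state and ends in the last one."""
    means, variances = model
    # log N(x; mean, var) for every frame (rows) and state (columns)
    logb = -0.5 * (np.log(2 * np.pi * variances)
                   + (seq[:, None] - means) ** 2 / variances)
    log_half = np.log(0.5)  # assumed self-loop and forward transition probability
    delta = np.full(len(means), -np.inf)
    delta[0] = logb[0, 0]
    for t in range(1, len(seq)):
        stay = delta + log_half
        move = np.concatenate(([-np.inf], delta[:-1] + log_half))
        delta = np.maximum(stay, move) + logb[t]
    return delta[-1]

def accuracy(n_states, train_set, test_set):
    """Train one whole-word model per word, then classify each test sequence
    by picking the model with the highest Viterbi log probability."""
    models = {w: train(seqs, n_states) for w, seqs in train_set.items()}
    correct = total = 0
    for word, seqs in test_set.items():
        for seq in seqs:
            guess = max(models, key=lambda m: viterbi_loglik(seq, models[m]))
            correct += int(guess == word)
            total += 1
    return correct / total

rng = np.random.default_rng(0)
# Hypothetical words: same three "sounds", opposite temporal order.
shapes = {"A": [-2.0, 0.0, 2.0], "B": [2.0, 0.0, -2.0]}
train_set = {w: [make_word(rng, m) for _ in range(50)] for w, m in shapes.items()}
test_set = {w: [make_word(rng, m) for _ in range(50)] for w, m in shapes.items()}

acc_1 = accuracy(1, train_set, test_set)
acc_3 = accuracy(3, train_set, test_set)
print(f"1 emitting state:  accuracy = {acc_1:.2f}")
print(f"3 emitting states: accuracy = {acc_3:.2f}")
```

With one emitting state, each word model reduces to the average over the whole word, and because A and B have the same average, the two models are nearly identical and recognition should be close to chance. With three emitting states, the models capture the beginning, middle and end separately, so the order of the sounds becomes visible to the recogniser. Real speech is far less tidy than this toy data, which is why the experiment is worth repeating on a proper multi-speaker dataset.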