HMM model in HTK

This topic has 1 reply, 2 voices, and was last updated 5 years, 6 months ago by Simon.

- November 20, 2019 at 07:52 #10296
  Yichao L
  Student
  Hi,
  
  I was curious about why we’ve chosen to have three emitting states for each digit. It doesn’t seem to correspond to phones, because “two” has two phones, “zero” has four phones, etc.
  
  Have we designed it this way because of the limited amount of training data? Or is it something else?
  
  I’m sorry if you’ve mentioned this elsewhere before.
  
  Thanks,
  
  Yichao
- November 20, 2019 at 09:20 #10297
  Simon
  Professor
  The states in a whole-word model do not correspond to phonemes. Figure 9.4 in Jurafsky & Martin (2nd edition) implies that they do, but what they are doing there is constructing a word model from sub-word (phoneme) models, and their phoneme models have a single state each (which is not common; normally we use 3 states). The figure is misleading.
  
  The number of emitting states in a model is a design choice we need to make. As you correctly say, more states means we will need more training data, because the model will have more parameters.
  
  In the digit recogniser assignment, there are a variety of “prototype” models that have varying numbers of states, for you to experiment with. It’s certainly worth doing an experiment to investigate this; make sure it’s one using large training and test sets, not just a single speaker.
  
  You could try using a different number of states for each digit in the vocabulary, but that’s probably not the most fruitful line of experiments.
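
To put rough numbers on the point about states and parameters, here is a minimal sketch that counts the free parameters of a single whole-word model as the number of emitting states grows. It is not taken from the assignment: the function name `hmm_parameter_count` is made up for illustration, and it assumes one diagonal-covariance Gaussian per emitting state, a strict left-to-right topology (self-loop plus one forward transition), and 39-dimensional feature vectors.

```python
def hmm_parameter_count(num_emitting_states: int, feature_dim: int = 39) -> int:
    """Count free parameters in one left-to-right HMM.

    Assumptions (illustrative, not from the assignment):
      - one diagonal-covariance Gaussian per emitting state
      - each emitting state has only a self-loop and a forward transition
      - non-emitting entry/exit states add no free parameters
    """
    # Each Gaussian has a mean vector and a variance vector.
    gaussian_params = 2 * feature_dim
    # Two outgoing probabilities that sum to 1 leave one free parameter per state.
    transition_params = 1
    return num_emitting_states * (gaussian_params + transition_params)


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"{n} emitting states: {hmm_parameter_count(n)} free parameters")
```

Under these assumptions the count grows linearly with the number of emitting states, so each extra state adds the same number of parameters that must be estimated from the same training data.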