HMM model in HTK
Hi,
I was curious about why we’ve chosen to have three emitting states for each digit. That doesn’t seem to correspond to phones, because “two” has two phones, “zero” has four phones, etc.
Have we designed it this way because of the limited amount of training data, or is it something else?
I’m sorry if you’ve mentioned this elsewhere before.
Thanks,
Yichao
The states in a whole-word model do not correspond to phonemes. Figure 9.4 in Jurafsky & Martin (2nd edition) suggests that they do, but what the authors are actually doing there is constructing a word model from sub-word (phoneme) models, and each of their phoneme models has a single state (which is not common: normally we use 3 states per phoneme model). The figure is misleading.
The number of emitting states in a model is a design choice we need to make. As you correctly say, having more states means we will need more training data, because the model will have more parameters.
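To get a rough feel for how the parameter count grows with the number of states, here is a minimal sketch. It assumes things that are not stated above: a left-to-right model with no skips, one diagonal-covariance Gaussian per emitting state, and 39-dimensional feature vectors. The function name `count_parameters` is only for illustration.

```python
def count_parameters(num_emitting_states, feature_dim=39):
    """Rough count of free parameters in a left-to-right HMM.

    Assumptions (not from the assignment): one diagonal-covariance
    Gaussian per emitting state, and no skip transitions.
    """
    # Each emitting state has a mean vector and a variance vector.
    emission = num_emitting_states * 2 * feature_dim
    # Each emitting state has a self-loop and a forward transition that
    # must sum to 1, which leaves one free parameter per state.
    transition = num_emitting_states
    return emission + transition


for n in (1, 3, 5, 7):
    print(n, "emitting states:", count_parameters(n), "parameters")
```

Under these assumptions the count grows linearly with the number of states, so each extra state asks the same training data to estimate one more mean and variance vector.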
In the digit recogniser assignment, there are a variety of “prototype” models that have varying numbers of states, for you to experiment with. It’s certainly worth doing an experiment to investigate this; make sure it’s one using large training and test sets, not just a single speaker.
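If you want to see what varying the number of states means in practice, the sketch below writes out a prototype in the general layout described in the HTK Book. It is not the assignment's own prototype: the vector size of 39, the `MFCC_0_D_A` parameter kind, the 0.6/0.4 initial transition values and the function name `make_prototype` are all assumptions made for illustration.

```python
def make_prototype(num_emitting_states, vec_size=39,
                   param_kind="MFCC_0_D_A", name="proto"):
    """Return the text of an HTK-style prototype HMM definition.

    vec_size, param_kind and the initial transition values are
    illustrative assumptions, not the assignment's settings.
    """
    if num_emitting_states < 1:
        raise ValueError("need at least one emitting state")

    # HTK models add a non-emitting entry state and exit state.
    total = num_emitting_states + 2
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)

    lines = [
        f"~o <VecSize> {vec_size} <{param_kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {total}",
    ]

    # Emitting states are numbered 2 .. total-1.
    for state in range(2, total):
        lines += [
            f"<State> {state}",
            f"<Mean> {vec_size}", zeros,
            f"<Variance> {vec_size}", ones,
        ]

    # Left-to-right transition matrix with no skips.
    lines.append(f"<TransP> {total}")
    for i in range(total):
        row = [0.0] * total
        if i == 0:
            row[1] = 1.0          # entry state goes straight to state 2
        elif i < total - 1:
            row[i] = 0.6          # self-loop
            row[i + 1] = 0.4      # move on to the next state
        # the exit state's row stays all zeros
        lines.append(" ".join(f"{p:.1f}" for p in row))

    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


print(make_prototype(3, vec_size=2))
```

Changing the first argument is all it takes to produce models with more or fewer emitting states; the means, variances and transition probabilities here are only placeholders that training would re-estimate.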
You could try using a different number of states for each digit in the vocabulary, but that’s probably not the most fruitful line of experiments.