Language model scaling factor
- This topic has 2 replies, 2 voices, and was last updated 9 years, 2 months ago by Merce V.
November 24, 2015 at 10:51 #843
In Jurafsky and Martin (chapter 9.6) a language model scaling factor is introduced. I understand that it reweights the language model probability because of the independence assumption we make in our HMMs, but I am not sure I understand how this works. They also talk about a “penalty for inserting words” that needs to be taken into account, and I am a bit confused about that too. A clearer explanation would help me.
-
November 24, 2015 at 11:10 #844
HMMs actually compute the likelihood p(O|W) where O is the observation sequence and W is the HMM (or a sequence of HMMs). Note the small “p” – it’s a probability density, because we are using Gaussians, and not a true probability, although it is in some loose sense “proportional” to probability.
So, the likelihood computed by the HMM, and the probability from the language model, P(W), are on different scales: their values fall in quite different ranges. You can see that for yourself in the output from HTK when it prints acoustic log likelihoods, which tend to be large negative numbers like -1352.3.
Therefore, some scaling of these two values is necessary before combining them. We do that scaling in the log domain: multiply the language model log probability by a constant value before adding it to the acoustic model log likelihood. If we didn’t do this, the acoustic model likelihood would dominate and the language model would have little influence.
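To make the effect of the scaling concrete, here is a minimal sketch in Python. The two hypotheses and all of their scores are made up for illustration (they are not HTK output), and the function names are hypothetical.

```python
def combined_score(acoustic_loglik, lm_logprob, lm_scale):
    """Acoustic log likelihood plus the scaled language model log probability."""
    return acoustic_loglik + lm_scale * lm_logprob


# Made-up competing hypotheses for one utterance:
# words -> (acoustic log likelihood, language model log probability)
hypotheses = {
    "recognise speech": (-1355.0, -6.0),
    "wreck a nice beach": (-1352.3, -8.5),
}


def best_hypothesis(lm_scale):
    """Return the word string with the highest combined log score."""
    return max(
        hypotheses,
        key=lambda words: combined_score(*hypotheses[words], lm_scale),
    )


# With no scaling (factor 1) the small acoustic difference decides the winner.
print(best_hypothesis(1.0))   # wreck a nice beach
# With a larger factor the language model preference can change the outcome.
print(best_hypothesis(15.0))  # recognise speech
```

The value 15 is only an example; the right value depends on the models and has to be found empirically.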
We call the language model scaling factor (sometimes called the language model weight) a hyperparameter because it is something separate from the actual HMM and language model parameters.
Separately from this, there is another hyperparameter, called the word insertion penalty. It is a constant added to the log score once for every word in a hypothesis, which lets us trade off insertion errors against deletion errors in order to minimise Word Error Rate.
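A small sketch of how the word insertion penalty enters the score, again in Python with made-up numbers. The sign convention is an assumption here: the penalty is added once per word, so a more negative value discourages longer hypotheses. All names are hypothetical.

```python
def combined_score(acoustic_loglik, lm_logprob, n_words, lm_scale, word_penalty):
    """Scaled log score with a per-word penalty (assumed to be added per word)."""
    return acoustic_loglik + lm_scale * lm_logprob + word_penalty * n_words


# Made-up hypotheses: the longer one fits the acoustics slightly better.
hypotheses = [
    {"words": ["the", "cat"], "acoustic": -900.0, "lm": -5.0},
    {"words": ["the", "a", "cat", "at"], "acoustic": -888.0, "lm": -6.0},
]


def best_words(lm_scale, word_penalty):
    """Return the word list of the highest-scoring hypothesis."""
    best = max(
        hypotheses,
        key=lambda h: combined_score(
            h["acoustic"], h["lm"], len(h["words"]), lm_scale, word_penalty
        ),
    )
    return best["words"]


# No penalty: the four-word hypothesis wins (risk of insertion errors).
print(best_words(10.0, 0.0))
# Penalty of -5 per word: the two-word hypothesis wins instead.
print(best_words(10.0, -5.0))
```

Making the penalty more negative reduces insertions but will eventually cause deletions, which is the trade-off being tuned.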
Note that hyperparameters must never be set using the test data. We should reserve some of the training data for this (and not use that part for training the HMMs or language model). This set is typically called a development set.
Generally, both the language model scaling factor and the word insertion penalty are tuned empirically (by trial and error) on a development set.
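The trial-and-error tuning could look something like the sketch below. It is a toy under stated assumptions: instead of re-running a real decoder, it rescores a tiny made-up n-best list for each development utterance, and the data, grid values and function names are all hypothetical.

```python
from itertools import product


def word_errors(ref, hyp):
    """Word-level edit distance: substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # match or substitution
        prev = curr
    return prev[-1]


def rescore(nbest, lm_scale, word_penalty):
    """Pick the words of the hypothesis with the highest combined log score."""
    best = max(nbest, key=lambda h: h["acoustic"] + lm_scale * h["lm"]
               + word_penalty * len(h["words"]))
    return best["words"]


def dev_wer(dev_set, lm_scale, word_penalty):
    """Word Error Rate over the development set for one hyperparameter pair."""
    errors = sum(word_errors(u["ref"], rescore(u["nbest"], lm_scale, word_penalty))
                 for u in dev_set)
    return errors / sum(len(u["ref"]) for u in dev_set)


# Made-up development set: reference words plus scored hypotheses.
dev_set = [
    {"ref": "recognise speech".split(),
     "nbest": [
         {"words": "recognise speech".split(), "acoustic": -1355.0, "lm": -6.0},
         {"words": "wreck a nice beach".split(), "acoustic": -1352.3, "lm": -8.5},
     ]},
    {"ref": "the cat sat".split(),
     "nbest": [
         {"words": "the cat sat".split(), "acoustic": -900.0, "lm": -5.0},
         {"words": "the a cat sat".split(), "acoustic": -897.5, "lm": -5.2},
     ]},
]

# Try every (scaling factor, insertion penalty) pair and keep the lowest WER.
grid = product([1.0, 5.0, 10.0, 15.0], [0.0, -5.0, -10.0])
best_scale, best_penalty = min(grid, key=lambda sp: dev_wer(dev_set, *sp))
print(best_scale, best_penalty, dev_wer(dev_set, best_scale, best_penalty))
```

On a toy set this small, several settings tie for the lowest WER and `min` simply returns the first one it meets; a real development set is large enough for the choice to be meaningful. The chosen values would then be fixed before scoring the test set.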
-
November 24, 2015 at 20:32 #852
Thanks, it is much clearer now.