When is negative log likelihood used and why?
November 23, 2020 at 19:47 #13234
A bit broad, sorry, but I am confused about the difference between NLL and LL.
I have used GMMs before to train a speaker recognizer, and the error term was the per-sample average negative log-likelihood. This assumes that the probabilities are between 0 and 1, hence the negative.
However, the Gaussian pdf always outputs positive values (there is no negative area under the curve), i.e. taking just the log should be enough.
So when is NLL used (in the speech processing domain) and why?
November 24, 2020 at 09:31 #13237
Distance is what we use in pattern matching. A smaller distance is better: it means that patterns are more similar. The Euclidean distance is an example of a distance metric.
Probability is what we use in probabilistic modelling. We use probability density rather than probability mass when modelling a continuous variable with a probability density function. The Gaussian is an example of a probability density function (pdf).
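As a minimal sketch (assuming Python with SciPy is available; the numbers are purely illustrative), evaluating a Gaussian pdf at a point returns a probability density, not a probability mass:

from scipy.stats import norm

# Evaluate a Gaussian pdf with mean 0.0 and standard deviation 1.0 at x = 0.5.
# The returned value is a probability density, not a probability mass.
density = norm.pdf(0.5, loc=0.0, scale=1.0)
print(density)  # roughly 0.352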
When we use a generative model, such as a Gaussian, to compute the probability of emitting an observation given the model (= conditional on the model), we are calculating a conditional probability density, which we call the likelihood.
A larger probability, probability density, or likelihood is better: it indicates that a model is more likely to have generated the observation.
To do classification, we only ever compare distances or likelihoods between models. We don’t care about the absolute value, just which is smallest (lowest) or largest (highest), respectively.
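Here is a small sketch of that idea (assuming Python with SciPy; the class names and values are hypothetical): score one observation against two single-Gaussian class models and keep whichever gives the highest log likelihood.

from scipy.stats import norm

# Hypothetical single-Gaussian model for each class.
models = {"class_A": norm(loc=0.0, scale=1.0),
          "class_B": norm(loc=3.0, scale=1.0)}

x = 2.2  # one observation

# Only the comparison matters, not the absolute values.
log_likelihoods = {c: m.logpdf(x) for c, m in models.items()}
best = max(log_likelihoods, key=log_likelihoods.get)  # "class_B" here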
The log is a monotonically increasing function, so it does not change anything with respect to comparing values. We take the log only for reasons of numerical precision.
It doesn’t matter that probability densities are not necessarily between 0 and 1; they are always positive, so we can always take the log. A higher probability density leads to a higher log likelihood.
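A small illustration of the precision point (a sketch, assuming Python with NumPy and SciPy; the data are made up): multiplying many frame-level densities underflows to zero, while summing their logs stays representable.

import numpy as np
from scipy.stats import norm

# Made-up observation sequence of 2000 frames.
frames = np.random.default_rng(0).normal(loc=5.0, scale=1.0, size=2000)

model = norm(loc=0.0, scale=1.0)  # a deliberately poor model for these frames

# Assuming the frames are independent, the sequence likelihood is the product
# of the frame densities: this underflows to 0.0 in double precision.
likelihood = np.prod(model.pdf(frames))

# Summing log densities carries the same information without underflow.
log_likelihood = np.sum(model.logpdf(frames))

print(likelihood, log_likelihood)  # 0.0 and a large negative number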
Taking the negative simply inverts the direction for comparisons. We might use negative log likelihoods in specific speech technology applications when it feels more natural to have a measure that behaves like a distance (smaller is better).
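For example (hypothetical numbers, just to show the inversion):

# Log likelihoods of one observation under two hypothetical models.
log_likelihoods = {"model_A": -3.3, "model_B": -1.2}

# Largest log likelihood wins ...
best_by_ll = max(log_likelihoods, key=log_likelihoods.get)             # "model_B"

# ... which is the same as the smallest negative log likelihood,
# now behaving like a distance (smaller is better).
neg_log_likelihoods = {m: -ll for m, ll in log_likelihoods.items()}
best_by_nll = min(neg_log_likelihoods, key=neg_log_likelihoods.get)    # "model_B"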
In general, for Automatic Speech Recognition, we use log likelihood. This is what HTK prints out, for example. Those numbers will almost always be negative in practice, but a positive log likelihood is possible in theory because a likelihood can be greater than one when it is computed using a probability density.
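To see how a positive log likelihood can arise (a sketch, assuming Python with SciPy; the numbers are illustrative, not HTK output): a Gaussian with a small variance has a density greater than 1 near its mean, so the log likelihood there is positive.

from scipy.stats import norm

# A narrow Gaussian: standard deviation 0.1.
narrow = norm(loc=0.0, scale=0.1)

print(narrow.pdf(0.0))     # about 3.99  -> a probability density can exceed 1
print(narrow.logpdf(0.0))  # about 1.38  -> a positive log likelihood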