Language Model, Acoustic Model, Pronunciation Model

This topic has 1 reply, 2 voices, and was last updated 4 years, 5 months ago by Simon.

Author

Posts
- December 1, 2020 at 14:53 #13322
  Isobel W
  Student
  I am getting really confused over the differences between the language model, the acoustic model and the pronunciation model.
  I know the acoustic model is what the HMM calculates: the likelihood of a particular sequence of words for a model. The pronunciation model links things in the acoustic model to things in the language model (I am unsure how). The language model confuses me most, because when I Google it, all the links are about n-grams, and I don't understand how n-grams fit into our model as I have never learnt about them before.
- December 5, 2020 at 16:14 #13421
  Simon
  Professor
  The language model emits a sequence of words.
  
  The pronunciation model emits a sequence of phonemes, given a word.
  
  The acoustic model of a phoneme emits a sequence of observations (e.g., MFCC feature vectors).
  
  In order to compile these together into a recognition graph, all of them must be finite state.
  
  The most common form of language model is the N-gram. This is a finite state model that emits a word sequence. The probability of a given word sequence is the product of the probabilities of the individual words in it, where each word's probability is conditioned only on the N-1 preceding words.
  
  Another form of finite state language model is the hand-crafted grammar used in the digit recogniser exercise.
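
To make the N-gram description above concrete, here is a minimal sketch for N = 2 (a bigram model), together with a toy pronunciation dictionary. Everything in it is an assumption made for illustration: the three-sentence corpus, the `<s>` and `</s>` boundary markers, the phoneme symbols, and the function names `bigram_prob`, `sequence_prob` and `to_phonemes`. It uses unsmoothed maximum-likelihood counts, which a real recogniser would not rely on, and it leaves out the acoustic model entirely.

```python
from collections import Counter

# Toy training data (hypothetical), with sentence boundary markers.
corpus = [
    ["<s>", "one", "two", "three", "</s>"],
    ["<s>", "one", "two", "two", "</s>"],
    ["<s>", "two", "one", "</s>"],
]

# Count each bigram and each context (the preceding word).
bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1


def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev), with no smoothing."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sequence_prob(words):
    """Product of per-word probabilities, each given the 1 preceding word."""
    padded = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, word)
    return prob


# Hypothetical pronunciation dictionary: one phoneme sequence per word.
lexicon = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
    "three": ["th", "r", "iy"],
}


def to_phonemes(words):
    """Expand a word sequence into the phoneme sequence it would emit."""
    return [phoneme for word in words for phoneme in lexicon[word]]


if __name__ == "__main__":
    words = ["one", "two"]
    print(sequence_prob(words))  # language model: probability of the words
    print(to_phonemes(words))    # pronunciation model: words to phonemes
```

With this toy corpus, `sequence_prob(["one", "two"])` works out to (2/3) × (2/3) × (1/4) = 1/9, and any sequence containing a bigram that never occurred in the corpus gets probability zero. Each phoneme returned by `to_phonemes` is where an acoustic model (for example, one HMM per phoneme) would take over and emit observations.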