Language Model, Acoustic Model, Pronunciation Model
I am getting really confused over the differences between the language model, the acoustic model and the pronunciation model.
I know the acoustic model is what the HMM calculates: the likelihood of a particular sequence of words for a model. The pronunciation model links things in the acoustic model to things in the language model (I am unsure how). The language model confuses me most, because when I google it, all the links are about n-grams, and I don’t understand how n-grams fit into our model as I have never learnt about them before.
The language model emits a sequence of words.
The pronunciation model emits a sequence of phonemes, given a word.
The acoustic model of a phoneme emits a sequence of observations (e.g., MFCC feature vectors).
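One common way to write down how these three models fit together is sketched below. The symbols are my own notation, not something defined in this thread: W is a word sequence, Q is a phoneme sequence, and O is the observation sequence. The second equation assumes that, once the phoneme sequence is known, the observations do not depend further on the words.

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W),
\qquad
P(O \mid W) = \sum_{Q} P(O \mid Q)\, P(Q \mid W)
```

Here P(W) comes from the language model, P(Q | W) from the pronunciation model, and P(O | Q) from the acoustic model.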
In order to compile these together into a recognition graph, all of them must be finite state.
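To get a feel for what "compiling together" means, here is a minimal sketch in Python. It is not how any real toolkit builds its graph: the lexicon, the state naming scheme, and the choice of 3 emitting states per phoneme are all assumptions made for illustration. The idea is simply that each word is replaced by its phonemes, and each phoneme by the states of its HMM.

```python
# Hypothetical toy lexicon: word -> phoneme sequence (the pronunciation model).
LEXICON = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}

# Assumed number of emitting HMM states per phoneme model (the acoustic model).
STATES_PER_PHONE = 3


def expand_word(word):
    """Replace a word with the chain of HMM state names that realises it."""
    states = []
    for phone in LEXICON[word]:
        for i in range(1, STATES_PER_PHONE + 1):
            states.append(f"{word}/{phone}_{i}")
    return states


def expand_sequence(words):
    """Expand a word sequence (from the language model) into one flat state chain."""
    chain = []
    for word in words:
        chain.extend(expand_word(word))
    return chain
```

Calling `expand_sequence(["one", "two"])` gives a chain of 15 state names: 9 for "one" (3 phonemes) followed by 6 for "two" (2 phonemes). A real recognition graph also carries transition probabilities and alternative paths, which this sketch leaves out.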
The most common form of language model is an N-gram. This is a finite state model that emits a word sequence. The probability of a given word sequence is the product of the probabilities of the individual words in the sequence, where each word's probability is conditioned only on the N-1 preceding words.
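As a rough illustration of the N-gram idea with N = 2 (a bigram), the sketch below estimates word probabilities by counting in a tiny made-up corpus and then multiplies them along a sequence. The corpus, the `<s>` and `</s>` boundary markers, and the function names are all assumptions for the example. There is no smoothing, so any unseen word pair gets probability zero.

```python
from collections import Counter

# Tiny made-up training corpus; <s> and </s> mark sentence start and end.
CORPUS = [
    ["<s>", "one", "two", "</s>"],
    ["<s>", "one", "one", "</s>"],
    ["<s>", "two", "one", "</s>"],
]


def train_bigram(sentences):
    """Estimate P(word | previous word) by relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


def sequence_probability(model, words):
    """Multiply the bigram probabilities along the sequence, including boundaries."""
    padded = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= model.get((prev, word), 0.0)  # unseen pair -> probability 0
    return prob
```

With this corpus, `sequence_probability(train_bigram(CORPUS), ["one", "two"])` works out as P(one | `<s>`) × P(two | one) × P(`</s>` | two) = 2/3 × 1/4 × 1/2 = 1/12.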
Another form of finite state language model is the hand-crafted grammar used in the digit recogniser exercise.
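A hand-crafted grammar can be pictured as a small finite state machine over words. The sketch below is a hypothetical grammar that accepts any sequence of one or more digits. It is not the actual grammar from the exercise, and the state names and word list are assumptions.

```python
# Hypothetical digit vocabulary; the real exercise may use a different word list.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

START, IN_DIGITS = "start", "in_digits"

# TRANSITIONS[state][word] -> next state. Any digit moves into (or stays in) IN_DIGITS.
TRANSITIONS = {
    START: {d: IN_DIGITS for d in DIGITS},
    IN_DIGITS: {d: IN_DIGITS for d in DIGITS},
}

# A sequence is accepted only if it ends in a final state.
FINAL_STATES = {IN_DIGITS}


def accepts(words):
    """Return True if the grammar can emit exactly this word sequence."""
    state = START
    for word in words:
        next_state = TRANSITIONS[state].get(word)
        if next_state is None:
            return False  # no arc for this word from the current state
        state = next_state
    return state in FINAL_STATES
```

For example, `accepts(["one", "two"])` is true, while `accepts([])` and `accepts(["one", "hello"])` are false. Unlike an N-gram, a grammar like this has no probabilities on its arcs: a word sequence is either allowed or not.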