MFCCs + Decoding
Am I right in thinking we have 39 MFCCs per 25 ms frame (12 MFCCs, 13 deltas, 13 delta-deltas)? And are we expected to explain the reasons why we used MFCCs in our report?
I'm also a little confused about what our language model and acoustic model are in the context of this sentence about Viterbi decoding:
"Combine probabilities from the acoustic and language models to determine the overall most probable word sequence."
Yes, there are 39 elements in total in each feature vector, and 12 of them are the MFCCs. Note that 12+13+13=38, though, so one element is missing from your breakdown.
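For intuition, here is a minimal sketch of how a 39-dimensional feature vector is typically assembled: 13 static coefficients (e.g. 12 MFCCs plus an energy or C0 term), then their deltas and delta-deltas. This uses librosa rather than the toolkit from the lab, and the filename and frame settings are assumptions, not the lab's exact configuration:

```python
# Sketch only: 13 static coefficients + 13 deltas + 13 delta-deltas = 39 per frame.
# Assumes librosa is installed; "utterance.wav" and the 25 ms / 10 ms framing are
# illustrative choices, not necessarily what the lab uses.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)

# 13 static coefficients per 25 ms frame, 10 ms hop
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)

delta = librosa.feature.delta(mfcc)            # first-order derivatives
delta2 = librosa.feature.delta(mfcc, order=2)  # second-order derivatives

features = np.vstack([mfcc, delta, delta2])    # shape: (39, num_frames)
print(features.shape)
```

Each column of `features` is one frame's 39-dimensional vector, which is what the acoustic model sees.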
The language model computes P(W) where W is the word sequence of one utterance to be recognised. For the digit recogniser, can you locate and inspect the language model that is being used?
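As a hedged illustration only (not necessarily the grammar actually used in the digit recogniser you are asked to inspect), a simple digit-loop language model could assign equal probability to each of the ten digits at every position, so that log P(W) depends only on the length of W:

```python
# Illustrative uniform digit-loop model; the lab's actual language model may differ.
import math

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def log_prob_word_sequence(words):
    """Log P(W) assuming each word is one of ten equally likely digits,
    independent of its neighbours."""
    if any(w not in DIGITS for w in words):
        return float("-inf")  # out-of-vocabulary word
    return len(words) * math.log(1.0 / len(DIGITS))

print(log_prob_word_sequence(["three", "five", "nine"]))  # 3 * log(0.1)
```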
The acoustic model computes P(O|W) where O is the observation sequence. How does O relate to the MFCCs?
How are P(O|W) and P(W) combined to calculate P(W|O), and why do we need to do that?
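For reference, the standard combination is Bayes' rule: the acoustic model supplies P(O|W), the language model supplies P(W), and P(O) can be dropped from the argmax because it does not depend on W:

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid O)
        \;=\; \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        \;=\; \arg\max_{W} P(O \mid W)\, P(W)
```

In practice the decoder works in the log domain, so the combination becomes a sum of the acoustic and language model log scores (often with a language-model scale factor and a word insertion penalty).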