MFCCs + Decoding
Am I right in thinking we have 39 MFCCs per 25 ms frame (12 MFCCs, 13 deltas, 13 delta-deltas)? And are we expected to explain the reasons why we used MFCCs in our report?
I'm also a little confused about what our language model and acoustic model are in the context of this sentence about Viterbi decoding:
"Combine probabilities from the acoustic and language models to determine the overall most probable word sequence."
Yes, there are 39 elements in total in each feature vector, and 12 of them are the MFCCs. Note that 12+13+13=38, though, so one element is missing from your breakdown.
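For intuition, here is a minimal sketch of how a 39-dimensional feature vector is typically assembled: 13 static coefficients (e.g. 12 MFCCs plus an energy or C0 term), then their deltas and delta-deltas. This uses librosa rather than the toolkit from the lab, and the filename and frame settings are assumptions, not the lab's exact configuration:

```python
# Sketch only: 13 static coefficients + 13 deltas + 13 delta-deltas = 39 per frame.
# Assumes librosa is installed; "utterance.wav" and the 25 ms / 10 ms framing are
# illustrative choices, not necessarily what the lab uses.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)

# 13 static coefficients per 25 ms frame, 10 ms hop
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)

delta = librosa.feature.delta(mfcc)            # first-order derivatives
delta2 = librosa.feature.delta(mfcc, order=2)  # second-order derivatives

features = np.vstack([mfcc, delta, delta2])    # shape: (39, num_frames)
print(features.shape)
```

Each column of `features` is one frame's 39-dimensional vector, which is what the acoustic model sees.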
The language model computes P(W) where W is the word sequence of one utterance to be recognised. For the digit recogniser, can you locate and inspect the language model that is being used?
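As a hedged illustration only (not necessarily the grammar actually used in the digit recogniser you are asked to inspect), a simple digit-loop language model could assign equal probability to each of the ten digits at every position, so that log P(W) depends only on the length of W:

```python
# Illustrative uniform digit-loop model; the lab's actual language model may differ.
import math

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def log_prob_word_sequence(words):
    """Log P(W) assuming each word is one of ten equally likely digits,
    independent of its neighbours."""
    if any(w not in DIGITS for w in words):
        return float("-inf")  # out-of-vocabulary word
    return len(words) * math.log(1.0 / len(DIGITS))

print(log_prob_word_sequence(["three", "five", "nine"]))  # 3 * log(0.1)
```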
The acoustic model computes P(O|W) where O is the observation sequence. How does O relate to the MFCCs?
How are P(O|W) and P(W) combined to calculate P(W|O), and why do we need to do that?
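For reference, the standard combination is Bayes' rule: the acoustic model supplies P(O|W), the language model supplies P(W), and P(O) can be dropped from the argmax because it does not depend on W:

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid O)
        \;=\; \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        \;=\; \arg\max_{W} P(O \mid W)\, P(W)
```

In practice the decoder works in the log domain, so the combination becomes a sum of the acoustic and language model log scores (often with a language-model scale factor and a word insertion penalty).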