Acoustic modelling and decoding
November 13, 2016 at 15:22 #6020
In terms of the stages of ASR, where does acoustic modelling end and decoding begin? Specifically, does maximum-likelihood re-estimation of the HMM parameters occur in the acoustic modelling stage, and does classification of the input sequence (according to the ‘best’ HMM) occur in the decoding stage?
November 13, 2016 at 17:59 #6021
I think perhaps you are being tempted to think of ASR as a pipeline of processes (the dreaded “flowchart” view)? That view leads us into thinking that certain things happen in certain “modules” and other things in other “modules”. Let’s try a different view, in which we only make the standard machine learning split into two phases: “training” and “testing (or recognition)”. Your question is about the second one, and assumes we have a fully-trained model.
We have a single, generative model of spoken utterances. The model can randomly generate any and all possible observation sequences. When the model generates a particular observation sequence, we can compute quantities such as the likelihood (just call that “probability” for now), the most likely state sequence, the most likely word sequence, and so on.
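To make that concrete, here is a minimal sketch in Python of a discrete-observation HMM acting as a generative model. The two-state parameters are invented purely for illustration (a real acoustic model would use Gaussian emissions over acoustic feature vectors), and the names are mine, not HTK’s:

```python
import numpy as np

# Hypothetical parameters for a two-state, two-symbol HMM (illustration only)
A = np.array([[0.7, 0.3],        # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # emission probabilities, one row per state
              [0.2, 0.8]])
pi = np.array([0.6, 0.4])        # initial state distribution

rng = np.random.default_rng(0)

def generate(T):
    """Randomly generate a length-T observation sequence (and the hidden state sequence)."""
    states, obs = [], []
    s = rng.choice(2, p=pi)                  # pick an initial state
    for _ in range(T):
        states.append(s)
        obs.append(rng.choice(2, p=B[s]))    # emit an observation from state s
        s = rng.choice(2, p=A[s])            # move to the next state
    return states, obs

print(generate(5))
```

Every run produces a different sequence: the model really is a random generator of observation sequences.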
Given a particular observation sequence to be recognised, we force our model to generate that particular sequence. We record the most likely word sequence and announce that as the result.
So, all we need is
a) the generative model – this will be a combination of the acoustic model and language model
b) one or more algorithms for computing quantities we are interested in – one of these algorithms will be called the decoding algorithm (one such algorithm is sketched just below this list)
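As an example of (b), here is a sketch of the forward algorithm, which evaluates the likelihood that the model generated a given observation sequence by summing over all possible state sequences. It reuses the hypothetical A, B and pi defined above:

```python
def forward_likelihood(obs):
    """P(obs | model): the forward algorithm, summing over all state sequences."""
    alpha = pi * B[:, obs[0]]            # initialise with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate through transitions, then emit
    return alpha.sum()                   # total over all final states

print(forward_likelihood([0, 0, 1, 1]))
```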
November 13, 2016 at 18:48 #6022
Thanks, that makes sense. One last thing: if the model randomly generates sequences, how do you ‘force’ it to generate a particular sequence?
November 13, 2016 at 21:30 #6023
We can’t literally force the model to generate one particular sequence; what we actually do is evaluate the probability that the model generated that sequence. This is exactly what the decoding algorithm does: it “decodes” how the model could have generated the given observation sequence. Typically, we will make an approximation, such as only decoding the single most likely state sequence.
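In code, that approximation is the Viterbi algorithm: take the forward recursion above and replace the sum with a max, keeping back-pointers so the single most likely state sequence can be recovered. Again a sketch only, reusing the hypothetical model from before and working in the log domain to avoid underflow:

```python
def viterbi(obs):
    """Most likely state sequence for obs (max replaces the forward sum)."""
    delta = np.log(pi) + np.log(B[:, obs[0]])
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] + np.log(A)   # score of every i -> j transition
        backptr.append(scores.argmax(axis=0)) # best predecessor for each state j
        delta = scores.max(axis=0) + np.log(B[:, o])
    # trace back from the best final state
    path = [int(delta.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return path[::-1], delta.max()            # state sequence and its log probability

print(viterbi([0, 0, 1, 1]))
```

In a real recogniser the states and emissions come from the acoustic model and the transitions between words come from the language model, but the principle is exactly the same.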