- This topic has 3 replies, 2 voices, and was last updated 8 years, 2 months ago by Simon.
November 13, 2016 at 15:22 #6020
In terms of the stages of ASR, where does acoustic modelling end and decoding begin? Specifically, do maximum likelihood and the re-estimation of the HMM parameters occur in the acoustic modelling stage? Does classification of the input sequence (according to the ‘best’ HMM) occur in the decoding stage?
-
November 13, 2016 at 17:59 #6021
I think perhaps you are being tempted to think of ASR as a pipeline of processes (the dreaded “flowchart” view)? That view leads us into thinking that certain things happen in certain “modules” and other things in other “modules”. Let’s try a different view, in which we only make the standard machine learning split into two phases: “training” and “testing (or recognition)”. Your question is about the second one, and assumes we have a fully-trained model.
We have a single, generative model of spoken utterances. The model can randomly generate any and all possible observation sequences. When the model generates a particular observation sequence, we can compute quantities such as the likelihood (just call that “probability” for now), the most likely state sequence, most likely word sequence, and so on.
Given a particular observation sequence to be recognised, we force our model to generate that particular sequence. We record the most likely word sequence and announce that as the result.
So, all we need is:
a) the generative model – this will be a combination of the acoustic model and language model
b) one or more algorithms for computing quantities we are interested in – one of these algorithms will be called the decoding algorithm
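As a rough illustration of (a) and (b), here is a minimal sketch of isolated-word recognition with toy discrete HMMs. Everything in it is an assumption made up for the example: the two word models, their probabilities, the binary observation symbols, and the names `WORD_MODELS`, `LANGUAGE_MODEL`, `likelihood` and `recognise`. It is not how HTK is implemented; it only shows a generative model (acoustic models plus a language model prior) and one algorithm that computes a quantity of interest from it.

```python
# Toy discrete-observation HMMs; all numbers are invented for illustration.
# Each word model has initial, transition and emission probabilities
# over two states and two observation symbols (0 and 1).
WORD_MODELS = {
    "yes": {
        "init": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.9, 0.1], [0.2, 0.8]],
    },
    "no": {
        "init": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.1, 0.9], [0.8, 0.2]],
    },
}

# Hypothetical language model: a prior probability for each word.
LANGUAGE_MODEL = {"yes": 0.5, "no": 0.5}


def likelihood(model, observations):
    """Forward algorithm: P(observations | model), summed over all state sequences."""
    n = len(model["init"])
    alpha = [model["init"][s] * model["emit"][s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * model["trans"][p][s] for p in range(n)) * model["emit"][s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def recognise(observations):
    """Return the word maximising P(observations | word) * P(word), plus all scores."""
    scores = {
        word: likelihood(model, observations) * LANGUAGE_MODEL[word]
        for word, model in WORD_MODELS.items()
    }
    return max(scores, key=scores.get), scores


best_word, scores = recognise([0, 0, 1, 1])
print(best_word, scores)
```

With these made-up numbers, the sequence `[0, 0, 1, 1]` is far more probable under the "yes" model than under the "no" model, so "yes" is announced as the result.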
-
November 13, 2016 at 18:48 #6022
Thanks, that makes sense. One last thing: if the model randomly generates sequences, how do you ‘force’ it to generate a particular sequence?
-
November 13, 2016 at 21:30 #6023
We cannot literally make the model emit a chosen sequence. To “force” it to generate one particular sequence, all we can do is evaluate the probability that it generated that sequence. This is exactly what the decoding algorithm must do: it “decodes” how the model generated the given observation sequence. Typically, we will make an approximation, such as only decoding the single most likely state sequence rather than summing over all possible state sequences.
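To make the approximation concrete, here is a small sketch comparing the exact probability of an observation sequence (summed over every state sequence) with the probability of the single most likely state sequence (the Viterbi approximation). The 2-state model, its probabilities and the names `INIT`, `TRANS`, `EMIT`, `forward` and `viterbi` are all assumptions invented for this example.

```python
# Toy 2-state HMM over observation symbols 0 and 1.
# All numbers are invented for illustration.
INIT = [0.5, 0.5]
TRANS = [[0.7, 0.3], [0.3, 0.7]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]


def forward(observations):
    """Total probability of the observations, summed over every state sequence."""
    n = len(INIT)
    alpha = [INIT[s] * EMIT[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * TRANS[p][s] for p in range(n)) * EMIT[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def viterbi(observations):
    """Probability of the single best state sequence, and that sequence."""
    n = len(INIT)
    delta = [INIT[s] * EMIT[s][observations[0]] for s in range(n)]
    backpointers = []
    for obs in observations[1:]:
        # For each state, remember which predecessor gives the best score.
        best_prev = [
            max(range(n), key=lambda p, s=s: delta[p] * TRANS[p][s])
            for s in range(n)
        ]
        delta = [
            delta[best_prev[s]] * TRANS[best_prev[s]][s] * EMIT[s][obs]
            for s in range(n)
        ]
        backpointers.append(best_prev)
    # Trace back from the best final state to recover the state sequence.
    state = max(range(n), key=lambda s: delta[s])
    best_prob = delta[state]
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    path.reverse()
    return best_prob, path


observations = [0, 0, 1]
total = forward(observations)
best_prob, path = viterbi(observations)
print(total, best_prob, path)
```

The Viterbi probability can never exceed the forward probability, because it keeps only one term of the sum. In exchange it gives us the state sequence itself, which is what we need in order to read off a word sequence.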