Tokens/Language Models
December 1, 2015 at 17:11 #996
Trying to solidify my understanding of the language model’s constraints, in conjunction with token passing. It’s difficult to visualize for multi-word sequences. Are the following statements true:
1. For single-word recognition, because the grammar (and therefore the language model) does not allow any repetition of words, any token that reaches the end state before the total number N of observations in the observation sequence has been consumed (N ‘turns of the handle’) is necessarily consigned to an early death. Thanks to Viterbi, the token that reaches the end state after the Nth turn of the handle is the ‘winner’: it represents the most likely pathway through the entire model and carries its associated log probability. That log probability can be compared across all the models’ winners, and the model that generated the ‘winner of the winners’ is reported as the ‘recognized word’.
2. For recognition of word sequences, because the new grammar we make DOES allow repetition of words, a token reaches the end state on every turn of the handle beyond the number of states in our models, even though we are far from N turns of the handle (N is much larger now, because the sequence is necessarily longer in duration than a single digit). Each such token can cross the ‘language model arc’ to the next HMM’s start state (where the language model allows it), and so on. So, assuming 3 emitting states in every model, on the 4th turn of the handle a second HMM is generating observation 4 in its first emitting state (with a VERY low probability, but still doing it); on the 5th turn of the handle that HMM is generating observations in both its first and second states (while the original HMM is also still emitting observations in all 3 of its states), AND yet another token has crossed the language model arc to yet another new HMM, which now generates observation 5 in its first state, and so on. This continues until the Nth turn of the handle, at which point however many tokens are in ‘end’ states (anywhere in the chain of models) all compete for the highest log probability. That token is the ‘winner’, and its pathway is reported as the series of models (words) it passed through: the ‘recognized sequence’.
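As an aside, here is a minimal sketch, in Python rather than HTK, of the token-passing recipe in statement 1. The names `log_entry`, `log_trans`, `log_exit` and `log_emit` are invented stand-ins for a word model’s parameters, not HTK data structures; the sketch only shows the core idea that the end state is read after the Nth turn of the handle, and that the per-word winners are then compared.

```python
def isolated_word_score(obs, log_entry, log_trans, log_exit, log_emit):
    """Viterbi token passing through one left-to-right word HMM (a sketch).

    log_entry[j]   : log prob of entering emitting state j from the start state
    log_trans[i][j]: log prob of the transition from emitting state i to j
    log_exit[i]    : log prob of leaving emitting state i for the end state
    log_emit(j, o) : log prob of emitting observation o in state j
    """
    n = len(log_entry)
    # First turn of the handle: the start token moves into an emitting state.
    tokens = [log_entry[j] + log_emit(j, obs[0]) for j in range(n)]
    # Remaining turns of the handle: Viterbi keeps only the best incoming token.
    for o in obs[1:]:
        tokens = [
            max(tokens[i] + log_trans[i][j] for i in range(n)) + log_emit(j, o)
            for j in range(n)
        ]
    # The end state is only read after the Nth (final) turn; any token that
    # arrived there earlier has simply been discarded.
    return max(tokens[i] + log_exit[i] for i in range(n))

# "Winner of the winners": score the same observation sequence with every word
# model and report the word whose model gives the highest log probability, e.g.
# recognised = max(word_models, key=lambda w: isolated_word_score(obs, *w))
```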
December 1, 2015 at 17:25 #999
1. Yes, that is all correct.
2. Yes, also all correct.
It’s easiest to think of the language model and HMMs as having been “compiled together” into a network, and of token passing as being performed around that network. That’s exactly how HTK’s HVite is implemented.
In this compiled network, there are emitting states from the HMMs, and arcs (i.e., transitions). Some of the arcs are from the HMMs, and others are from the language model. But after compilation, that doesn’t really matter and tokens can move along any transition to reach another emitting state.
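To illustrate that point, here is a minimal sketch (assumed names, not HVite’s actual code or data structures) of one “turn of the handle” on such a compiled network: the arcs are just a dictionary from (source, destination) state pairs to log probabilities, and nothing in the update cares whether a given arc originally came from an HMM or from the language model.

```python
from math import inf

def turn_the_handle(tokens, arcs, log_emit, obs):
    """One frame of token passing over the compiled network (a sketch).

    tokens  : dict mapping emitting state -> best log prob of any token there
    arcs    : dict mapping (src, dst) -> log transition prob; HMM arcs and
              language model arcs are mixed together and treated identically
    log_emit: function (state, observation) -> log emission prob
    """
    new_tokens = {}
    for (src, dst), arc_logp in arcs.items():
        if src not in tokens:
            continue                     # no token to propagate along this arc
        score = tokens[src] + arc_logp + log_emit(dst, obs)
        if score > new_tokens.get(dst, -inf):
            new_tokens[dst] = score      # Viterbi: only the best token survives
    return new_tokens
```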
December 1, 2015 at 17:47 #1000
Well, even though I wrote statement 2 down and you have deemed it correct, I still struggle to form a mental model of this ‘compiled network’. Is the following statement also true:
Bearing in mind that there are always 11 models running in parallel (10 digits plus junk), and that another 11 are launched every time a new HMM start state is entered, then after, say, 103 turns of the handle (which is only 1.03 seconds in duration), some 1100 HMMs are all churning away, generating observations. After 1003 turns of the handle, there are 11,000 HMMs in action.
That seems like a lot! Is this where pruning comes in? Or have I now gone off track?
December 1, 2015 at 18:04 #1002
Don’t think of lots of HMMs running in parallel. Think of the compiled network as one big HMM. That’s because it really is just one big HMM.
It just happens to have a more interesting topology than the individual left-to-right word models, but that makes no difference to token passing. There’s still a start state, some emitting states, some arcs, and an end state. Turn the handle on token passing, and it works exactly as it would for an isolated left-to-right model.
No, there are not 1100 HMMs – the topology of the compiled network is fixed and isn’t changed just because tokens are flowing around it.
Watch the token passing animation – the network there has been compiled from a simple language model which generates sentences from the set “Yes” or “No”, and we have a 2-state whole-word-model HMM for each of those words. As the tokens flow around it, the network itself (i.e., the “one big HMM”) never changes.
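For concreteness, here is a hedged sketch of what that compiled “Yes”/“No” network might look like once written out as one big HMM. The state names and log probabilities are invented for illustration (ln 0.5 ≈ −0.69, ln 0.7 ≈ −0.36, ln 0.3 ≈ −1.20); the point is only that this structure is fixed before recognition starts and never changes while tokens flow around it.

```python
# One big HMM: a fixed set of emitting states plus arcs, compiled from a
# Yes/No language model and two 2-state whole-word HMMs (illustrative only).
yes_no_network = {
    "start": "<s>",
    "end": "</s>",
    "emitting_states": ["yes_1", "yes_2", "no_1", "no_2"],
    "arcs": [
        ("<s>", "yes_1", -0.69),   # language model arc: P(Yes) = 0.5
        ("<s>", "no_1",  -0.69),   # language model arc: P(No)  = 0.5
        ("yes_1", "yes_1", -0.36), ("yes_1", "yes_2", -1.20),  # Yes word HMM
        ("yes_2", "yes_2", -0.36), ("yes_2", "</s>",  -1.20),
        ("no_1",  "no_1",  -0.36), ("no_1",  "no_2",  -1.20),  # No word HMM
        ("no_2",  "no_2",  -0.36), ("no_2",  "</s>",  -1.20),
    ],
}
# Turning the handle on token passing over this network works exactly as it
# does for a single isolated left-to-right word model.
```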
December 1, 2015 at 18:05 #1003
December 1, 2015 at 18:57 #1006
(link fixed)
That’s excellent. I wish you would extend it and show what happens when the token loops back to the beginning! But I think it has just clicked for me.
Any token that leaves the model, kills off any other tokens that left at the same time with a lower log prob, and loops back to the beginning will simply be in competition with whatever token it meets there, and by Viterbi the winner will win and the loser will die off. This will happen many times (N−3, I suppose, for a 3-emitting-state HMM). If at any time t the token that came around from the end and looped back beats the token it meets in state one (which has just gone around its own self-transition loop), it will now be the current winner, at least in that one state. If it were to be the ultimate winner at the Nth turn of the handle, then whatever ‘loops back to the beginning’ it made will be recorded as the word sequence.
The technical paper calls this record the ‘Word Link Record’. Each time a token leaves a model, ready to loop back to the beginning, it adds that model’s name to its WLR. Any token that ends up the ‘winner’ of the whole sequence will have a WLR containing every model, in order, that it passed through. It is also important to note that each time the token loops back to the beginning, it clones itself so it can go out to all possible models, given the topology (which is constrained by the language model).
How’s that???
December 1, 2015 at 21:29 #1014
Next time I have a wet Sunday afternoon with nothing better to do, I may indeed do a longer animation for connected word token passing. Don’t hold your breath though.
Yes – you seem to have got it: tokens that have “looped back” from the final state of one word model to the initial state of another (or indeed the same) word model will simply compete with whatever tokens they encounter there. The mechanism is exactly the same as tokens meeting within words.
Tokens will need to remember which words they have been through, so that we can read the most likely word sequence off the winning token. Appending a word label to a token’s internal word link record every time it leaves a word model is the way to do that, yes.
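A minimal sketch of that bookkeeping, under the same simplifications as before (invented names, not HTK’s actual word link record implementation): each token carries its word history as a tuple, and the history is extended only when the token crosses an arc that leaves a word model.

```python
def turn_the_handle_with_wlr(tokens, arcs, log_emit, obs):
    """One frame of token passing where tokens carry a word link record.

    tokens: dict state -> (log prob, word history tuple) of the best token there
    arcs  : list of (src, dst, log prob, word_label); word_label is None for
            within-word HMM arcs, and is the word's name on arcs that leave a
            word model (the 'loop back to the beginning' arcs)
    """
    new_tokens = {}
    for src, dst, arc_logp, word_label in arcs:
        if src not in tokens:
            continue
        logp, history = tokens[src]
        logp += arc_logp + log_emit(dst, obs)
        if word_label is not None:
            history = history + (word_label,)   # append to the word link record
        if dst not in new_tokens or logp > new_tokens[dst][0]:
            new_tokens[dst] = (logp, history)   # the losing token dies here
    return new_tokens

# After the Nth turn of the handle, read the word sequence off the winning token:
# best_logp, recognised_words = max(tokens.values(), key=lambda t: t[0])
```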