A first look at continuous speech

Once we understand token passing within a single HMM, the extension to continuous speech is surprisingly easy.

What we want to talk about now, in the last few minutes, just to get your mind started, is this:
how do we take that idea of isolated words,
that is, dynamic programming over isolated single HMMs, each generating a single observation sequence, and then string those together to make a recogniser that can recognise connected things?
Connected words, or connected phonemes.
So we'd like a large vocabulary, perhaps.
Maybe you want to build something not for the 10 digits but for 10,000 words, or 100,000 words.
How would we do that? We're probably not going to sit down and record seven examples of each of those 10,000 words.
That's probably not going to be the way to do it.
We would also like to have connected or continuous speech: strings of things,
one word followed by another word. That's something I would encourage you to try to get on to in
the labs; the postgraduates do strings of digits.
Okay, we'll talk about that in the lab: strings of digits with a little bit of silence in between, to make life simple.
So how do we do that? It's going to turn out to be really, really easy, because we've set our models up in a way that makes it easy.
So let's just start with another way of writing down what we've been doing so far.
So far, we're doing isolated word recognition.
For each of the words we have a hidden Markov model, with these dummy states we put here, and that maybe is the model of the word 'zero'.
We could separately compute the probability of
O given 'zero', and the probability of O given each of the other words, and then compare those. But we can be a little bit clever and put all of that into a single computation.
And we do it like this.
We'll take the models.
This thing here is just a model.
It's a model of 'zero',
with however many states we fancy, and with these little dummy states. We're just going to string them together into a kind of super-HMM:
a big hidden Markov model that joins them all up, with a dummy state at the beginning and a dummy state at the end.
We now have a digit recogniser. That's a more complicated-looking model,
but it's still just a hidden Markov model.
Conceptually it's no different.
It just happens to have these parallel branches.
It's got branches like this, all these branches, and then they happen to come together at the end.
There's nothing in there that's conceptually any different from a simple linear model.
Specifically, token passing will just work here too.
For example, if we wanted to do token passing on this model, we'd put a token... let's clear this.
We'll put a token here.
We'll turn the handle.
It will send copies into each of the models.
The copies will do the thing they would have done in a single isolated model: go round and do their thing.
Tokens in here, tokens in here; turn the handle; lots of tokens all jumping forwards through the algorithm.
And just as we generate the last observation and turn the handle that last time, a token will pop out of each of them. This token will have written on it
the probability of O given the model of 'zero'; this one will have the probability of O given the model of 'one'; and so on.
They all arrive in this final state.
Ten tokens will suddenly arrive.
One of them will have the highest probability; we'll destroy the other nine and announce that token as the winner.
Well, which word did that winning token come through? The token now needs to remember its path.
And then we've done recognition.
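As a rough illustration of that procedure, here is a minimal token-passing sketch. It is a toy, not the actual implementation: the Token class, the two-state word models and the string observations are all invented for illustration, and a real recogniser such as HVite scores acoustic feature vectors with trained Gaussian HMMs. The control flow is the point: turn the handle once per observation, keep only the best token in each state, and let the tokens that pop out of each word model compete at the end.

```python
import math

class Token:
    """A token carries an accumulated log probability and remembers
    which word model it travelled through."""
    def __init__(self, log_prob=float("-inf"), word=None):
        self.log_prob = log_prob
        self.word = word

def recognise(word_models, observations):
    """word_models maps a word to (log_trans, log_emit): log_trans[i][j] is the
    log transition probability between emitting states i -> j (including
    self-loops) and log_emit(j, obs) scores one observation in state j.
    Returns the winning word and its Viterbi log probability."""
    winner = Token()
    for word, (log_trans, log_emit) in word_models.items():
        n = len(log_trans)
        # The non-emitting entry state drops one token into the first state,
        # which emits the first observation.
        tokens = [Token(word=word) for _ in range(n)]
        tokens[0] = Token(log_emit(0, observations[0]), word)
        for obs in observations[1:]:                 # "turn the handle" once per frame
            new = [Token(word=word) for _ in range(n)]
            for i in range(n):
                if tokens[i].log_prob == float("-inf"):
                    continue
                for j in range(n):
                    p = tokens[i].log_prob + log_trans[i][j] + log_emit(j, obs)
                    if p > new[j].log_prob:          # keep only the best token per state
                        new[j] = Token(p, word)
            tokens = new
        # After the last observation, the token in the last emitting state pops
        # out into the shared exit state, where the word models compete.
        if tokens[-1].log_prob > winner.log_prob:
            winner = tokens[-1]                      # the losing tokens are destroyed
    return winner.word, winner.log_prob

# Toy usage: two two-state, left-to-right "word" models with made-up parameters.
NEG = float("-inf")
trans = [[math.log(0.6), math.log(0.4)],
         [NEG,           math.log(1.0)]]
emit_a = lambda s, o: math.log(0.9) if o == "a" else math.log(0.1)
emit_b = lambda s, o: math.log(0.9) if o == "b" else math.log(0.1)
print(recognise({"zero": (trans, emit_a), "one": (trans, emit_b)}, ["a", "a", "a"]))
```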
And that's exactly how, in code, the HVite program you're using works. It doesn't separately
compute each of the 10 models; it joins them together into a network, and you can see that network.
It's there in the files.
It's the language model, and it's in here.
It's just a very simple
grammar: it just says this, or this, or this, or this.
There are no actual probabilities on these arcs.
In other words, implicitly
they're all equal probability:
the uniform probability.
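As a rough picture of that network (purely illustrative Python, not the actual file format the lab uses), a one-word digit grammar is just a start node, ten parallel word arcs that all carry the same probability, and an end node:

```python
import math

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# One-word digit grammar as a finite-state network: every arc from START to
# END is labelled with a word, and because no probabilities are written on
# the arcs they are implicitly all equal, i.e. uniform.
uniform_log_prob = math.log(1.0 / len(DIGITS))
digit_grammar = {
    "START": [(word, "END", uniform_log_prob) for word in DIGITS],
    "END": [],
}
```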
Just one more minute; we're just running slightly over.
Let's just look at how that generalises, then, to something more than just isolated words.
It's going to be really, really easy.
Okay? When we take an HMM of a unit, whether it's a word or a sub-word unit like a phoneme, and we join it to another HMM, what's the result? It's just a bigger HMM.
There's nothing different about it.
It's still all just transitions and states.
It just looks bigger and a bit more complicated, but the algorithm isn't going to care about that.
It's still going to work.
So if we want to make a model of a word, let's make a model of a word now from sub-word units: a model of the word 'cat'. And imagine
I never recorded the word 'cat', but I recorded some other things with a [k] in, some things with an [a] in, and something with a [t] in, and trained models of [k], [a] and [t], and all the other phones.
Now I want to make a model of 'cat'.
I just take the model of [k], the model of [a], and the model of [t],
and just use these special little dummy states, the non-emitting states,
to join them together, with some transitions there.
This thing here is just a model:
a model of the word 'cat'. It happens to have nine states.
The topology doesn't matter;
it happens to be left-to-right.
We've made a model for something that we never saw in the training data. We can string arbitrary HMMs
together to make models of sequences of things, and that just generalises to making sequences of words.
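Here is a minimal sketch of that composition, with invented class and state names: gluing the phone models [k], [a] and [t] end to end through non-emitting dummy states yields one ordinary left-to-right HMM for 'cat' with nine emitting states.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PhoneHMM:
    name: str
    emitting_states: List[str]   # e.g. three emitting states per phone model

def make_word_model(word: str, pronunciation: List[PhoneHMM]) -> List[str]:
    """Concatenate phone HMMs into one left-to-right word HMM.
    Non-emitting dummy states glue each phone to the next, so the result is
    still just an HMM, only with more states."""
    states = ["<entry>"]                                   # non-emitting entry dummy
    for phone in pronunciation:
        states.extend(f"{word}/{phone.name}/{s}" for s in phone.emitting_states)
        states.append("<join>")                            # non-emitting glue state
    states[-1] = "<exit>"                                  # the final join is the exit dummy
    return states

# 'cat' built entirely from phone models trained on other words.
k = PhoneHMM("k", ["s1", "s2", "s3"])
a = PhoneHMM("a", ["s1", "s2", "s3"])
t = PhoneHMM("t", ["s1", "s2", "s3"])
print(make_word_model("cat", [k, a, t]))   # nine emitting states plus the dummies
```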
Okay, so we can have models of utterances that are just sequences of word models; models of words
that are just sequences of phone models; and each phone model is just a sequence of HMM states.
There's a beautiful hierarchy to all of this modelling, and the point (we'll stop on the next slide) is that it points us towards a form of language model
that's compatible with, in other words has the same properties as, a hidden Markov model.
If the language model is just a finite-state network, it allows us to string things together so that when we plug the acoustic models into the language model, we get something that is still a valid hidden Markov model, and it tells us something about the sort of language the user is going to be allowed to use. Let's just quickly
look at what one of those might look like.
The language model is going to have to have this sort of property:
it's going to be something that we are able to write as a finite-state network.
It could be written by hand, or it could be automatically learned from data.
It doesn't matter, as long as it's of the same form as the hidden Markov model.
In other words, it's got states and transitions joining things together, and we're just going to substitute a hidden Markov model in for each word.
That could be a whole-word model,
or it could be a model made from phone models.
It doesn't matter.
Substitute all of that in (you can think of that as compiling the network) and then just put a token here.
Just turn the handle and see what happens.
Tokens flow through the model, and at some point a token pops out of the end with the answer on it.
This language model could be very simple or very complex.
It doesn't matter.
The same algorithm applies.
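A rough sketch of that substitution, with data structures invented for illustration: the grammar is a list of word-labelled arcs, each word maps to an HMM (given here simply as a list of emitting states, whether whole-word or built from phones), and plugging the HMMs into the arcs yields one big network that token passing runs over unchanged.

```python
def compile_network(grammar_arcs, word_hmms):
    """grammar_arcs: list of (from_node, word, to_node).  word_hmms: word -> list
    of emitting-state names.  Returns the compiled network as (from, to) edges,
    with the grammar's own nodes acting as non-emitting states."""
    edges = []
    for start_node, word, end_node in grammar_arcs:
        states = [f"{word}[{i}]" for i in range(len(word_hmms[word]))]
        chain = [start_node] + states + [end_node]
        edges += list(zip(chain[:-1], chain[1:]))   # left-to-right arcs through the word
        edges += [(s, s) for s in states]           # self-loops on the emitting states
    return edges

# Usage: a one-word grammar over three digits, each with a three-state HMM.
grammar_arcs = [("START", w, "END") for w in ("zero", "one", "two")]
word_hmms = {w: ["s1", "s2", "s3"] for w in ("zero", "one", "two")}
network = compile_network(grammar_arcs, word_hmms)
# Token passing now runs over `network` exactly as it does on a single HMM.
```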
So the language model you get given for the assessment is this one, and what you might want to do is think about how you would extend it to do sequences of digits.
What would you need to do to this language model to allow one digit to follow another digit?
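As a hint at the shape of that extension (only a sketch in the same illustrative Python style as above, not the lab's actual file format and not necessarily the required answer): let a token that has just left a digit model loop back and pass through another digit before it reaches the end.

```python
import math

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
p = math.log(1.0 / len(DIGITS))

# Connected-digit grammar: after each digit the token may either loop back for
# another digit or leave through the end node. The 0.5/0.5 split and the
# word-free (epsilon) arcs out of LOOP are arbitrary illustrative choices.
digit_sequence_grammar = {
    "START": [(word, "LOOP", p) for word in DIGITS],
    "LOOP": [(None, "START", math.log(0.5)),   # go round for another digit
             (None, "END", math.log(0.5))],    # or finish the utterance
    "END": [],
}
```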
