Generative model view

You know the drill: think of the entire model as generating speech.

So here's the best way to think about the whole system, and this is the generative model way of thinking about it.
This is the way I want you to think about it.
It's much easier to understand this way.
We've got a hierarchy of generative models, and in general we're going to recognise whole utterances.
We'll call them utterances rather than sentences.
"Sentence" implies some sort of grammatical unit.
And we've no idea whether what people say is grammatical.
We're just saying an utterance is an acoustic unit: a thing with silence at the beginning and end.
So we have utterances.
We have a generative model of the utterances, and that generative model generates sequences of words.
Okay, so what is that generative model? That's the language model.
We could take a language model and we could randomly generate sentences from it, like this one.
This model can generate sentences.
Okay, take a random walk through the language model.
Start here, toss your coin.
This is going to be a 10-sided dice, and you're going to look at the number and take a random transition.
Let's do it.
Here we are, and we generate the sentence "one".
Do it again.
Maybe we generate the sentence "zero".
We randomly generate sentences.
The language model is a generative model of sequences of words.
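As a rough sketch of that random walk, here is a toy language model in Python. It assumes the simple case described above, where a single roll of a 10-sided dice picks one digit word and the sentence then ends. The names DIGITS and generate_sentence are made up for illustration.

```python
import random

# Hypothetical toy language model: from the start state, each of the ten
# digit words is equally likely, and the sentence ends after one word.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def generate_sentence(rng):
    """Roll a 10-sided dice and take the matching transition."""
    roll = rng.randrange(10)  # uniform over 0..9
    return [DIGITS[roll]]


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        # Each call is one random walk through the model.
        print(" ".join(generate_sentence(rng)))
```

Each call is one press of the "generate" button on the language model alone; a more realistic model would allow longer walks and unequal transition probabilities.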
This is a very simple one; in general, it generates sequences.
So that is the language model.
Right, for each word we're now going to generate the sequence of phonemes: abstract names of the units that make up that word.
There's another generative model here.
Anybody like to propose what the name of that generative model is? It goes from words to phonemes.
Pronunciation dictionary.
So that's the dictionary.
I'm just going to write "dict".
A dictionary is a generative model that, given a word, will generate a series of phones.
You look up the word and it will give you the sequence of phonemes.
It might be just a simple deterministic mapping.
A fancy dictionary might have two pronunciations for the same orthographic word; it would randomly choose between the two, and you might even put probabilities on those.
Press the random button on the dictionary for a word, and it will pop out a sequence of phonemes.
That's a generative model.
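Here is a minimal sketch of the dictionary as a generative model. The entries, phoneme symbols and probabilities are invented for illustration, and generate_pronunciation is a hypothetical helper name.

```python
import random

# Hypothetical pronunciation dictionary: each word maps to a list of
# (phoneme sequence, probability) pairs. All values are illustrative only.
DICTIONARY = {
    "one": [(["w", "ah", "n"], 1.0)],
    "either": [(["iy", "dh", "er"], 0.5), (["ay", "dh", "er"], 0.5)],
}


def generate_pronunciation(word, rng):
    """Press the random button: sample one pronunciation for the word."""
    variants = DICTIONARY[word]
    pronunciations = [phones for phones, _ in variants]
    weights = [prob for _, prob in variants]
    return rng.choices(pronunciations, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    print(generate_pronunciation("one", rng))     # always the same sequence
    print(generate_pronunciation("either", rng))  # one of the two variants
```

With a single pronunciation the sampling is deterministic, which is the simple lookup case; with several variants the weights play the role of the probabilities you might put on them.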
A dictionary is also a generative model, right? And then those phonemes map onto the names of HMMs.
So this is usually just a simple lookup.
A simple mapping.
So there's a series of HMMs, each made of HMM states.
What do they generate? You know: the observations.
If we join all of those things together, we've got an utterance model that eventually generates sequences of observations.
The whole thing is a model, and we should try really hard to think in this probabilistic modelling paradigm.
Think of this as a great big button.
We press our big button, and the language model randomly generates a sequence of words.
For each of those words, the dictionary randomly generates a sequence of phonemes; that might always be the same for each word, if it has a single pronunciation.
The phonemes generate their sequence of HMM states, and the HMM states randomly generate their sequence of observations, so we could do speech synthesis from this model.
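To make the big button concrete, here is a small sketch that chains the stages together. Everything in it is a toy assumption, not the lecture's actual system: a two-word uniform language model, single-pronunciation dictionary entries, three states per phoneme HMM, a fixed self-loop probability, and one-dimensional Gaussian observations with made-up means.

```python
import random

# All tables and constants below are hypothetical toy values.
WORDS = ["zero", "one"]
DICTIONARY = {"zero": ["z", "iy", "r", "ow"], "one": ["w", "ah", "n"]}
PHONES = sorted({p for pron in DICTIONARY.values() for p in pron})
STATES_PER_PHONE = 3
SELF_LOOP_PROB = 0.5


def hmm_states(phone):
    """Simple lookup from a phoneme name to the states of its HMM."""
    return [f"{phone}_{i}" for i in range(1, STATES_PER_PHONE + 1)]


def state_mean(state):
    """Made-up Gaussian mean so that each state emits different values."""
    phone, index = state.rsplit("_", 1)
    return PHONES.index(phone) + 0.1 * int(index)


def press_button(rng):
    """Generate a word, its phonemes, an HMM state path and observations."""
    word = rng.choice(WORDS)   # language model (one word per utterance)
    phones = DICTIONARY[word]  # dictionary (single pronunciation here)
    states = [s for p in phones for s in hmm_states(p)]
    path, observations = [], []
    for state in states:
        # Emit at least one observation per state; the self-loop
        # decides whether we stay for another frame.
        while True:
            path.append(state)
            observations.append(rng.gauss(state_mean(state), 0.05))
            if rng.random() >= SELF_LOOP_PROB:
                break
    return word, phones, path, observations


if __name__ == "__main__":
    word, phones, path, observations = press_button(random.Random())
    print(word, phones)
    print(len(path), "frames generated")
```

Each press gives a different number of frames and different observation values, even for the same word, because the HMM stage is random.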
If you take the synthesis course, you'll see exactly how we actually do so.
It's like this: we do something a bit clever to make these observations something we can turn back into waveforms.
But that's how we could do speech synthesis.
In recognition, we're essentially doing that, except we're locking down the observations to be a particular sequence, and when we press the button we don't randomly generate; we generate that particular one.
The byproduct of that is to find the probability of doing that.
And that's the probability of a particular utterance.
And then we just have a search problem of doing it for every possible utterance in the language.
You'd need to iterate over every utterance, which is over every possible word sequence.
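Written as an equation, this is the usual way of stating that search. The notation is assumed here rather than taken from the lecture: W is a candidate word sequence, O is the fixed observation sequence, P(W) comes from the language model, and P(O | W) comes from the dictionary and the HMMs.

```latex
\hat{W} = \operatorname*{arg\,max}_{W} \; P(W)\, P(O \mid W)
```

The product P(W) P(O | W) is the probability of the whole model generating that word sequence together with those particular observations, and the search picks the word sequence for which it is largest.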
Just run this big generative model.
It seems a slightly backwards way of looking at things, but it's the right way to understand things.
A good way to understand a dictionary is that, given a word, it emits the sequence of phonemes.