The Gaussian as a generative model

We can think of the Gaussian generating, or "emitting", observations.

So we'd better have a think about the classes that we're trying to recognise in the lab.
The classes are going to be just whole words.
They're the digits 0 to 9, whole words, and that's fine
if we've got a closed vocabulary.
We know that at test time people aren't going to walk up to the system and say something outside of that vocabulary, so that will work fine for small vocabularies that are closed and fixed.
So if we're building a voice dialling application for our phone, we only allow people to say digits.
It will be a closed set.
Whole words would work fine, so this is perfectly reasonable in some applications, but more often than not we're going to build systems that have a large vocabulary.
We'd like to be able to recognise words that we never even saw at training time. It's the same problem in speech synthesis.
The corpus might not contain words that we're trying to classify, and therefore we can't have whole-word models, because there'll be nothing to train them on.
So we'll break the words down into smaller units, and we'll probably use something like phonemes.
In fact, we're going to break things down a bit further than phonemes, into something sub-phonetic, and we'll see why as we go through the course. For now, just think of that as being because, within a single spoken example of a phoneme, the spectrum changes enough that we might want to model the beginning, the middle and the end of the phone separately.
We might want to divide it into sub-phonetic units, smaller units.
So let's now think about the Gaussian distribution, which is why we massaged our features into MFCCs in the first place, and see that this is indeed a form of generative model.
It's not just a way of describing data.
It can generate data, and it has all the properties that we need for our generative model.
In other words, it gives a non-zero probability to everything.
It never goes down to zero. Things near the mean get a high probability.
Things far from the mean get a low probability.
So the Gaussian has all the right properties.
It's mathematically very convenient, so that's our first choice of generative model.
So can we then just use a Gaussian as the generative model, whether the classes are words or phonemes or something smaller than that?
Okay, so dynamic time warping was working fine.
But it has limitations, and the limitations are really that the exemplars need to be stored, and
comparing to single exemplars doesn't really help us generalise across all of those exemplars.
So the history of speech recognition went like this.
We had dynamic time warping with a single example per class. People immediately realised that doesn't capture the variation around that example very well.
So then we had dynamic time warping with multiple templates per class.
We represented the variance by having 10 or 20 or hundreds of examples per class and used that to capture the variation, and then realised that that's not a very efficient or effective way of doing things.
So we capture that variance in a mathematical model, a statistical model, and throw away the exemplars.
So we're going to use Gaussians now instead of the distance measure, the local distance measure.
We need to worry about how we might then build a model of sequences of things from Gaussians.
So the time warping itself isn't the problem: dynamic time warping just aligns two things, two examples, and then it computes local distances and adds them up, and conceptually that's completely fine.
The local distance measure we looked at was rather naive.
It was a simple geometric measure.
It doesn't account well for variance.
So that's what we're going to fix by using Gaussians instead.
So we replaced distance measures with probability distributions.
They're very much like distance measures, but they normalise for the natural variance that's there in the training data.
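To make that concrete, here is a minimal sketch (my own illustration with made-up toy numbers, not code from the lecture): the negative log density of a diagonal-covariance Gaussian behaves like a squared distance in which each dimension is scaled by its own variance learned from training data, unlike the naive Euclidean measure.

```python
import numpy as np

def euclidean_sq(x, template):
    # the naive geometric local distance used with dynamic time warping
    return np.sum((x - template) ** 2)

def neg_log_gaussian(x, mean, var):
    # a distance-like score that normalises each dimension by its own variance
    return 0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

# toy "training data" for one class: rows are feature vectors (two coefficients here)
train = np.array([[1.0, 4.0],
                  [1.2, 6.0],
                  [0.8, 5.0]])
mean, var = train.mean(axis=0), train.var(axis=0)

x = np.array([1.1, 9.0])
print(euclidean_sq(x, mean))           # treats both dimensions equally
print(neg_log_gaussian(x, mean, var))  # penalises deviation relative to each dimension's variance
```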
We're only really interested in one probability density function, and that's the Gaussian, because it works well mathematically; everything else is harder to work with.
We're now going to think about the Gaussian as a generative model, and build it up into a generative model not just of one frame but of sequences of frames.
But the Gaussian is going to be doing most of the work.
Okay, the sequence part is just to make things happen in the right order. So, back to our rather abstract picture of a generative model.
So we've got this model.
It's just a box.
A black box (drawn here as a red box), and it's got a button on it, and we can press the button and it generates an observation.
Press the button and out comes an observation.
Remember, observations are always MFCC vectors.
So press the button: out comes an observation. Fine.
Press the button again: out comes another observation.
And again: another observation.
So a simple way of generating sequences is just to press the button lots of times and then line up the things that were generated into a sequence.
The sequence is going to have very special statistical properties.
It's actually not going to be ideal for speech, but we'll worry about that later.
How exactly is this Gaussian generating? Well, there's our Gaussian.
Let's draw it out here.
So that's the coefficient; that's maybe one element of this MFCC vector.
I can only draw it in one dimension, but it's always going to be multidimensional.
There's going to be some coefficient.
Maybe it's the third cepstral coefficient, and that's the probability density of the third cepstral coefficient.
It has a mean, which we learned from the data; it's just the average of the training samples. And it has a variance about that, a standard deviation.
When we press the button, we just randomly sample a value along this axis: a randomly generated number.
The probability of generating a particular value is just proportional to the height of this curve.
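As a minimal sketch of that (my own illustration with hypothetical numbers, not the lecture's data): fit the Gaussian by taking the mean and standard deviation of some training values of one cepstral coefficient, then "press the button" by drawing random samples from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical training values of one cepstral coefficient (say, c3)
training_values = np.array([2.1, 1.8, 2.5, 2.0, 1.9, 2.3])

mu = training_values.mean()    # the mean is just the average of the training samples
sigma = training_values.std()  # the standard deviation about that mean

# "pressing the button": values near mu come out often, values in the tails rarely
samples = rng.normal(mu, sigma, size=10)
print(mu, sigma)
print(samples)
```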
So we hit this button again and again, and again.
We'll pretty often sample things quite near the mean, because that's very likely.
Just occasionally we might put a sample down here, maybe one in a thousand times, one in a million times. It's a generative model,
but it likes to generate things near the mean.
So we get lots and lots of samples near the mean, and things far from the mean a lot less often.
And every time we generate from this model, we just independently sample from this Gaussian distribution.
And that means this observation
is statistically unconnected, uncorrelated, independent of the next one, and the next one, and the next one.
This sequence has a very special property: the samples are independent, which means every time you press the button it generates independently; it doesn't matter what happened at the previous time step.
We don't need to see the past or the future. And they're identically distributed; in other words, they all come from the same Gaussian distribution.
If you want the fancy statistical term, that means they're IID: independent and identically distributed. That's rather naive and simplistic.
It doesn't sound like speech really behaves like that, because the sound slowly evolves.
That might be a bit of an issue. So the Gaussian can be a generative model.
We press a button on it, and it gives us a random sample.
I can only draw the picture in one dimension, but really it gives random samples of vectors of MFCCs from this multidimensional Gaussian. And to generate sequences of things, we can just repeatedly generate from the model, producing a sequence that goes through time. So we've got a generative model of sequences of things.
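A minimal sketch of that kind of sequence generation (assumed dimensions and parameters, not real MFCC statistics): every frame is drawn independently from the same multivariate Gaussian, which is exactly the IID property just described.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 12                       # pretend each frame is a 12-dimensional MFCC vector
mu = np.zeros(dim)             # mean vector (would be learned from training data)
cov = np.eye(dim)              # diagonal covariance, for simplicity

T = 100                        # number of frames in the "sequence"
# each row is one frame, sampled independently of all the others
sequence = rng.multivariate_normal(mu, cov, size=T)
print(sequence.shape)          # (100, 12)
```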
Let's think about whether that's going to be good enough to replace our dynamic time warping model.
This bit here, this idea of generation: every time we generate a vector, the model, as a by-product, can tell us the probability of having done that.
Let's call this a vector o, this sequence O, and this one o_1.
It can tell us the probability of o_1 as a by-product of generating it, so we can actually randomly sample from the model,
or we can show it a sequence and say, "Just tell me the probability that you would have generated this", rather than actually doing the simulation.
So let's just make that really concrete: how are we actually going to do that?
So we've got this Gaussian, and it's got two parameters: a mean and a standard deviation. The variable here is just x, and when we randomly generate from this Gaussian we're quite likely to pick things near the mean.
Another way of thinking about that is that, given an observation that has a value x, we can calculate the probability; in other words, how many times on average, say out of 100, would we expect to generate this value? We can just read it off the curve.
So if you tell me a value for x, I'll call it x_1, all I need to do is go up to the curve,
read off the value, which is just the height of the curve,
and that tells me the probability.
So if I try some values and ask this Gaussian to generate them, it can tell us immediately what the probability of that happening is.
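"Going up to the curve and reading off the height" is just evaluating the Gaussian probability density function at x. A minimal sketch, with made-up parameter values:

```python
from scipy.stats import norm

mu, sigma = 2.0, 0.5           # made-up mean and standard deviation

for x in [2.0, 2.4, 4.5]:      # near the mean, still near the mean, far out in the tail
    print(x, norm.pdf(x, loc=mu, scale=sigma))
# the density is high for values near the mean and tiny, but never exactly zero, in the tails
```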
So let's try generating some values.
So let's generate this value here: it's really close to the mean.
So what do we think the probability of generating this value would be: high or low? High, right?
In other words, if we pressed the button millions of times, we'd expect to see this value, or values close to it, quite often.
Go up there, read it off, and indeed
we get a high value.
We could have a value over here.
Is that going to be high or low?
High as well.
Of course, it's also very close to the mean.
It's just on the other side, but the curve is symmetrical,
so we get the same value.
Just occasionally we might get an observation out here.
It's very far from the mean, very low probability.
So this Gaussian can generate this unlikely value here.
It just doesn't like doing it very much.
But it'll give us a low probability of doing so.
These tails never go to zero.
They go on forever.
They never quite get to zero; they just get really, really small.
And so this probability density function can give a non-zero value to any observation, even crazy things right out here that are clearly not of this class.
It can give a non-zero value, but it will be very, very small.
So the Gaussian is a generative model, and we don't need to actually do
the mathematical simulation.
We can just directly read off the probability for any value.
What is the probability that the Gaussian generated it? In other words, what is the probability that this observation belongs to the class that we're modelling with this Gaussian?
Is it "yes"? Is it "no"?
Match it against the Gaussian for "yes",
and that is just the probability that this observation belongs to the word "yes", for an individual frame.
And then we could just do that for each of the frames in sequence.
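As a minimal sketch of that classification idea (one-dimensional, with made-up parameters and observations, so just an illustration of the principle): keep one Gaussian per word, read off the density of every frame under each word's Gaussian, and, because the frames are treated as independent, sum the log densities across the sequence. The word whose Gaussian gives the higher total is the better match.

```python
import numpy as np
from scipy.stats import norm

# made-up (mean, standard deviation) of one coefficient, one Gaussian per word
params = {"yes": (2.0, 0.5),
          "no":  (4.0, 0.8)}

# an observed sequence: one coefficient value per frame (made-up values)
frames = np.array([2.1, 1.9, 2.4, 2.2])

for word, (mu, sigma) in params.items():
    # per-frame log densities, summed because frames are assumed independent
    log_likelihood = np.sum(norm.logpdf(frames, loc=mu, scale=sigma))
    print(word, log_likelihood)
# here the "yes" Gaussian gives the higher total log likelihood, so "yes" is the better match
```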
