From distance measures to probability distributions

We started the speech recognition section with a simple method based on distance measures. Now it's time to shift up a gear and start thinking probabilistically.

And the key problem that the Euclidean distance measure fails to account for is the variability in the data.
Let's go back to our examples.
Here's this sort of [w] sound, this approximant at the beginning of the word "one", and here's another [w] sound here.
We're going to measure the distance between those two things, and then we're going to measure the distance between this nasal sound and that nasal sound here.
Now, maybe it's the case that nasals are extremely predictable: they have a very tight distribution.
They don't vary very much.
They're all very, very similar to each other.
Well, let's imagine that [w] sounds are extremely variable.
They're all over the place.
So what looks like a big distance from one [w] to another [w] might not really be very big.
It might just be within the range of normal variation of [w] sounds. Whereas if we saw the same sort of distance between nasals, which we know are normally very tightly distributed, we'd know those two things really are dissimilar.
So there's some natural variance within each of these classes, and it's different for each class.
That's what I'm going to try and account for with probabilities.
We're not going to try and guess; we're not going to say, from our acoustic-phonetic knowledge, that [w] is more variable and the nasals are less variable. We're going to learn it from data.
We're going to look at data and ask how much things vary within a class, and we're going to use that to normalise our distance measures.
That gives us an idea of what counts as a big distance and what counts as a small distance.
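As a minimal sketch of what "learning it from data" could look like (hypothetical NumPy code; the feature values are invented for illustration): take a collection of labelled vectors of one class and estimate its mean and its per-dimension variance, which is our measure of how much that class normally varies.

import numpy as np

# Hypothetical reference vectors for one class: each row is one
# observation of a 2-dimensional feature vector.
green_refs = np.array([[0.9, 2.1],
                       [1.4, 1.6],
                       [0.2, 2.8],
                       [1.8, 1.2]])

# Summarise the class by its mean and its per-dimension variance:
# the variance says how far from the mean members of this class
# normally fall.
green_mean = green_refs.mean(axis=0)
green_var = green_refs.var(axis=0)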
So let's start with this example.
So what have we got here? We're just computing the local distance between one vector and another vector.
What I've got is a collection of vectors of one particular class.
So these green things are of one particular class, and these blue things are of another particular class. I'm now going to have an unknown.
So these are collections of reference vectors: these are the references for green, and these are the references for blue. We're going to come along now and do pattern recognition, just for a single frame.
I'm going to put my unknown vector into this diagram and ask: is it a green thing or a blue thing? So let's just put it somewhere.
Let's put it here.
Okay, let's use the Euclidean distance to decide.
Is it blue or is it green? So the distance to the average blue thing is this much.
The distance to the average green thing is this much.
We're going to announce the answer as being blue.
Okay, because it's nearer the middle of the blue things. Now, looking at these clouds of green things and clouds of blue things, what do you think the answer is? Do you think it's green or blue? It's clearly green, because the green things are distributed all over here, and the blue things are only distributed around here, and this point is clearly within the scope of the green things.
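Here's a sketch of that situation in NumPy (the clouds are sampled from invented distributions chosen to mimic the diagram: a widely spread green class and a tightly clustered blue class):

import numpy as np

rng = np.random.default_rng(0)

# Invented reference clouds: green varies a lot, blue hardly at all.
green = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(200, 2))
blue = rng.normal(loc=[5.0, 0.0], scale=0.3, size=(200, 2))

# An unknown point slightly nearer the blue mean, but comfortably
# within the normal range of variation of the green class.
x = np.array([3.0, 0.0])

d_green = np.linalg.norm(x - green.mean(axis=0))   # about 3
d_blue = np.linalg.norm(x - blue.mean(axis=0))     # about 2

# Plain Euclidean distance to the class means announces "blue".
print("blue" if d_blue < d_green else "green")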
So how are we going to find a distance measure that would correctly classify this unknown point as belonging to the green distribution and not the blue distribution? The answer is, we've got to take account of this variance.
How far away from the average, from the mean, do we normally expect things of this type to fall? We're going to normalise the distance by that.
Rather than just the absolute distance to the average, it's going to be the absolute distance normalised by how much we would normally expect things to vary from the average: a normalised Euclidean distance.
We're going to derive that properly, and then use probabilities to represent it.
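The quantity being described here is usually written as a variance-normalised (diagonal-covariance Mahalanobis) distance. With $\mu_d$ and $\sigma_d^2$ the mean and variance of dimension $d$, estimated from a class's reference vectors:

$$ d_{\text{norm}}(\mathbf{x}) = \sqrt{\sum_{d=1}^{D} \frac{(x_d - \mu_d)^2}{\sigma_d^2}} $$

Continuing the sketch above, this measure reverses the decision, because 3 units is about one standard deviation for green but many standard deviations for blue:

def normalised_distance(x, refs):
    # Absolute distance to the mean, scaled per dimension by how much
    # this class's reference vectors normally vary from their mean.
    mu, var = refs.mean(axis=0), refs.var(axis=0)
    return np.sqrt(np.sum((x - mu) ** 2 / var))

print("blue" if normalised_distance(x, blue) < normalised_distance(x, green)
      else "green")   # now "green"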
