This video just has a plain transcript, not time-aligned to the video.
THIS IS AN AUTOMATIC TRANSCRIPT WITH LIGHT CORRECTIONS TO THE TECHNICAL TERMS.

Let's move on from discrete things, from coloured balls, to continuous values, because that's what our features for speech recognition are: they are extracted from frames of speech. So far, where we've got to with that is filterbank features: a vector of numbers. Each number is the energy in a filter (in a band of frequencies) for that frame of speech. So we need a model of continuous values. The model we're going to choose is the Gaussian, so I'm going to assume you know about the Gaussian from last week's tutorial. But let's just have a very quick reminder.

Let's do this in two dimensions. I'll just take two of the filters in the filterbank and draw that two-dimensional space. Perhaps I'll pick the third filter and the fourth filter in the filterbank. Each of the points I'm going to draw is a pair of filterbank energies: a little feature vector. So each point is a little two-dimensional feature vector containing the energy in the third filter and the energy in the fourth filter. So: lots of data points. I would like to describe the distribution of this data with a Gaussian, and it's going to be a multivariate Gaussian: the mean is going to be a vector of two dimensions, and its covariance matrix is going to be a 2x2 matrix. I'm going to use a full covariance matrix here, which means I could draw a Gaussian that is this shape on the data. We've made the assumption that the data are distributed Normally, and so that this parametric probability density function is a good representation of the data. So I can use the Gaussian to describe data.

But how would we use the Gaussian as a generative model? Let's do that, but let's just do it in one dimension to make things a bit easier to draw. I've got my three models again. By some means (yet to be determined) I've learned these models - they've come from somewhere - and these models are now Gaussians. So this is really what the models look like. Model A is this Gaussian: it has a particular mean and a particular standard deviation. Along comes an observation. These are univariate Gaussians: our feature vectors are one-dimensional feature vectors. So along comes a one-dimensional feature vector (it's just a number), and the question is: "Which of these models is most likely to have generated that number?" The number is 2.1. Remember that the Gaussian can't compute a probability for a single point - that would involve integrating the area between two values. So, for 2.1, all we can say is: "What's the probability density at 2.1?" So off we go: 2.1 gives this value for the first model, this value for the second, and this value for the third.
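To make that comparison concrete, here is a minimal sketch in Python of exactly this classification rule. The means and standard deviations of models A, B and C are invented for illustration (the video never states them); `scipy.stats.norm` is just a convenient way of reading the probability density off each curve.

```python
from scipy.stats import norm

# Three univariate Gaussian generative models, one per class.
# These means and standard deviations are made-up illustrative values.
models = {
    "A": norm(loc=2.0, scale=0.5),
    "B": norm(loc=4.0, scale=1.0),
    "C": norm(loc=7.0, scale=1.5),
}

x = 2.1  # the observation (a one-dimensional feature vector)

# Ask each model for its probability density at x and pick the largest.
densities = {label: model.pdf(x) for label, model in models.items()}
predicted = max(densities, key=densities.get)

print(densities)
print(predicted)  # with these made-up parameters, 'A'
```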
Compare those three. Clearly, this one is the highest, and so we'll say this is an A. That's how we'd use these three Gaussians as generative models. We'd ask each of them in turn, "Can you generate the value 2.1?" For a Gaussian, the answer's always "Yes!", because all values have non-zero probability density. So of course it can generate a 2.1. What's the probability density at 2.1? We just read that off the curve, because it's a parametric distribution, and we compare those three probability densities. So that's how we do classification with the Gaussian.

Let's just draw the three models on top of each other to make it even clearer. What's the probability of 2.1 being an A or a B or a C? Well, just go to 2.1 and you've got this probability density, this probability density, and this probability density. Clearly A is the highest, so it's an A. By drawing the models on top of each other, we can actually see the implicit decision boundaries between the classes. It's obvious that up to here, A is always the highest value, so this whole region will always be labelled A. In this middle region, the B probability density function is higher than the other two, so everything in that region will be labelled as a B. And for the remainder, going this way, everything will be labelled C. So these three Gaussian generative models - whilst not knowing anything about each other - do form decision boundaries when we lay them on top of each other. These three Gaussians form a classifier: it has boundaries here and here, and it divides feature space (which is this whole range of the variable) into three parts, labelling one as A, one as B, and one as C. But those boundaries are never stored. We never need to know them: they arise simply by comparing the probabilities (in fact, the probability densities) for any observation value.

That works in two dimensions too. Pick some pair of filters - let's pick the 4th one and the 5th one - and let's have two classes now. Let's have class green. Its mean is here: that's the mean of class green. It's got some standard deviation in the two feature directions, so let's draw a Gaussian there; this one has got full covariance. Let's have class purple: its mean and its standard deviation in all directions. Maybe that looks like this: it's much tighter. Now, as we move around this feature space, trying all these different points, some will be more likely to be green and some will be more likely to be purple. There is an implicit decision boundary between the two classes, and maybe it's going to look something like this, perhaps. Everything on this side has got a higher probability (higher probability density) of coming from the purple distribution, and everything on this side has got a higher probability density of coming from the green distribution. This classification boundary between the two classes is never drawn out explicitly. If it was, that would be called a 'discriminative' model. That's not what we're doing. This is a generative model, and we find that boundary simply by comparing the probability densities of the two models. And if there are more models, we will get more complicated decision boundaries.
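The same comparison in two dimensions might look like the following sketch, using SciPy's multivariate normal. The means and full covariance matrices of the green and purple classes are invented for illustration; note that the decision boundary is never computed or stored - every point is simply labelled by whichever class gives it the higher density.

```python
from scipy.stats import multivariate_normal

# Two-class example with full-covariance multivariate Gaussians.
# All parameter values are invented for illustration.
green = multivariate_normal(mean=[3.0, 5.0],
                            cov=[[1.0, 0.6],
                                 [0.6, 1.5]])
purple = multivariate_normal(mean=[6.0, 4.0],     # a much "tighter" class
                             cov=[[0.3, 0.1],
                                  [0.1, 0.2]])

def classify(x):
    """Label a 2-D feature vector by comparing probability densities."""
    return "green" if green.pdf(x) > purple.pdf(x) else "purple"

print(classify([3.2, 5.1]))   # near the green mean  -> green
print(classify([6.1, 3.9]))   # near the purple mean -> purple
```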
So that completes the first part of the class. We want to use Gaussians because they have nice mathematical properties. We want to use generative models because they're the simplest sort of model: we know there isn't anything simpler. We're going to use Gaussians as the generative model of the feature vectors - feature vectors coming from frames of analysis of our speech waveform. We've seen that generative models can be used to classify, and ultimately the problem of speech recognition is one of classification. It's one of saying: "Which words were said, out of all the possible words? Which ones were most likely?" So we're going to do that through generative modelling. Now, our Gaussians are going to be multivariate: they're going to be in some high-dimensional feature space with tens of dimensions. At the moment that's the number of filters in the filterbank. And if we were to model covariance, we would have a very large covariance matrix: the number of entries in the covariance matrix is proportional to the square of the dimension of the feature vector. That's bad. So we're now going to do something to the features so that we don't need to model covariance. We're going to do feature engineering.
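To put a rough number on why that's bad, here is a quick sketch of the parameter count per Gaussian. The feature dimensions are purely illustrative (e.g. a filterbank with a few tens of channels): a full symmetric covariance matrix needs on the order of the dimension squared parameters, while a diagonal one needs only one variance per dimension.

```python
def covariance_parameter_count(D):
    """Free parameters in the covariance of one D-dimensional Gaussian."""
    full = D * (D + 1) // 2   # symmetric full covariance matrix
    diagonal = D              # just one variance per dimension
    return full, diagonal

# Illustrative feature dimensions (e.g. numbers of filterbank channels).
for D in (10, 25, 40):
    full, diag = covariance_parameter_count(D)
    print(f"D = {D:2d}: full covariance = {full:4d} parameters, diagonal = {diag}")
```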