This video just has a plain transcript, not time-aligned to the video.
THIS IS AN UNCORRECTED AUTOMATIC TRANSCRIPT. IT MAY BE CORRECTED LATER IF TIME PERMITS.

How are we going to warp the frequency scale? We're going to put the speech through something whose outputs are nonlinearly spaced along the frequency axis. So think about what the cochlea does: this is the little snail-shaped spiral inside the ear. It just lays out frequencies along a physical axis; that's called a tonotopic, or place-frequency, representation. In the higher frequencies, there are fewer and fewer hair cells per bit of bandwidth, so there's less and less resolution. We're going to simulate that in a very crude way.

This is the spectral domain. That axis is frequency, in Hertz, so it's a linear frequency scale. We're going to put some filters on it: band-pass filters. So cast your mind back, perhaps all the way to the second lab session, where you put speech through some band-pass filters and listened to just the energy in narrow bands. We're going to use band-pass filters that gather the energy from little regions of the frequency scale. So the first filter rejects everything except this narrow range down here; it sums up the energy in that range and outputs it as a single number. This is for one frame of speech: one 25 millisecond frame.

As we go up the scale, we're going to make those filters wider and wider. So the next one higher up the scale might be this one here, which gathers energy across a wider range of frequencies. Towards the high end of the scale, the filters and the spacing between them get wider and wider, because we're warping the frequency scale. We're going to space them on some perceptual scale, something inspired by measurements people have made of human hearing. One such scale is called the Mel scale. There are other scales you could use; the other popular one is called the Bark scale, which is very similar. They all have the same property: the filters just get wider as we go up the frequency scale, so the spacing is nonlinear.

So that warps the frequency scale; that's done. It also does something else that's extremely useful. These filters are going to be wider than the spacing between the harmonics. Remember what the spectrum of voiced speech looks like: it's got an overall envelope, and it's got these harmonics. The filters are going to be wider than F0, several times F0. So they're going to gather together, for example, this whole range of frequencies, and so in the outputs we'll have smoothed away all evidence of F0. We'll be capturing the spectral envelope.

So the filterbank is doing multiple jobs. It warps the frequency scale. And it's a very cheap, efficient way of getting the spectral envelope out and removing all the evidence of F0, which we don't want for speech recognition. It does that simply by the filters being wide enough to smear F0 away. Think of it as just blurring away F0 by averaging across ranges of frequencies. That means we don't have the resolution to see F0. So if we plot these filterbank outputs, what we'll see is a crude version of the spectral envelope, on a nonlinear frequency scale, and it won't go up and down fast enough to capture F0: F0 has been removed. We've got the spectral envelope on a warped frequency scale, and for that scale we might use something like this Mel scale.

What we've got so far is this: instead of simply using the FFT magnitude spectrum as our feature vector, we use the outputs of this filterbank. It's not quite right yet, but we just put those outputs into a vector; for this example, with four filters, it's just going to be a four-dimensional vector. These four numbers are now what we're going to do pattern recognition on. They're already better than the magnitude spectrum because, A, they don't have any evidence of F0, so they're independent of the underlying pitch of the speech; and B, they're on a nonlinear, warped frequency scale. So they put more coefficients, and more importance, on the lower frequencies, which is where there's more information in the speech signal, and they have a very crude representation of the higher frequencies. So for things like fricatives, we can capture the difference between "s" and "sh", but not much more than that at these high frequencies.
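The filterbank described in the lecture can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the method used in the course: it assumes triangular filters (a common choice; the lecture only says band-pass filters), the widely used formula mel = 2595 log10(1 + f/700) for the Mel scale, a 16 kHz sampling rate, a 25 ms frame and a 512-point FFT. The function names `hz_to_mel`, `mel_to_hz` and `mel_filterbank` are hypothetical, and a 440 Hz sine stands in for a real frame of speech.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Hz -> Mel, using the common 2595 * log10(1 + f / 700) formula (assumed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Mel -> Hz, the inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular band-pass
    filters plus the n_filters + 2 edge/centre frequencies in Hz.
    Centres are equally spaced in Mel, so the filters widen in Hz."""
    if f_max is None:
        f_max = sample_rate / 2.0
    # Each filter uses three consecutive points: left edge, centre, right edge.
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # Hz of each FFT bin
    fbank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: 0 outside [left, right], peaking at 1 on the centre.
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank, hz_points


# One 25 ms frame at 16 kHz; a 440 Hz sine stands in for real speech.
sample_rate = 16000
frame_len = sample_rate * 25 // 1000  # 400 samples
n_fft = 512
t = np.arange(frame_len) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)

# Power spectrum of the windowed frame (n_fft // 2 + 1 = 257 bins).
power = np.abs(np.fft.rfft(frame * np.hamming(frame_len), n=n_fft)) ** 2

fbank, edges = mel_filterbank(4, n_fft, sample_rate)
features = fbank @ power  # one energy per filter: a 4-dimensional feature vector
print(features.shape)  # (4,)
```

Each row of `fbank` is one filter, so multiplying it by the frame's power spectrum sums the weighted energy under each filter into a single number. Because the centre frequencies are equally spaced in Mel rather than in Hertz, the filters come out wider towards the top of the spectrum.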
Filterbank
The filterbank is the first step in feature engineering: it warps the frequency scale and removes F0.
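The claim that the filterbank removes F0 can be checked with a toy experiment. The sketch below is hypothetical and idealised: it builds harmonic spectra on a 1 Hz grid (no windowing or leakage) for F0 = 100 Hz and F0 = 120 Hz, both shaped by the same made-up two-peak `envelope`, then passes each through 8 rectangular band-pass filters with Mel-spaced edges that simply sum the energy inside their band, much as the lecture describes. Cosine similarity is used to compare shapes; it ignores overall level, which also changes with F0 because a lower F0 puts more harmonics into each band. All function names here are illustrative assumptions.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Hz -> Mel with the common 2595 * log10(1 + f / 700) formula (assumed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Mel -> Hz, the inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


FREQS = np.arange(0, 8001)  # 1 Hz grid from 0 to 8 kHz, so index == frequency in Hz


def envelope(f):
    """Hypothetical smooth spectral envelope with two formant-like peaks."""
    f = np.asarray(f, dtype=float)
    return (np.exp(-((f - 500.0) / 300.0) ** 2)
            + 0.5 * np.exp(-((f - 1500.0) / 300.0) ** 2)
            + 0.1)


def harmonic_spectrum(f0):
    """Idealised voiced spectrum: energy only at integer multiples of f0 (an int, in Hz)."""
    spec = np.zeros(FREQS.size)
    harmonics = np.arange(f0, FREQS[-1], f0)
    spec[harmonics] = envelope(harmonics)
    return spec


def band_energies(spec, n_bands=8):
    """Rectangular band-pass filters with Mel-spaced edges: each one ignores
    everything outside its band and sums the energy inside it."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(FREQS[-1]), n_bands + 1))
    return np.array([spec[(FREQS >= lo) & (FREQS < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])


def cosine(a, b):
    """Cosine similarity: compares shape and ignores overall level."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


spec_100 = harmonic_spectrum(100)
spec_120 = harmonic_spectrum(120)
raw_similarity = cosine(spec_100, spec_120)
fbank_similarity = cosine(band_energies(spec_100), band_energies(spec_120))
print(f"raw spectra: {raw_similarity:.2f}  filterbank outputs: {fbank_similarity:.2f}")
```

The raw spectra barely overlap, because harmonics of 100 Hz and 120 Hz only coincide every 600 Hz, whereas the band energies of the two signals have almost the same shape: both follow the envelope, not the harmonics.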