This video just has a plain transcript, not time-aligned to the video.

Because speech changes over time, we've already realised that we need to analyse it in the short term. We need to break it into frames and perform analysis frame by frame. One of the most important analyses is to get into the frequency domain. We're going to use Fourier analysis to do that, but we're going to introduce it in two stages.

So the first topic is actually going to seem a little abstract. There's a reason for introducing series expansion as an abstract concept: it has several different applications. The most important of those is Fourier analysis, but there will be others.

Here's a speech-like signal that we'd like to analyse. So maybe we should say what 'analyse' really means. Well, how about a concrete example? I'd like to know what this complicated waveform is 'made from'. One way to express that is to say that it's a sum of very simple waves. So we're going to expand it into a summation of the simplest possible waves. That's what series expansion means.

For the purpose of the illustrations here, I'm just going to take a single pitch period of this waveform. That's just going to make the pictures look a little simpler. But actually everything we're going to say applies to any waveform: it's not restricted to a single pitch period.

I'm going to express this waveform as a sum of simple functions. Those are called basis functions, and as the basis function I'm going to use the simplest possible periodic signal there is: the sine wave. So let's try making this complex wave by adding together sine waves. It doesn't look much like a sine wave, so maybe you think that's impossible. Not at all! We could make any complex wave by adding together sine waves.

I've written some equations here. I've written that the complex wave is approximately equal to something that I'm going to define. Let's try using just one sine wave to approximate it. Try the sine wave here, and the result is not that close to the original. So let's try adding a second basis function at a higher frequency. Here it is: now if we add those two things together, we get a little closer. I'll add a third basis function at a higher frequency still, and we get a little closer still; add the fourth one, and we get really quite close to the original. It's not precisely the same, but it's very close. I've only used four sine waves there.

The first sine wave has the longest possible fundamental period: it makes one cycle in the analysis window. The second one makes two cycles, the third one makes three cycles, then four cycles, and so on. So they form a series.

Now, I can keep adding terms to my summation to get as close as I want to the original signal. So let's keep going. I'm not going to show every term, because there are a lot of them. But we keep adding terms and eventually, by adding enough terms going up to a high enough frequency, we will reconstruct our original signal exactly. Now we're not just approximating the signal: the sum is actually equal to it.

Theoretically, if this were all happening with analogue signals, I might need to add together an infinite number of terms to get exactly the original signal. But these are digital signals. That means that this analysis frame has a finite number of samples in it. This waveform is sampled at 16 kHz and it lasts 0.01 s. That means there are 160 samples in the analysis frame. Because there's a finite amount of information, I only need a finite number of basis functions to reconstruct it exactly. Another way of saying that is that these basis functions are also digital signals, and the highest-frequency one possible is the one at the Nyquist frequency (half the sampling rate, so 8 kHz here). So if I sum up basis functions all the way up to that highest possible frequency, I will exactly reconstruct my original signal.

So what exactly have we achieved by doing this? We've expressed the complex wave on the left as a sum of basis functions, each of which is a very simple function: a sine wave, at increasing frequency. We've had to add together a very specific amount of each of those basis functions to make that reconstruction. We need 0.1 of this one and 0.15 of this one and 0.25 of this one and 0.2 of this one and just a little bit of this one, and whatever the terms in between might be, to exactly reconstruct our signal. This set of coefficients exactly characterises the original signal, for a fixed set of basis functions.

Because we can choose how many terms we have in the series (we can go as far down the series as we like, and stop wherever we like), we actually get to choose how closely we represent the original signal. Perhaps all of this fine detail on the waveform is not interesting: it's not useful information. Maybe it's just noise, and we'd like a representation of this signal that removes that noise. Well, series expansion gives us a principled way to do that: we can just stop adding terms. This signal might be a noisy signal, and this signal is a denoised version of it. It removes the irrelevant information (if that's what we think it is).

The main point of understanding series expansion is that it is the basis of Fourier analysis, which transforms a time-domain signal into its frequency-domain counterpart. But we will find other uses for series expansion, such as the one we just saw: truncation to remove unnecessary detail from a signal.

What we learned here is not restricted to analysing waveforms. There's nothing in what we did that relies on the horizontal axis being labelled with time: it could be labelled with anything else. Fourier analysis will then do what we've been trying to do for some time: get us from the time domain into the frequency domain, where we can do much more powerful analysis and modelling of speech signals.
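To make the series expansion concrete, here is a minimal pure-Python sketch for one 160-sample frame (16 kHz, 0.01 s, as in the transcript). The waveform is a made-up stand-in, since the lecture's signal isn't available here, and the function names (`coefficients`, `reconstruct`, `rms`) are hypothetical. One detail goes beyond the simplified picture in the lecture: to reconstruct an arbitrary frame exactly, each frequency needs a cosine as well as a sine (equivalently, a sine with its own phase), plus a constant term. With those, a constant and 80 frequencies up to the 8 kHz Nyquist frequency give exactly 160 basis functions for 160 samples.

```python
import math
import random

FS = 16_000               # sampling rate in Hz, as in the transcript
DURATION = 0.01           # frame length in seconds, as in the transcript
N = round(FS * DURATION)  # 160 samples in the analysis frame


def coefficients(x, max_cycles):
    """Series coefficients of x for a constant plus a cosine and a sine making
    k = 1..max_cycles whole cycles in the window. These basis functions are
    mutually orthogonal over the window, so each coefficient is a scaled inner product."""
    n_samples = len(x)
    dc = sum(x) / n_samples
    pairs = []
    for k in range(1, max_cycles + 1):
        angles = [2 * math.pi * k * n / n_samples for n in range(n_samples)]
        if 2 * k == n_samples:
            # At the Nyquist frequency the sine is zero at every sample and the
            # cosine alternates +1, -1, so its energy is N rather than N/2.
            a = sum(xi * math.cos(t) for xi, t in zip(x, angles)) / n_samples
            b = 0.0
        else:
            a = 2 / n_samples * sum(xi * math.cos(t) for xi, t in zip(x, angles))
            b = 2 / n_samples * sum(xi * math.sin(t) for xi, t in zip(x, angles))
        pairs.append((a, b))
    return dc, pairs


def reconstruct(dc, pairs, n_samples):
    """Add the weighted basis functions back together (a possibly truncated series)."""
    out = []
    for n in range(n_samples):
        value = dc
        for k, (a, b) in enumerate(pairs, start=1):
            angle = 2 * math.pi * k * n / n_samples
            value += a * math.cos(angle) + b * math.sin(angle)
        out.append(value)
    return out


def rms(x, y):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x))


# Hypothetical stand-in for one 100 Hz pitch period: five harmonics with made-up
# amplitudes and phases, plus a little random "noise".
random.seed(0)
AMPS = [1.0, 0.6, 0.8, 0.3, 0.2]
PHASES = [0.0, 0.5, 1.0, 1.5, 2.0]
clean = [sum(amp * math.sin(2 * math.pi * (h + 1) * n / N + phase)
             for h, (amp, phase) in enumerate(zip(AMPS, PHASES)))
         for n in range(N)]
frame = [c + 0.05 * random.gauss(0, 1) for c in clean]

dc, pairs = coefficients(frame, N // 2)  # full series, up to 8 kHz (Nyquist)
for terms in (1, 2, 3, 4, 5, N // 2):
    approx = reconstruct(dc, pairs[:terms], N)
    print(f"{terms:2d} frequencies (up to {terms * FS // N} Hz): "
          f"RMS error {rms(frame, approx):.2e}")
```

The printed error shrinks as terms are added and falls to floating-point round-off once all 80 frequencies are used. Because the basis is orthogonal, truncating is just dropping terms from the end of `pairs`, with no refitting. In this made-up example, stopping at five frequencies (500 Hz) keeps all five harmonics and discards most of the noise, so the truncated series is no farther from `clean` than the noisy frame is.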
Series expansion
Speech is hard to analyse directly in the time domain, so we convert it to the frequency domain using Fourier analysis, which is a special case of series expansion.
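The claim that Fourier analysis is a special case of series expansion can be checked numerically. The sketch below, in pure Python with hypothetical function names and a made-up 16-sample frame, computes series coefficients by inner products with a cosine and a sine at each frequency and compares them with a directly computed discrete Fourier transform (DFT). For 0 < k < N/2, the cosine coefficient is 2/N times the real part of DFT bin k and the sine coefficient is -2/N times its imaginary part, so the DFT is a compact way of computing the same expansion.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
            for k in range(n_samples)]


def series_coefficients(x, k):
    """Cosine and sine coefficients for the basis functions making k cycles per window,
    computed as inner products. The 2/N scaling assumes 0 < k < N/2, where each of
    these basis functions has energy N/2 over the window."""
    n_samples = len(x)
    angles = [2 * math.pi * k * n / n_samples for n in range(n_samples)]
    a = 2 / n_samples * sum(xi * math.cos(t) for xi, t in zip(x, angles))
    b = 2 / n_samples * sum(xi * math.sin(t) for xi, t in zip(x, angles))
    return a, b


N = 16
# Hypothetical test frame: a sinusoid making 2 cycles (with a phase offset) plus
# a smaller one making 5 cycles in the window.
frame = [0.7 * math.cos(2 * math.pi * 2 * n / N + 0.3) + 0.2 * math.sin(2 * math.pi * 5 * n / N)
         for n in range(N)]

spectrum = dft(frame)
for k in range(1, N // 2):
    a, b = series_coefficients(frame, k)
    # Same numbers, two routes: a = (2/N) Re X[k] and b = -(2/N) Im X[k].
    print(f"k={k}: series (a={a:+.3f}, b={b:+.3f})  "
          f"DFT (a={2 / N * spectrum[k].real:+.3f}, b={-2 / N * spectrum[k].imag:+.3f})")
```

The two sets of numbers agree for every k, and the amplitude sqrt(a² + b²) comes out as 0.7 at k = 2 and 0.2 at k = 5, with essentially zero elsewhere. The DFT also has bins at k = 0 (the constant term) and k = N/2, which use a 1/N scaling instead, and bins above N/2 that mirror those below for a real signal; the sketch skips them to keep the comparison simple.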