This video just has a plain transcript, not time-aligned to the video.

You already know the essential features of Fourier analysis, but we've glossed over a little detail called phase. So we now need to clarify that, as well as using Fourier analysis to transform any time domain signal into its spectrum: more correctly, its magnitude spectrum, as we're going to see. From now on we'll be calling this the Fourier transform.

We take a time domain signal. We break it into short analysis frames. On each of those, we perform a series expansion. The series of basis functions is made of sine waves of increasing frequency. That is Fourier analysis, and that gets us into the frequency domain.

In the general case, we need to worry not just about the magnitude of each of the basis functions, but also something called phase. Consider this example here. Look at the waveform on the left. Look at the basis functions on the right. See if you can come up with a set of coefficients that would work. Well, fairly obviously you cannot, because all the basis functions are zero at time zero. We're trying to construct a non-zero value at time zero, and there are no weights in the world that will give us that. So there's something missing here. This diagram is currently untrue, but I can make it true very easily by just shifting the phase of the basis function on the right. Now the diagram is true!

Phase is simply the point in the cycle where the waveform starts. Another way to think about it is that we can slide the basis functions left and right in time. So when we are performing Fourier analysis, we don't just need to calculate the magnitude of each of the basis functions, but also their phase.

But does phase matter? I mean, what does phase mean? Is it useful for anything? Here I've summed together four sine waves with this set of coefficients to make a waveform that is speech-like. There's one period of it on the right. I'm going to play you a longer section of that signal. OK, it's obviously not real speech! I mean, I just made it by adding together these sine waves with those coefficients. But it's got some of the essential properties of speech. For example, it's got a perceptible pitch, and it's not a pure tone.

I'm going to use the same set of coefficients, but I'm going to change the phases of the basis functions. So, exactly the same basis functions, they just start at different points in their cycle. The resulting signal now looks very different to the original signal. Do you think it's going to sound different? Well, let's find out. No, it sounds exactly the same to me. Our hearing is simply not sensitive to this phase difference.

So for the time being, we're just going to say that phase is not particularly interesting. Our analysis of speech will only need to worry about the magnitudes. In other words, these are the important parts. These phases - exactly where these waveforms start in their cycle - are a lot less important. In fact, we're just going to neglect the phase from now on.

If we plot just those coefficients, we get the spectrum, and that's what I've done here. On the left is the original signal, and its magnitude spectrum. On the right is the signal with different phases, but the same magnitudes: its magnitude spectrum is identical. We'll very often hear this called the spectrum, but more correctly we should always say the 'magnitude spectrum' to make it clear that we've discarded the phase information.

Something else that's very important that we can learn from this picture is that in the time domain two signals might look very different, but in the magnitude spectrum domain, they're the same. Now that's telling us that the time domain might not be the best way to analyse speech signals. The magnitude spectrum is the right place.

Because the amount of energy at different frequencies in speech can vary a lot - it's got a very wide range - the vertical axis of a magnitude spectrum is normally written on a log scale and we give it units of decibels. This is a logarithmic scale. But like the waveform, it's uncalibrated because, for example, we don't know how sensitive the microphone was. It doesn't really matter because it's all about the relative amount of energy at each frequency, not the absolute value.

Back to the basis sine waves for a moment. They start from the lowest possible frequency, with just one cycle fitting the analysis frame, and they go all the way up to the highest possible frequency, which is the Nyquist frequency. They're spaced equally and the spacing is equal to the lowest frequency. Here it's 100 Hz. It's 100 Hz because the analysis frame is 1/100th of a second.

So what happens if we make the analysis frame longer? Imagine we analyse a longer section of speech than 1/100th of a second. Have a think about what happens to the set of basis functions. Pause the video.

Well, if we've got a longer analysis window, that means that the lowest frequency sine wave that fits into it with exactly one cycle will be at a lower frequency, so this frequency will be lower. We know that the series are equally spaced at that frequency, so they'll all go lower and they'll be more closely spaced. But we also know that the highest frequency one is always at the Nyquist frequency. So if the lowest frequency basis function is of a lower frequency and they're more closely spaced, then we'll just have more basis functions fitting into the range up to the Nyquist frequency. A longer analysis frame means more basis functions.

This will be easier to understand if we see it working in practice. Here I've got a relatively short analysis frame and on the right, I'm showing its magnitude spectrum: that's calculated automatically with the Fourier transform. Let's see what happens as the analysis frame gets larger. Can you see how a bit more detail appeared in the magnitude spectrum? Let's go out some more, and even more detail appeared. In fact there's so much detail, we can't really see it now.

So what I'm going to do is show you just a part of the frequency axis: a zoomed-in part. The spectrum still goes up to the Nyquist frequency, but I'm just going to show you the lower part of that, so we see more detail. So there's the very short analysis frame and its magnitude spectrum. Zoom out a bit and a bit more detail appears in the magnitude spectrum. We make the analysis frame longer still, and we get a lot more detail in the magnitude spectrum.

So a longer analysis frame means that we have more components added together (more basis functions), therefore more coefficients. Remember that the coefficients are just spaced evenly along the frequency axis up to the Nyquist frequency, and so we're just going to get them closer together as we make the analysis frame longer, so we see more and more detail in the magnitude spectrum. Analysing more signal gives us more detail in the spectrum.

This sounds like a good thing, but of course that spectrum is for the entire analysis frame. It's effectively the average composition of a signal within the frame. So a larger analysis frame means we're able to be less precise about where in time that analysis applies, so we get lower time resolution. So it's going to be a trade-off. Like in all of engineering, we have to make a choice. We have to choose our analysis frame to suit the purpose of the analysis. It's a design decision.

The next steps involve finding, in the frequency domain, some evidence of the periodicity in the speech signal: the harmonics. But that will only be half the story, because we haven't yet thought about what the vocal tract does to that sound source. Our first clue to that will come from the spectral envelope. So we're going to look at two different properties in the frequency domain. We see both of them together in the magnitude spectrum, one being the spectral envelope, and the other being the harmonics.
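The transcript's claim that changing the phases alters the waveform but leaves the magnitude spectrum identical can be checked numerically. The sketch below is only an illustration, not the code behind the video: the sample rate, the four coefficients and the phase values are all made-up assumptions, and the Fourier transform is written out as a slow, naive sum so that nothing beyond the Python standard library is needed.

```python
import cmath
import math

SAMPLE_RATE = 8000       # Hz (assumed for illustration)
FRAME_DURATION = 0.01    # seconds: 1/100th of a second, as in the transcript
N = round(SAMPLE_RATE * FRAME_DURATION)  # samples in one analysis frame


def synthesise(coefficients, phases):
    """Sum sine waves at multiples of the lowest frequency (one cycle per frame)."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * n / N + p)
            for k, (a, p) in enumerate(zip(coefficients, phases)))
        for n in range(N)
    ]


def magnitude_spectrum(frame):
    """Naive Fourier transform; keep only magnitudes from 0 Hz up to Nyquist."""
    size = len(frame)
    magnitudes = []
    for k in range(size // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * n / size)
                    for n, x in enumerate(frame))
        magnitudes.append(abs(total))  # abs() discards the phase
    return magnitudes


coefficients = [1.0, 0.5, 0.3, 0.2]  # hypothetical magnitudes
original = synthesise(coefficients, [0.0, 0.0, 0.0, 0.0])
shifted = synthesise(coefficients, [0.4, 1.9, 3.1, 5.0])  # arbitrary phases (radians)

# Compare the two signals sample by sample, then spectrum bin by bin.
max_waveform_diff = max(abs(a - b) for a, b in zip(original, shifted))
max_spectrum_diff = max(
    abs(a - b)
    for a, b in zip(magnitude_spectrum(original), magnitude_spectrum(shifted))
)

print(f"largest waveform difference: {max_waveform_diff:.3f}")
print(f"largest magnitude-spectrum difference: {max_spectrum_diff:.2e}")
```

The waveform difference should come out clearly non-zero while the magnitude-spectrum difference is at the level of floating-point rounding error. In practice a fast Fourier transform routine from a numerical library would replace the naive sum.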
Frequency domain
We complete our understanding of Fourier analysis with a look at the phase of the component sine waves, and the effect of changing the analysis frame duration.
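The effect of analysis frame duration on the basis functions comes down to simple arithmetic, sketched here. The 16 kHz sample rate and the function name `basis_layout` are assumptions chosen for illustration; only the 1/100th of a second frame and its 100 Hz spacing come from the video.

```python
SAMPLE_RATE = 16000          # Hz (assumed for illustration)
NYQUIST = SAMPLE_RATE / 2    # highest frequency the basis functions reach


def basis_layout(frame_duration):
    """Return (spacing in Hz, number of basis functions up to Nyquist)."""
    # The lowest basis function fits exactly one cycle into the frame,
    # and the basis functions are equally spaced at that same frequency.
    spacing = 1.0 / frame_duration
    count = round(NYQUIST / spacing)
    return spacing, count


for duration in (0.01, 0.02, 0.04):
    spacing, count = basis_layout(duration)
    print(f"{duration * 1000:.0f} ms frame: "
          f"spacing {spacing:.0f} Hz, {count} basis functions")
```

Doubling the frame duration halves the spacing and doubles the number of basis functions: more detail in frequency, at the cost of a spectrum that describes a longer stretch of time.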