Impulse response

If we want to characterise a filter in the time domain, we need to know its impulse response.


The filter that's going to model the vocal tract in our source-filter model operates in the time domain, but we'll most commonly think about it in the frequency domain.
That's just much more helpful.
We can see its resonant peaks, for example, whereas the filter coefficients in the time domain are actually rather hard to interpret.
The relationship between the values of those coefficients and the formant frequencies is a little complicated.
Generally then, we're going to think about our filter in the frequency domain.
But there is a way to characterise it in the time domain, not just through the filter coefficients themselves, but through something called its 'impulse response'.
Here's a synthetic speech-like signal that I've created by putting an impulse train into a filter with two resonant frequencies.
On the left is the time-domain signal I'm analysing.
On the right is the magnitude spectrum.
The right is always going to show the magnitude spectrum for this analysis frame: the exact signal that we're seeing on the left.
If we analyse that much signal, that's the magnitude spectrum we get.
Let's just fully understand that.
On the left we've got periodicity due to the source, and that has a consequence in the magnitude spectrum of this line structure.
The fundamental period here directly relates to the fundamental frequency here.
All those harmonics are at multiples of that fundamental frequency.
Within one period of this waveform, we see oscillating behaviour.
That's obviously at a much higher frequency than the source.
That has a consequence in the frequency domain of this spectral envelope: this peak structure.
These resonances are what's happening inside here.
Watch what happens when we narrow the analysis frame (which is always the visible part here on the left) down to just one period of this waveform.
Something really interesting has happened there!
In the waveform on the left, that we're analysing, there's no evidence any more of its periodicity.
We just see this waveform.
We don't know when the next impulse is going to come in and when the next response is going to come out.
So, because there's no evidence of periodicity at F0 on the left, we don't see any harmonics any more on the right.
All that we have evidence for on the left is the oscillating behaviour caused by the resonances of the filter.
So if we take the magnitude spectrum of that signal, we get on the right the magnitude spectrum of the frequency response of the filter.
On the left we have a signal that we call the impulse response of the filter.
It's the response to a single impulse.
On the right, we have the frequency response of the filter.
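The effect of narrowing the analysis frame can be sketched in a few lines of Python. This is only an illustration under assumed values, not the signal from the video: the 16 kHz sampling rate, the 100 Hz F0, the resonances at 500 Hz and 1500 Hz with 100 Hz bandwidths, and all function names are hypothetical choices. The one-period frame is zero-padded so that both spectra have the same frequency bins.

```python
import cmath
import math

def resonance(freq_hz, bw_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one resonance)."""
    r = math.exp(-math.pi * bw_hz / fs)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs), r * r]

def cascade(s1, s2):
    """Multiply two denominators to get one filter with both resonances."""
    out = [0.0] * (len(s1) + len(s2) - 1)
    for i, u in enumerate(s1):
        for j, v in enumerate(s2):
            out[i + j] += u * v
    return out

def all_pole_filter(x, a):
    """y[t] = x[t] - a[1]*y[t-1] - ... - a[p]*y[t-p], with a[0] == 1.0."""
    y = []
    for t in range(len(x)):
        feedback = sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y.append(x[t] - feedback)
    return y

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of the frame (plain, slow DFT)."""
    n = len(frame)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(frame)))
            for k in range(n // 2 + 1)]

fs = 16000      # assumed sampling rate in Hz
period = 160    # assumed fundamental period in samples (F0 = 100 Hz)
a = cascade(resonance(500.0, 100.0, fs), resonance(1500.0, 100.0, fs))

# Source: an impulse train. Filter output: a speech-like signal.
train = [1.0 if t % period == 0 else 0.0 for t in range(12 * period)]
signal = all_pole_filter(train, a)

# Wide frame: four whole periods, taken after the start-up transient.
wide = signal[8 * period:12 * period]
# Narrow frame: one period only, zero-padded to the same length.
narrow = signal[8 * period:9 * period] + [0.0] * (3 * period)

def off_harmonic_ratio(spec):
    """Largest magnitude between the harmonics, relative to the overall peak.
    With four periods in the frame, harmonics of F0 fall on every 4th bin."""
    return max(m for k, m in enumerate(spec) if k % 4 != 0) / max(spec)

wide_ratio = off_harmonic_ratio(magnitude_spectrum(wide))
narrow_ratio = off_harmonic_ratio(magnitude_spectrum(narrow))
print(wide_ratio, narrow_ratio)
```

With several periods in the frame, almost all the energy sits on the harmonics, so the first ratio is close to zero. With one period in the frame there is no line structure, so the second ratio is not small: the spectrum is smooth between what used to be the harmonics.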
So there are at least three different ways of describing our filter.
First of all is the equation.
Formally, it's called the 'difference equation' because it's got the input and the output terms in it.
It's very important to understand that this equation is very simple.
In particular, it only has terms x and y at different times.
It never does anything to x and y other than weight them by coefficients and then add them up.
In other words, we never, for example, take the square of y or take the logarithm of x, or any other complicated operation.
So this filter is called 'linear'.
It's just a weighted sum of inputs and outputs.
Linear filters like this are all we need for speech processing.
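One consequence of the filter being just a weighted sum is that it obeys superposition: filtering a weighted mix of two inputs gives the same result as mixing the two separately filtered outputs. Here is a minimal sketch of that check. The coefficient values, the sign convention on the output terms, and the function name `linear_filter` are assumptions made for illustration. They are not taken from the equation shown in the video.

```python
def linear_filter(x, b, a):
    """Weighted sum of inputs and past outputs:
    y[t] = sum_k b[k]*x[t-k] - sum_{k>=1} a[k]*y[t-k], with a[0] == 1.0."""
    y = []
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t - k >= 0)
        acc -= sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y.append(acc)
    return y

# Hypothetical coefficients, chosen only for illustration.
b = [1.0]
a = [1.0, -1.2, 0.8]

x1 = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
x2 = [0.0, 2.0, -1.0, 0.0, 0.0, 3.0, 0.0, 0.0]

# Filter a weighted mix of the two inputs...
mixed = linear_filter([2.0 * u + 3.0 * v for u, v in zip(x1, x2)], b, a)

# ...and compare with the same weighted mix of the two separate outputs.
y1 = linear_filter(x1, b, a)
y2 = linear_filter(x2, b, a)
separate = [2.0 * u + 3.0 * v for u, v in zip(y1, y2)]

max_difference = max(abs(p - q) for p, q in zip(mixed, separate))
print(max_difference)  # zero, up to floating-point rounding
```

If the equation squared y or took the logarithm of x, this check would fail.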
The filter has an order.
This 1.0 is always fixed.
So this filter has order 4 because it has four numbers that are available for us to vary.
That equation, then, is one way of describing a linear filter.
It's not the most useful way, because those coefficients are not interpretable.
Another way to understand this equation would be to put in an impulse for x and observe the output y.
We would get a plot like this, for this particular filter.
That's the time-domain output of the filter for an impulse input.
It's the impulse response of the filter.
Here, I put in the impulse at time 10 ms.
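To make this concrete, here is a rough sketch of putting an impulse into an order-4 difference equation and collecting the output. The transcript does not include the coefficients of the filter in the video, so the ones below are stand-ins built from two assumed resonances (500 Hz and 1500 Hz, each with a 100 Hz bandwidth) at an assumed 16 kHz sampling rate. The function names are hypothetical, and treating the fixed 1.0 as the leading coefficient `a[0]` is also an assumption.

```python
import math

def resonance(freq_hz, bw_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one resonance)."""
    r = math.exp(-math.pi * bw_hz / fs)      # pole radius, set by bandwidth
    theta = 2.0 * math.pi * freq_hz / fs     # pole angle, set by frequency
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def cascade(s1, s2):
    """Multiply two denominators to get one filter with both resonances."""
    out = [0.0] * (len(s1) + len(s2) - 1)
    for i, u in enumerate(s1):
        for j, v in enumerate(s2):
            out[i + j] += u * v
    return out

def all_pole_filter(x, a):
    """y[t] = x[t] - a[1]*y[t-1] - ... - a[p]*y[t-p], with a[0] == 1.0."""
    y = []
    for t in range(len(x)):
        feedback = sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y.append(x[t] - feedback)
    return y

fs = 16000  # assumed sampling rate in Hz

# Five numbers: the fixed leading 1.0 plus four we are free to vary (order 4).
a = cascade(resonance(500.0, 100.0, fs), resonance(1500.0, 100.0, fs))

# Input: silence, apart from a single impulse at 10 ms.
impulse_at = round(0.010 * fs)
x = [0.0] * round(0.040 * fs)
x[impulse_at] = 1.0

# Output: the impulse response, delayed by 10 ms.
h = all_pole_filter(x, a)
print(a)
print(h[impulse_at:impulse_at + 5])
```

The output is zero until the impulse arrives, then oscillates and decays. That decaying oscillation is the impulse response.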
If we take the magnitude spectrum of this signal by doing Fourier analysis (a Fourier transform of this waveform), we'll get this plot.
That is the frequency response of this filter.
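The step from impulse response to frequency response can be sketched as follows, using a plain (slow) discrete Fourier transform from the standard library. As before, the filter here is a hypothetical stand-in with assumed resonances at 500 Hz and 1500 Hz, an assumed 16 kHz sampling rate, and made-up function names. The impulse response is truncated to 512 samples, by which point it has decayed to almost nothing.

```python
import cmath
import math

def resonance(freq_hz, bw_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one resonance)."""
    r = math.exp(-math.pi * bw_hz / fs)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs), r * r]

def cascade(s1, s2):
    """Multiply two denominators to get one filter with both resonances."""
    out = [0.0] * (len(s1) + len(s2) - 1)
    for i, u in enumerate(s1):
        for j, v in enumerate(s2):
            out[i + j] += u * v
    return out

def all_pole_filter(x, a):
    """y[t] = x[t] - a[1]*y[t-1] - ... - a[p]*y[t-p], with a[0] == 1.0."""
    y = []
    for t in range(len(x)):
        feedback = sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y.append(x[t] - feedback)
    return y

fs = 16000  # assumed sampling rate in Hz
a = cascade(resonance(500.0, 100.0, fs), resonance(1500.0, 100.0, fs))

# Impulse response: the output for a single impulse at time zero.
h = all_pole_filter([1.0] + [0.0] * 511, a)

# Magnitude spectrum of the impulse response, for bins 0..N/2.
n = len(h)
magnitude = [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                     for t, v in enumerate(h)))
             for k in range(n // 2 + 1)]

# Find local maxima, then keep the two strongest: the two resonances.
peaks = [k for k in range(1, n // 2)
         if magnitude[k - 1] < magnitude[k] > magnitude[k + 1]]
strongest = sorted(sorted(peaks, key=lambda k: magnitude[k], reverse=True)[:2])
peak_freqs_hz = [k * fs / n for k in strongest]
print(peak_freqs_hz)
```

The two peak frequencies come out close to the two resonances that were used to build the coefficients, which is something we could not easily have read off the coefficients themselves.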
So the equation on the bottom, the waveform on the left, and the magnitude spectrum on the right are all saying the same thing, but in different domains.
The equation is the difference equation.
It's how we would actually implement the filter in software.
The waveform is a description of how the filter behaves in response to an input impulse (the simplest possible input).
The magnitude spectrum is the most useful representation of all, because it shows us this filter is a resonator and that it has two resonances.
We've learned that a linear filter can be characterised by its impulse response.
Knowing the impulse response of the filter, and exciting the filter with a train of impulses, we can generate speech signals with our source-filter model.
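Because the filter is linear, the output for an impulse train is simply a sum of copies of the impulse response, one copy starting at each impulse. The sketch below checks that numerically. The filter coefficients are hypothetical stand-ins (assumed resonances at 500 Hz and 1500 Hz), as are the 16 kHz sampling rate, the 100 Hz F0, and the function names.

```python
import math

def resonance(freq_hz, bw_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one resonance)."""
    r = math.exp(-math.pi * bw_hz / fs)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs), r * r]

def cascade(s1, s2):
    """Multiply two denominators to get one filter with both resonances."""
    out = [0.0] * (len(s1) + len(s2) - 1)
    for i, u in enumerate(s1):
        for j, v in enumerate(s2):
            out[i + j] += u * v
    return out

def all_pole_filter(x, a):
    """y[t] = x[t] - a[1]*y[t-1] - ... - a[p]*y[t-p], with a[0] == 1.0."""
    y = []
    for t in range(len(x)):
        feedback = sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y.append(x[t] - feedback)
    return y

fs = 16000              # assumed sampling rate in Hz
period = fs // 100      # assumed F0 of 100 Hz, so 160 samples per period
n_samples = 5 * period
a = cascade(resonance(500.0, 100.0, fs), resonance(1500.0, 100.0, fs))

# Route 1: excite the filter with a train of impulses.
train = [1.0 if t % period == 0 else 0.0 for t in range(n_samples)]
speech_like = all_pole_filter(train, a)

# Route 2: take the impulse response once...
h = all_pole_filter([1.0] + [0.0] * (n_samples - 1), a)

# ...and add a shifted copy of it at every impulse position.
overlap_add = [0.0] * n_samples
for start in range(0, n_samples, period):
    for n in range(n_samples - start):
        overlap_add[start + n] += h[n]

max_difference = max(abs(p - q) for p, q in zip(speech_like, overlap_add))
print(max_difference)  # zero, up to floating-point rounding
```

The two routes give the same waveform. Changing `period` changes the spacing of the copies, and so the F0, without touching the impulse response.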
The impulse response of the vocal tract filter is given a special name.
We call it a 'pitch period'.
It's one period of output for one impulse input.
I warned you a while ago that the terms 'fundamental frequency' (which we denote by F0) and 'pitch' are used interchangeably in our field, even though they are not the same thing.
We're doing that right here!
The pitch period should really be called 'fundamental period'.
But 'pitch period' is the standard term in the literature and so we'll stay with that.
We're going to see later that, to manipulate a speech signal, we might decompose it into the source-filter model.
We might find out the components of the signal that are caused by the source, those that are caused by the filter, and manipulate source and filter separately, and then put them back together again to make modified speech.
But we'll also find that we can do those modifications directly using the pitch period, because it is the impulse response of the filter.
That means that we can find the filter's response directly in the time-domain waveform of speech.