Forum Replies Created
October 9, 2020 at 13:36 in reply to: Why doesn’t DFT[0] tell us anything about the frequency of the input? #12301
Yes, that’s right!
We can interpret this as telling us the bias of the input amplitude in the time domain. That is, when reconstructing the original waveform should we shift all the cosines representing our frequency components up (positive bias), down (negative bias) or keep them centred at zero (zero bias).
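If you want to see this concretely, here’s a minimal numpy sketch (the input values are just made up):

import numpy as np

x = np.array([1.0, 2.0, 1.0, 2.0])  # an input with a positive bias (mean = 1.5)
X = np.fft.fft(x)
print(X[0])                 # (6+0j): DFT[0] is just the sum of the input values
print(X[0].real / len(x))   # 1.5: dividing by N gives the mean, i.e. the bias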
The first thing to try would just be Restart and Clear Output (from the Kernel drop-down menu). This resets various environment variables, which might solve the problem.
If that doesn’t work, you might need to set the location where matplotlib looks for ffmpeg:
import matplotlib as mpl
mpl.rcParams['animation.ffmpeg_path'] = LOCATION_OF_FFMPEG  # replace with the path to your ffmpeg binary
print(mpl.rcParams['animation.ffmpeg_path'])  # to see what it's set to
If you’re running a unix terminal, you can check the location of an application in the filesystem using:
which ffmpeg
You can do this within a Python notebook (at least if you’re using Mac or Linux) by putting it in a code cell and adding a “bang” at the front:
!which ffmpeg
If this returns nothing, then probably you need to set your PATH (where the computer looks for applications/commands) to include wherever ffmpeg is installed. You can see that with the command:
echo $PATH
Note that activating a conda environment will change your PATH settings, so you might need to check that you’ve activated the right environment, especially if you used conda to install ffmpeg. You can easily get into a mismatched state if you have several versions of python or several conda environments on your machine!
Sorry for the confusion! Better phrasing would be:
‘all the input values get scaled up as all the coefficients in b are greater than 1. The input 2 steps (i.e., x[n-2]) before the current output time step (i.e., y[n]) will get the biggest increase (i.e., b[2]=1.5)’.
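As a minimal sketch of that difference equation: note that only b[2] = 1.5 comes from the example above; the other two coefficients here are hypothetical (just chosen to also be greater than 1):

import numpy as np

b = [1.1, 1.2, 1.5]                 # hypothetical b[0], b[1]; b[2] = 1.5 as in the example
x = np.array([1.0, 0.0, 0.0, 0.0])  # a single impulse as input

# y[n] = b[0]*x[n] + b[1]*x[n-1] + b[2]*x[n-2]
y = np.convolve(x, b)
print(y)  # [1.1 1.2 1.5 0. 0. 0.]: the input 2 steps back gets the biggest scaling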
Yes, the peak amplitude of the candidate harmonic (basis function) is 1.
You can see this from the DFT equation: each term in the sum is the nth input value x[n] multiplied by the corresponding point on the basis function (the phasor in the SIGNALS notebooks). That point is a complex number, e^{-j 2π nk/N} for DFT[k], which has magnitude 1. This means the corresponding sinusoid has a peak amplitude of 1.
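You can check this quickly in numpy; a minimal sketch, with an arbitrary choice of N and k:

import numpy as np

N, k = 8, 3                                   # arbitrary DFT size and output index
n = np.arange(N)
basis = np.exp(-1j * 2 * np.pi * n * k / N)   # the phasor for DFT[k]
print(np.abs(basis))                          # all 1s: every point on the basis function has magnitude 1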
If you look at the magnitude spectrum of a single impulse you’ll see that all the DFT output frequencies have non-zero magnitudes. This means if you send in a single impulse you potentially excite every possible frequency.
If we look at the DFT equation, we see that this happens because the impulse input sequence has exactly one non-zero value, e.g. [0,1,0,0,0,0,0]. This means that when we multiply it with a DFT phasor, we basically select one complex number sitting on the unit circle (magnitude 1) as the DFT output, no matter which DFT output frequency we’re analysing. That’s how we end up with a non-zero magnitude for every DFT output when we apply the DFT to a single impulse.
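You can verify this with a couple of lines of numpy:

import numpy as np

x = np.array([0, 1, 0, 0, 0, 0, 0])  # a single impulse, as in the example above
X = np.fft.fft(x)
print(np.abs(X))                     # all 1s: non-zero magnitude at every DFT output frequency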
If the mechanics of the DFT equation are too much right now, don’t worry!
Another way to think of an impulse is just as a burst of energy – like an infinitely short burst of air through your vocal folds (i.e. the source of the source-filter model). On its own it doesn’t tell you much. But if you put energy at different frequencies into a filter, you’ll get an idea of what that filter’s properties are by seeing which frequencies are boosted by the filter and which are attenuated. It’s a bit like blowing across the top of a bottle to make a flute-like sound come out.
The basic idea is that if you send an impulse into a filter (in mathematical terms, we’d do a convolution) and perform the DFT on the filter’s output, you can then see how that filter shapes the frequency spectrum. For example, does the filter boost low frequencies but dampen high frequencies (i.e. a low pass filter)? Or the opposite (i.e. a high pass filter)?
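Here’s a rough sketch of that idea using numpy and scipy; the 2-point averaging filter is just a made-up example of a low pass filter:

import numpy as np
from scipy.signal import lfilter

b, a = [0.5, 0.5], [1.0]    # a hypothetical 2-point averaging (low pass) filter

impulse = np.zeros(64)
impulse[0] = 1.0
h = lfilter(b, a, impulse)  # send an impulse through the filter

H = np.abs(np.fft.rfft(h))  # DFT of the output shows how the filter shapes the spectrum
print(H[:3], H[-3:])        # magnitudes near 1 at low frequencies, near 0 at high ones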
The physical filter we’re most interested in modelling for Speech Processing is, of course, the human vocal tract, but we could also think about other sorts of tube-like objects, like a trumpet. In this case, if you put in impulses (i.e. air flow with glottal pulses) at regular intervals (i.e. an impulse train), then you’ll produce a wave with a fundamental period (T0) matching the time between impulses, and so you get a fundamental frequency of F0=1/T0. We also know that the DFT of an impulse train has a non-zero magnitude at every integer multiple of F0 (the harmonics, as you mentioned above).
In this way, we can model the pitch of the human voice (more frequent impulses make for a higher F0). But alongside this, each impulse also potentially excites the resonant properties of the filter. For our voices, the properties of the filter depend on how we shape/constrict the vocal tract using articulators like our tongues. The frequencies that get boosted are the resonances of the vocal tract, but if you’ve already done some phonetics you might also know those resonances as formants: changing the vocal tract filter changes which vowels and consonants we hear!
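You can see the harmonics of an impulse train in numpy too; the sampling rate and F0 here are made-up values:

import numpy as np

fs, f0 = 1000, 100      # made-up sampling rate and fundamental frequency (Hz)
x = np.zeros(fs)        # one second of samples
x[::fs // f0] = 1.0     # an impulse every T0 = 1/F0 = 10 ms

X = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(freqs[X > 1])     # [0. 100. 200. 300. 400. 500.]: F0 and its harmonics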
Side note: If we go back to the discussion of DFT outputs including a magnitude and phase angle, we can also interpret the magnitude spectrum of a single impulse as saying that if we want to create a single impulse from a bunch of cosines, we basically need to add up cosines of all the frequencies we can get our hands on and each needs to be slightly shifted in phase. Similarly, if we want to make an impulse train from cosines, we need to add together versions of all cosines matching the frequencies of all the harmonics. The main takeaway from that is that it’s actually pretty hard work to make an impulse (sharp, spikey) from sinusoids (curvy)!
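If you want to see just how that works, here’s a minimal numpy sketch (the impulse position n0 = 5 is an arbitrary choice): one cosine per DFT frequency, each with peak amplitude 1 and a phase shift proportional to its frequency, and all the curviness cancels out except at a single spike:

import numpy as np

N, n0 = 64, 5                # DFT length and (arbitrary) impulse position
n = np.arange(N)
# One cosine per DFT frequency k, each phase-shifted in proportion to k
x = sum(np.cos(2 * np.pi * k * (n - n0) / N) for k in range(N))
print(np.round(x, 10)[:10])  # a (scaled) impulse: N at n = 5, zero everywhere else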
It’s basically because the labelling of which side is opposite, adjacent or hypotenuse is defined relative to the position of the angle.
The hypotenuse is always the side opposite the right angle, with neither end touching the right angle.
The opposite side is always the side of the triangle that doesn’t touch the angle of interest, theta, at either end (and one end touches the right angle).
The adjacent side is always the side of the triangle that touches the angle of interest, theta, but is not the hypotenuse; i.e. one end touches the right angle and the other end touches the angle of interest.
Once we’ve established which angle theta in the right-angled triangle we’re interested in, cos(theta) is defined as the length of the adjacent divided by the length of the hypotenuse.
In the Speech Processing course, we mainly use these trigonometric definitions to go between the polar coordinates (magnitude and angle) and rectangular (real, imaginary) representations of complex numbers, but there are some nice practical examples in this video that don’t involve complex numbers at all (just a 2 dimensional space).
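As a quick illustration of that polar/rectangular connection (the complex number here is just a made-up example):

import numpy as np

z = 3 + 4j                           # a hypothetical complex number (rectangular form)
mag, theta = np.abs(z), np.angle(z)  # polar form: magnitude 5, angle ~0.927 radians

# cos and sin take us back: adjacent/hypotenuse and opposite/hypotenuse
print(mag * np.cos(theta), mag * np.sin(theta))  # 3.0 4.0 (up to rounding)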
You might need to try restarting the kernel for that notebook: go to ‘Kernel’ in the menu at the top of the notebook and click ‘Restart and Clear Output’. Let me know if that works!
We basically need the phase angle from the DFT if we want to reconstruct the original input signal from the DFT outputs. Each DFT output tells us how to scale (magnitude) and shift (phase) the cosine wave with the frequency associated with each DFT output. Once we scale and shift these cosine waves with the DFT magnitudes and phases, we can recreate the original input signal as it was (with some limitations!).
You can use the code in sp-m1-3-sampling-sinusoids.ipynb (‘Generating linear combinations of sinusoids’) to play with this a bit and also to see why you’ll need phase information to recreate the original input. If you change the params variable there to:
params = [(1, 2, 0), (1, 6, 0)]
You’ll generate a waveform made up of a 2 Hz sine wave and a 6 Hz sine wave, both with peak amplitude 1 and no phase shift.
If you compare this to the version where we apply a phase shift of pi/3 radians to the 6 Hz component:
params = [(1, 2, 0), (1, 6, np.pi/3)]
You see a somewhat different compound waveform. So, if we want to recover the latter example from the DFT, we need the phase information for the frequency components we identify as being present in the original signal. Otherwise we won’t get back the input. You can also change the params variable in the notebook to check that a cosine wave is the same as a sine wave shifted by pi/2 radians.
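If you don’t have the notebook to hand, here’s a rough standalone version of that experiment; the (amplitude, frequency, phase) tuples match the notebook’s params convention, but the sampling rate and duration here are made up:

import numpy as np
import matplotlib.pyplot as plt

fs, dur = 64, 1.0  # made-up sampling rate (Hz) and duration (s)
t = np.arange(0, dur, 1 / fs)

def combine(params):
    # Each tuple is (peak amplitude, frequency in Hz, phase shift in radians)
    return sum(a * np.sin(2 * np.pi * f * t + ph) for a, f, ph in params)

plt.plot(t, combine([(1, 2, 0), (1, 6, 0)]), label='no phase shift')
plt.plot(t, combine([(1, 2, 0), (1, 6, np.pi / 3)]), label='6 Hz shifted by pi/3')
plt.legend()
plt.show()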
So, we can think of the phase output of the DFT as being independent of the frequency associated with that output.
That said, it’s actually not clear that you need phase information for tasks like automatic speech recognition, where we’re mainly interested in which frequency components are present in a signal, not in whether we can reconstruct it. For this reason we often just focus on the magnitude spectrum in actual applications and ignore the phase spectrum!
I should also mention that if you’re using a conda environment, you can install ffmpeg directly. With your desired conda environment activated, run the following on the command line:
> conda install ffmpeg
Thanks Ross! I’ve fixed it in the github repo.
This mirroring is a fundamental property of the DFT. We go through why it happens in the SIGNALS materials:
Try the exercise under ‘The DFT for k = 2 and beyond’ and see if you can see why this happens.
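In the meantime, here’s a quick numpy check of the symmetry behind the mirroring:

import numpy as np

x = np.random.rand(8)                         # any real-valued input
X = np.fft.fft(x)
print(np.round(np.abs(X), 3))                 # magnitudes mirrored around N/2
print(np.allclose(X[1:], np.conj(X[:0:-1])))  # True: DFT[k] is the conjugate of DFT[N-k]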
Hi Kerim,
Yes, you need to install matplotlib separately from Jupyter notebooks unless you are using the Edina Noteable server (where it is already installed by default).
The instructions here describe the install process if you want to run it locally (see ‘The Normal Way: Running Jupyter Notebooks on your computer’):
https://github.com/laic/uoe_speech_processing_course/blob/master/sp-m0-how-to-start.ipynb
But the basic options to install matplotlib are:
1. If you have Anaconda (or miniconda) installed, you can install matplotlib using the following command (after you have activated your conda environment)
> conda install -c conda-forge matplotlib
2. Otherwise you can use pip:
> pip install matplotlib
If you’ve done that and it still doesn’t work, you’ll need to check your installation of python and related path variables (i.e. where python looks for packages). Let us know, and we can go through that.
If your internet access is ok, you can also try using the Edina Noteable service (it really is easier!):
See the instructions in sp-m0-how-to-start.ipynb
Sorry, I forgot to include the link to the audio for that exercise! It should be the difference between violin_A3_15 and violin_A4_05.
I’ve updated the notebook in the github repository:
https://github.com/laic/uoe_speech_processing_course/blob/master/signals/sp-m1-1-sounds-signals.ipynb
I’ve also included a link to a semitone calculator there: http://www.homepages.ucl.ac.uk/~sslyjjt/speech/semitone.html
But the main thing is just to compare those two files.
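For reference, the semitone difference between two frequencies f1 and f2 is 12·log2(f2/f1), so you can also compute it yourself (assuming concert pitch, where A3 is 220 Hz and A4 is 440 Hz):

import numpy as np

f1, f2 = 220.0, 440.0         # A3 and A4 at concert pitch (Hz)
print(12 * np.log2(f2 / f1))  # 12.0: an octave is 12 semitones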
Thanks for pointing this out Ross! I’ve updated the notebook on github to link to the git download page.
You might need to check whether Anaconda has been added to your path environment variable. There are some instructions here:
https://www.datacamp.com/community/tutorials/installing-anaconda-windows
Let me know if that helps!