Forum Replies Created
Since you’re on Edina Noteable, I think you just need to install anytree. You should be able to do this using pip instead of conda, i.e. in the notebook:
!pip install anytree
You can also try installing urllib3 and requests that way – you should get a message along the lines of ‘Requirement already satisfied…’
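For example, in a notebook cell (if they’re already installed, this is harmless):
!pip install urllib3 requests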
You probably also want to check that ffmpeg is actually installed. If you already have a conda environment, the easiest way to get it is to run this in the terminal:
conda install ffmpeg
You’ll need to have your conda environment active, e.g.
conda activate slp
where slp is the name of the environment you use with the Jupyter notebooks.
You’ll need to replace ‘LOCATION_OF_FFMPEG’ in the notebook with an actual path. To find this, try the following in the terminal:
which ffmpeg
You’ll probably get something like /usr/bin/ffmpeg, but it could vary depending on how you installed ffmpeg.
Note: you don’t need to put “!” at the beginning of these commands if you run them in the terminal. The “!” is only needed if you want to run bash commands from within a Jupyter notebook.
There’s a bit about phase in the reading for Module 2:
https://www.ee.columbia.edu/~dpwe/pubs/Ellis10-introspeech.pdf
But it might be more practically helpful to look at some animations. There are some nice interactive ones on this website:
https://jackschaedler.github.io/circles-sines-signals/index.html
This one relates sines and cosines to rotations around the unit circle (the ‘phasor’ in the Module 1 notebooks):
https://jackschaedler.github.io/circles-sines-signals/sincos.html
This one shows what a change in the phase angle means:
https://jackschaedler.github.io/circles-sines-signals/trig_review.html
And here is a visualization of phase with respect to the DFT:
https://jackschaedler.github.io/circles-sines-signals/dotproduct4.html
You can also play with this by changing the params settings in the last exercise of the sp-m1-3-sampling-sinusoids.ipynb notebook and generate some new animations there.
The key point for us is that the DFT produces N magnitude and phase outputs.
The phase outputs basically tell you how much you would shift a specific cosine wave (basis function) if you were to try to reconstruct the original input by adding up scaled and shifted versions of those N cosines associated with the N DFT outputs (remembering that only half of those are distinct because of aliasing!).
You can play with how changing the phase of an input component affects the DFT magnitude and phase outputs in the sp-m1-5-interpreting-the-dft.ipynb notebook, by changing the values in the code under section 5.3 ‘DFT of a compound waveform’ (see the gen_sinusoid function). Basically, the phase angle changes the point on the unit circle where you start sampling your cosine waves.
In terms of visualizations, I also quite like this ‘Lead/Lag’ video from the Khan Academy Electrical Engineering course, though the teacher never actually says phase! The videos that follow in that series also have some nice visualizations of Euler’s formula.
https://www.khanacademy.org/science/electrical-engineering/ee-circuit-analysis-topic/ee-ac-analysis/v/ee-lead-lag?modal=1

October 9, 2020 at 17:10 in reply to: Clarify the Difference between the Filter and Output Response #12307
The frequency response of a filter describes how a filter interacts with input signals in the frequency domain. We can then talk about the frequency magnitude response as a curve of frequencies (ranging from 0 Hz to the Nyquist frequency) versus magnitudes.
So, you can think of the frequency response of a filter as telling you how applying the filter to an input signal would change the frequency spectrum of the input signal (aka the frequency response of the DFT to the input!). For example, you might design a low pass filter that completely attenuates all frequency components greater than 8000 Hz. In theory this filter would have a magnitude response curve that goes to zero for all frequency values above 8000 Hz.
To see that the frequency response of a filter is something different from (but closely related to) the magnitude spectrum of a specific waveform, you can think about impulse trains with different fundamental frequencies.
We know that impulse trains with different fundamental frequencies have different harmonics, so their magnitude spectra differ in terms of which frequencies would get non-zero magnitudes. For example, if we had an impulse train with F0=100Hz, only frequencies which were multiples of 100Hz would have positive magnitudes, but if our impulse train had F0=200Hz, only multiples of 200Hz would be positive. However, if we applied the same filter to both impulse trains, both of their magnitude spectra would have the same overall shape (i.e. spectral envelope), which would match the shape of the filter’s frequency magnitude response! (You can use the code in the Module 2 FIR and IIR filter notebooks to check this!)
So, to get down to the terminological question: ‘the frequency representation of the filter’ is the same as the ‘frequency response’ of the filter. This determines what the filter will do to the frequency components of the input signal. If we take the DFT of the output of applying the filter to an input in the time domain, it will have the same overall shape as the frequency response of the filter, but the actual details of the magnitude spectrum will depend on the frequency components of the input signal.
Side note: it’s good to remember that there are actually two parts of the frequency spectrum: the magnitude spectrum and the phase spectrum. So the frequency response of a filter also includes a separate phase response. Most of the time, for speech technology applications, when we talk about the frequency response of a filter, we’re just talking about the frequency magnitude response, but it’s worth noting that a filter can have an effect on phase too (see the moving average example in the FIR filter notebook!).
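If you want to look at both responses outside the notebooks, here’s a minimal sketch using scipy (the moving-average coefficients and sample rate are just illustrative choices, not the exact values from the notebook):
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

# A simple 5-point moving-average FIR filter (illustrative coefficients)
b = np.ones(5) / 5

# freqz returns the complex frequency response of the filter
w, h = signal.freqz(b, worN=512, fs=16000)

plt.subplot(2, 1, 1)
plt.plot(w, np.abs(h))    # the magnitude response
plt.ylabel('Magnitude')
plt.subplot(2, 1, 2)
plt.plot(w, np.angle(h))  # the phase response
plt.ylabel('Phase (radians)')
plt.xlabel('Frequency (Hz)')
plt.show()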
October 9, 2020 at 13:36 in reply to: Why doesn’t DFT[0] tell us anything about the frequency of the input? #12301
Yes, that’s right!
We can interpret this as telling us the bias of the input amplitude in the time domain. That is, when reconstructing the original waveform should we shift all the cosines representing our frequency components up (positive bias), down (negative bias) or keep them centred at zero (zero bias).
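A quick way to check this with numpy (a minimal sketch; the biased sinusoid is just an arbitrary example input):
import numpy as np

N = 64
n = np.arange(N)
bias = 0.5                      # shift the whole waveform up by 0.5
x = bias + np.cos(2 * np.pi * 4 * n / N)

X = np.fft.fft(x)
# DFT[0] sums the input, so it equals N times the mean (the bias)
print(np.abs(X[0]))             # -> 32.0, i.e. N * bias
print(np.abs(X[0]) / N)         # -> 0.5, the bias itself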
First thing to try would just be to Restart and Clear Output (from the Kernel drop-down menu). This resets various environment variables, which might solve the problem.
If that doesn’t work, you might need to set the location where matplotlib looks for ffmpeg:
import matplotlib as mpl
mpl.rcParams['animation.ffmpeg_path'] = 'LOCATION_OF_FFMPEG'  # replace with the actual path
print(mpl.rcParams['animation.ffmpeg_path'])  # to see what it's set to
If you’re running a Unix terminal, you can check the location of an application in the filesystem using:
which ffmpeg
You can also do this within a Python notebook (at least if you’re using Mac or Linux) by running it in a code cell with a “bang” at the front:
!which ffmpeg
If this returns nothing, then probably you need to set your PATH (where the computer looks for applications/commands) to include wherever ffmpeg is installed. You can see your current PATH with the command:
echo $PATH
Note that activating a conda environment will change your PATH settings, so you might need to check that you’ve activated the right environment, especially if you used conda to install ffmpeg. You can easily get into a mismatched state if you have several versions of python or conda environments on your machine!
Sorry for the confusion! Better phrasing would be:
‘all the input values get scaled up as all the coefficients in b are greater than 1. The input 2 steps (i.e., x[n-2]) before the current output time step (i.e., y[n]) will get the biggest increase (i.e., b[2]=1.5)’.
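In code terms, a minimal sketch (I’m assuming example coefficients b = [1.1, 1.2, 1.5], since only b[2]=1.5 was given above):
import numpy as np

# Hypothetical FIR coefficients: all greater than 1, with b[2] = 1.5 as above
b = np.array([1.1, 1.2, 1.5])
x = np.ones(5)  # a constant input sequence

# Computes y[n] = b[0]*x[n] + b[1]*x[n-1] + b[2]*x[n-2]
y = np.convolve(x, b)
print(y)  # every input is scaled up; the input two steps back gets the biggest boost (1.5)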
Yes, the peak amplitude of the candidate harmonic (basis function) is 1.
You can see this from the DFT equation: each term in the sum is the nth input value (x[n]) times the corresponding point on the basis function (the phasor in the signals notebooks); the latter is represented by a complex number (e^{-j 2pi nk/N}, for DFT[k]) which has magnitude 1. This means the corresponding sinusoid has a peak amplitude of 1.
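You can verify the magnitude-1 property directly (a minimal numpy sketch; N and k are arbitrary choices):
import numpy as np

N = 8
n = np.arange(N)
k = 3  # any DFT output index

phasor = np.exp(-2j * np.pi * n * k / N)  # the DFT basis function for output k
print(np.abs(phasor))  # every point on it has magnitude 1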
If you look at the magnitude spectrum of a single impulse you’ll see that all the DFT output frequencies have non-zero magnitudes. This means if you send in a single impulse you potentially excite every possible frequency.
If we look at the DFT equation, we see that this happens because the impulse input sequence has exactly one non-zero value, e.g. [0,1,0,0,0,0,0]. This means that when we multiply it with a DFT phasor, we basically select one complex number sitting on the unit circle (magnitude 1) as the DFT output, no matter which DFT output frequency we’re analysing. That’s how we end up with a non-zero magnitude for every DFT output when we apply the DFT to a single impulse.
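Here’s a quick check of that (a minimal numpy sketch):
import numpy as np

x = np.zeros(8)
x[1] = 1.0  # a single impulse, e.g. [0, 1, 0, 0, 0, 0, 0, 0]

X = np.fft.fft(x)
print(np.abs(X))  # all ones: every DFT output frequency has magnitude 1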
If the mechanics of the DFT equation are too much right now, don’t worry!
Another way to think of an impulse is just as a burst of energy – like an infinitely short burst of air through your vocal folds (i.e. the source of the source-filter model). On its own it doesn’t tell you much. But if you put energy at different frequencies into a filter, you’ll get an idea of what that filter’s properties are by seeing which frequencies are boosted by the filter and which are attenuated. It’s a bit like blowing across the top of a bottle to make a flute-like sound come out.
The basic idea is that if you send an impulse into a filter (in mathematical terms, we’d do a convolution) and perform the DFT on the filter’s output, you can then see how that filter shapes the frequency spectrum. For example, does the filter boost low frequencies but dampen high frequencies (i.e. a low pass filter)? Or the opposite (i.e. a high pass filter)?
The physical filter we’re most interested in modelling for Speech Processing is, of course, the human vocal tract, but we could also think about other sorts of tube-like objects, such as a trumpet. In this case, if you put in impulses (i.e. air flow with glottal pulses) at regular intervals (i.e. an impulse train) then you’ll produce a wave with a fundamental period (T0) matching the time between impulses, and so you get a fundamental frequency of F0=1/T0. We also know that the DFT of an impulse train has a non-zero magnitude at every integer multiple of F0 (the harmonics, as you mentioned above).
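You can check that last claim with a minimal numpy sketch (the sample rate and F0 here are just illustrative choices):
import numpy as np

fs = 1000                     # sample rate in Hz (illustrative)
T0 = 0.01                     # 10 ms between impulses -> F0 = 100 Hz
x = np.zeros(1000)
x[::int(T0 * fs)] = 1.0       # an impulse train

X = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(freqs[X > 1e-6])        # non-zero only at 0 Hz (DC) and multiples of 100 Hz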
In this way, we can model the pitch of the human voice (more frequent impulses make for a higher F0). But alongside this, each impulse also potentially excites the resonant properties of the filter. For our voices, the properties of the filter depend on how we shape/constrict the vocal tract using articulators like our tongues. The frequencies that get boosted are the resonances of the vocal tract, but if you’ve already done some phonetics you might also know those resonances as formants: changing the vocal tract filter changes which vowels and consonants we hear!
Side note: If we go back to the discussion of DFT outputs including a magnitude and phase angle, we can also interpret the magnitude spectrum of a single impulse as saying that if we want to create a single impulse from a bunch of cosines, we basically need to add up cosines of all the frequencies we can get our hands on and each needs to be slightly shifted in phase. Similarly, if we want to make an impulse train from cosines, we need to add together versions of all cosines matching the frequencies of all the harmonics. The main takeaway from that is that it’s actually pretty hard work to make an impulse (sharp, spikey) from sinusoids (curvy)!
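A minimal sketch of that side note in the other direction: adding up equal-weight cosines at all the DFT analysis frequencies really does build an impulse (with zero phase shifts, the impulse lands at n=0):
import numpy as np

N = 64
n = np.arange(N)

# Add up cosines at every DFT analysis frequency, all with equal weight
x = sum(np.cos(2 * np.pi * k * n / N) for k in range(N)) / N
print(np.round(x, 6))  # an impulse: 1 at n=0, (near) zero everywhere else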
It’s basically because the labelling of which side is opposite, adjacent or hypotenuse is defined relative to the position of the angle.
The hypotenuse is always the side opposite the right angle, with neither end touching the right angle.
The opposite side is always the side of the triangle that doesn’t touch the angle of interest, theta, at either end (and one end touches the right angle).
The adjacent is always the side of the triangle that touches the angle of interest, theta (but is not the hypotenuse), i.e. one end touches the right angle and the other end touches the angle of interest.
Once we’ve established which angle theta in the right-angled triangle we’re interested in, cos(theta) is defined as the length of the adjacent divided by the length of the hypotenuse.
In the Speech Processing course, we mainly use these trigonometric definitions to go between the polar coordinates (magnitude and angle) and rectangular (real, imaginary) representations of complex numbers, but there are some nice practical examples in this video that don’t involve complex numbers at all (just a 2 dimensional space).
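For the complex-number use case, here’s a minimal Python sketch of going between the two representations (z = 3 + 4j is just an arbitrary example):
import cmath

z = complex(3, 4)                    # rectangular: real + imaginary parts

magnitude, angle = cmath.polar(z)    # polar: magnitude and angle in radians
print(magnitude, angle)              # 5.0, ~0.9273

# Going back: real = magnitude*cos(angle), imaginary = magnitude*sin(angle)
print(cmath.rect(magnitude, angle))  # approximately (3+4j), up to float noise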
You might need to try restarting the kernel for that notebook: go to ‘Kernel’ in the menu at the top of the notebook and click ‘Restart and Clear Output’. Let me know if that works!
We basically need the phase angle from the DFT if we want to reconstruct the original input signal from the DFT outputs. Each DFT output tells us how to scale (magnitude) and shift (phase) the cosine wave at the frequency associated with that output. Once we scale and shift these cosine waves with the DFT magnitudes and phases, we can recreate the original input signal as it was (with some limitations!).
You can use the code in sp-m1-3-sampling-sinusoids.ipynb (‘Generating linear combinations of sinusoids’) to play with this a bit and also to see why you’ll need phase information to recreate the original input. If you change the params variable there to:
params = [(1, 2, 0), (1, 6, 0)]
You’ll generate a waveform made up of a 2 Hz sine wave and a 6 Hz sine wave, both with peak amplitude 1 and no phase shift.
If you compare this to the version where we apply a phase shift of pi/3 radians to the 6 Hz component:
params = [(1, 2, 0), (1, 6, np.pi/3)]
You’ll see a somewhat different compound waveform. So, if we want to recover the latter example from the DFT, we need the phase information for the frequency components we identify as being present in the original signal. Otherwise we won’t get back the input. You can also change the params variable in the notebook to check that a cosine wave is the same as a sine wave shifted by pi/2 radians.
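If you don’t have the notebook to hand, here’s a minimal standalone sketch of the same comparison (I’m assuming the params tuples are (amplitude, frequency, phase), as in the examples above):
import numpy as np
import matplotlib.pyplot as plt

fs = 100                      # samples per second (illustrative)
t = np.arange(0, 1, 1 / fs)   # one second of time points

def compound(params):
    # Sum of sinusoids: each tuple is (amplitude, frequency in Hz, phase in radians)
    return sum(a * np.sin(2 * np.pi * f * t + phi) for (a, f, phi) in params)

plt.plot(t, compound([(1, 2, 0), (1, 6, 0)]), label='no phase shift')
plt.plot(t, compound([(1, 2, 0), (1, 6, np.pi / 3)]), label='6 Hz shifted by pi/3')
plt.legend()
plt.show()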
So, we can think of the phase output of the DFT as being a separate piece of information, independent of the frequency associated with that output.
That said, it’s actually not that clear that you need phase information for tasks like automatic speech recognition, where we’re mainly interested in which frequency components are present in a signal, not whether we can reconstruct it. For this reason, we often just focus on the magnitude spectrum for actual applications and ignore the phase spectrum!
I should also mention that if you’re using a conda environment, you can install ffmpeg directly. With your desired conda environment activated, run the following on the command line:
> conda install ffmpeg
Thanks Ross! I’ve fixed it in the GitHub repo.