Forum Replies Created
October 9, 2020 at 13:36 in reply to: Why doesn’t DFT[0] tell us anything about the frequency of the input? #12301
Yes, that’s right!
We can interpret this as telling us the bias of the input amplitude in the time domain. That is, when reconstructing the original waveform should we shift all the cosines representing our frequency components up (positive bias), down (negative bias) or keep them centred at zero (zero bias).
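If you want to see this concretely, here’s a minimal numpy sketch (the input values are just made up):

import numpy as np

x = np.array([1.0, 2.0, 1.0, 2.0])  # an input with a positive bias (mean = 1.5)
X = np.fft.fft(x)
print(X[0])                 # (6+0j): DFT[0] is just the sum of the input values
print(X[0].real / len(x))   # 1.5: dividing by N gives the mean, i.e. the bias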
The first thing to try would just be Restart and Clear Output (from the Kernel drop-down menu). This resets various environment variables, which might solve the problem.
If that doesn’t work, you might need to set the location where matplotlib looks for ffmpeg:
import matplotlib as mpl
mpl.rcParams['animation.ffmpeg_path'] = LOCATION_OF_FFMPEG  # replace with the path to your ffmpeg binary
print(mpl.rcParams['animation.ffmpeg_path'])  # to see what it's set to
If you’re running a unix terminal, you can check the location of an application in the filesystem using:
which ffmpeg
You can do this within a Python notebook (at least if you’re using Mac or Linux) by putting it in a code cell and adding a “bang” at the front:
!which ffmpeg
If this returns nothing, then probably you need to set your PATH (where the computer looks for applications/commands) to include wherever ffmpeg is installed. You can see that with the command:
echo $PATH
Note that activating a conda environment will change your PATH settings, so you might need to check that you’ve activated the right environment, especially if you used conda to install ffmpeg. You can easily get into a mismatched state if you have several versions of python or several conda environments on your machine!
Sorry for the confusion! Better phrasing would be:
‘all the input values get scaled up as all the coefficients in b are greater than 1. The input 2 steps (i.e., x[n-2]) before the current output time step (i.e., y[n]) will get the biggest increase (i.e., b[2]=1.5)’.
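As a minimal sketch of that difference equation: note that only b[2] = 1.5 comes from the example above; the other two coefficients here are hypothetical (just chosen to also be greater than 1):

import numpy as np

b = [1.1, 1.2, 1.5]                 # hypothetical b[0], b[1]; b[2] = 1.5 as in the example
x = np.array([1.0, 0.0, 0.0, 0.0])  # a single impulse as input

# y[n] = b[0]*x[n] + b[1]*x[n-1] + b[2]*x[n-2]
y = np.convolve(x, b)
print(y)  # [1.1 1.2 1.5 0. 0. 0.]: the input 2 steps back gets the biggest scaling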
Yes, the peak amplitude of the candidate harmonic (basis function) is 1.
You can see this from the DFT equation: each term in the sum is the nth input value x[n] multiplied by the corresponding point on the basis function (the phasor in the SIGNALS notebooks). That point is a complex number, e^{-j 2π nk/N} for DFT[k], which has magnitude 1. This means the corresponding sinusoid has a peak amplitude of 1.
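You can check this quickly in numpy; a minimal sketch, with an arbitrary choice of N and k:

import numpy as np

N, k = 8, 3                                   # arbitrary DFT size and output index
n = np.arange(N)
basis = np.exp(-1j * 2 * np.pi * n * k / N)   # the phasor for DFT[k]
print(np.abs(basis))                          # all 1s: every point on the basis function has magnitude 1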
If you look at the magnitude spectrum of a single impulse you’ll see that all the DFT output frequencies have non-zero magnitudes. This means if you send in a single impulse you potentially excite every possible frequency.
If we look at the DFT equation, we see that this happens because the impulse input sequence has exactly one non-zero value, e.g. [0,1,0,0,0,0,0]. This means that when we multiply it with a DFT phasor, we basically select one complex number sitting on the unit circle (magnitude 1) as the DFT output, no matter which DFT output frequency we’re analysing. That’s how we end up with a non-zero magnitude for every DFT output when we apply the DFT to a single impulse.
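You can verify this with a couple of lines of numpy:

import numpy as np

x = np.array([0, 1, 0, 0, 0, 0, 0])  # a single impulse, as in the example above
X = np.fft.fft(x)
print(np.abs(X))                     # all 1s: non-zero magnitude at every DFT output frequency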
If the mechanics of the DFT equation are too much right now, don’t worry!
Another way to think of an impulse is just as a burst of energy – like an infinitely short burst of air through your vocal folds (i.e. the source of the source-filter model). On its own it doesn’t tell you much. But if you put energy at different frequencies into a filter, you’ll get an idea of what that filter’s properties are by seeing which frequencies are boosted by the filter and which are attenuated. It’s a bit like blowing across the top of a bottle to make a flute-like sound come out.
The basic idea is that if you send an impulse into a filter (in mathematical terms, we’d do a convolution) and perform the DFT on the filter’s output, you can then see how that filter shapes the frequency spectrum. For example, does the filter boost low frequencies but dampen high frequencies (i.e. a low pass filter)? Or the opposite (i.e. a high pass filter)?
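Here’s a rough sketch of that idea using numpy and scipy; the 2-point averaging filter is just a made-up example of a low pass filter:

import numpy as np
from scipy.signal import lfilter

b, a = [0.5, 0.5], [1.0]    # a hypothetical 2-point averaging (low pass) filter

impulse = np.zeros(64)
impulse[0] = 1.0
h = lfilter(b, a, impulse)  # send an impulse through the filter

H = np.abs(np.fft.rfft(h))  # DFT of the output shows how the filter shapes the spectrum
print(H[:3], H[-3:])        # magnitudes near 1 at low frequencies, near 0 at high ones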
The physical filter we’re most interested in modelling for Speech Processing is, of course, the human vocal tract, but we could also think about other sorts of tube-like objects, like a trumpet. In this case, if you put in impulses (i.e. air flow with glottal pulses) at regular intervals (i.e. an impulse train), then you’ll produce a wave with a fundamental period (T0) matching the time between impulses, and so you get a fundamental frequency of F0=1/T0. We also know that the DFT of an impulse train has a non-zero magnitude at every integer multiple of F0 (the harmonics, as you mentioned above).
In this way, we can model the pitch of the human voice (more frequent impulses make for a higher F0). But alongside this, each impulse also potentially excites the resonant properties of the filter. For our voices, the properties of the filter depend on how we shape/constrict the vocal tract using articulators like our tongues. The frequencies that get boosted are the resonances of the vocal tract, but if you’ve already done some phonetics you might also know those resonances as formants: changing the vocal tract filter changes which vowels and consonants we hear!
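You can see the harmonics of an impulse train in numpy too; the sampling rate and F0 here are made-up values:

import numpy as np

fs, f0 = 1000, 100      # made-up sampling rate and fundamental frequency (Hz)
x = np.zeros(fs)        # one second of samples
x[::fs // f0] = 1.0     # an impulse every T0 = 1/F0 = 10 ms

X = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(freqs[X > 1])     # [0. 100. 200. 300. 400. 500.]: F0 and its harmonics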
Side note: If we go back to the discussion of DFT outputs including a magnitude and phase angle, we can also interpret the magnitude spectrum of a single impulse as saying that if we want to create a single impulse from a bunch of cosines, we basically need to add up cosines of all the frequencies we can get our hands on and each needs to be slightly shifted in phase. Similarly, if we want to make an impulse train from cosines, we need to add together versions of all cosines matching the frequencies of all the harmonics. The main takeaway from that is that it’s actually pretty hard work to make an impulse (sharp, spikey) from sinusoids (curvy)!
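If you want to see just how that works, here’s a minimal numpy sketch (the impulse position n0 = 5 is an arbitrary choice): one cosine per DFT frequency, each with peak amplitude 1 and a phase shift proportional to its frequency, and all the curviness cancels out except at a single spike:

import numpy as np

N, n0 = 64, 5                # DFT length and (arbitrary) impulse position
n = np.arange(N)
# One cosine per DFT frequency k, each phase-shifted in proportion to k
x = sum(np.cos(2 * np.pi * k * (n - n0) / N) for k in range(N))
print(np.round(x, 10)[:10])  # a (scaled) impulse: N at n = 5, zero everywhere else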
It’s basically because the labelling of which side is opposite, adjacent or hypotenuse is defined relative to the position of the angle.
The hypotenuse is always the side opposite the right angle, with neither end touching the right angle.
The opposite side is always the side of the triangle that doesn’t touch the angle of interest, theta, at either end (and one end touches the right angle).
The adjacent side is always the side of the triangle that touches the angle of interest, theta, but is not the hypotenuse; i.e. one end touches the right angle and the other end touches the angle of interest.
Once we’ve established which angle theta in the right-angled triangle we’re interested in, cos(theta) is defined as the length of the adjacent divided by the length of the hypotenuse.
In the Speech Processing course, we mainly use these trigonometric definitions to go between the polar coordinates (magnitude and angle) and rectangular (real, imaginary) representations of complex numbers, but there are some nice practical examples in this video that don’t involve complex numbers at all (just a 2 dimensional space).
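As a quick illustration of that polar/rectangular connection (the complex number here is just a made-up example):

import numpy as np

z = 3 + 4j                           # a hypothetical complex number (rectangular form)
mag, theta = np.abs(z), np.angle(z)  # polar form: magnitude 5, angle ~0.927 radians

# cos and sin take us back: adjacent/hypotenuse and opposite/hypotenuse
print(mag * np.cos(theta), mag * np.sin(theta))  # 3.0 4.0 (up to rounding)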
You might need to try restarting the kernel for that notebook: go to ‘Kernel’ in the menu at the top of the notebook and click ‘Restart and Clear Output’. Let me know if that works!
We basically need the phase angle from the DFT if we want to reconstruct the original input signal from the DFT outputs. Each DFT output tells us how to scale (magnitude) and shift (phase) the cosine wave with the frequency associated with each DFT output. Once we scale and shift these cosine waves with the DFT magnitudes and phases, we can recreate the original input signal as it was (with some limitations!).
You can use the code in sp-m1-3-sampling-sinusoids.ipynb (‘Generating linear combinations of sinusoids’) to play with this a bit and also to see why you’ll need phase information to recreate the original input. If you change the params variable there to:
params = [(1, 2, 0), (1, 6, 0)]
You’ll generate a waveform made up of a 2 Hz sine wave and a 6 Hz sine wave, both with peak amplitude 1 and no phase shift.
If you compare this to the version where we apply a phase shift of pi/3 radians to the 6 Hz component:
params = [(1, 2, 0), (1, 6, np.pi/3)]
You see a somewhat different compound waveform. So, if we want to recover the latter example from the DFT, we need the phase information for the frequency components we identify as being present in the original signal. Otherwise we won’t get back the input. You can also change the params variable in the notebook to check that a cosine wave is the same as a sine wave shifted by pi/2 radians.
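If you don’t have the notebook to hand, here’s a rough standalone version of that experiment; the (amplitude, frequency, phase) tuples match the notebook’s params convention, but the sampling rate and duration here are made up:

import numpy as np
import matplotlib.pyplot as plt

fs, dur = 64, 1.0  # made-up sampling rate (Hz) and duration (s)
t = np.arange(0, dur, 1 / fs)

def combine(params):
    # Each tuple is (peak amplitude, frequency in Hz, phase shift in radians)
    return sum(a * np.sin(2 * np.pi * f * t + ph) for a, f, ph in params)

plt.plot(t, combine([(1, 2, 0), (1, 6, 0)]), label='no phase shift')
plt.plot(t, combine([(1, 2, 0), (1, 6, np.pi / 3)]), label='6 Hz shifted by pi/3')
plt.legend()
plt.show()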
So, we can think of the phase output of the DFT as being independent of the frequency associated with that output.
That said, it’s actually not clear that you need phase information for tasks like automatic speech recognition, where we’re mainly interested in which frequency components are present in a signal, not in whether we can reconstruct it. For this reason we often just focus on the magnitude spectrum in actual applications and ignore the phase spectrum!
I should also mention that if you’re using a conda environment, you can install ffmpeg directly. With your desired conda environment activated, run the following on the command line:
> conda install ffmpeg
Thanks Ross! I’ve fixed it in the github repo.
This mirroring is a fundamental property of the DFT. We go through why it happens in the SIGNALS materials:
Try the exercise under ‘The DFT for k = 2 and beyond’ and see if you can see why this happens.
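In the meantime, here’s a quick numpy check of the symmetry behind the mirroring:

import numpy as np

x = np.random.rand(8)                         # any real-valued input
X = np.fft.fft(x)
print(np.round(np.abs(X), 3))                 # magnitudes mirrored around N/2
print(np.allclose(X[1:], np.conj(X[:0:-1])))  # True: DFT[k] is the conjugate of DFT[N-k]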
Hi Kerim,
Yes, you need to install matplotlib separately from Jupyter notebooks unless you are using the Edina Noteable server (where it is already installed by default).
The instructions here describe the install process if you want to run it locally (see ‘The Normal Way: Running Jupyter Notebooks on your computer’):
https://github.com/laic/uoe_speech_processing_course/blob/master/sp-m0-how-to-start.ipynb
But the basic options to install matplotlib are:
1. If you have Anaconda (or miniconda) installed, you can install matplotlib using the following command (after you have activated your conda environment)
> conda install -c conda-forge matplotlib
2. Otherwise you can use pip:
> pip install matplotlib
If you’ve done that and it still doesn’t work, you’ll need to check your installation of python and related path variables (i.e. where python looks for packages). Let us know, and we can go through that.
If your internet access is ok, you can also try using the Edina Noteable service (it really is easier!):
See the instructions in sp-m0-how-to-start.ipynb
Sorry, I forgot to include the link to the audio for that exercise! It should be the difference between violin_A3_15 and violin_A4_05.
I’ve updated the notebook in the github repository:
https://github.com/laic/uoe_speech_processing_course/blob/master/signals/sp-m1-1-sounds-signals.ipynb
I’ve also included a link to a semitone calculator there: http://www.homepages.ucl.ac.uk/~sslyjjt/speech/semitone.html
But the main thing is just to compare those two files.
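For reference, the semitone difference between two frequencies f1 and f2 is 12·log2(f2/f1), so you can also compute it yourself (assuming concert pitch, where A3 is 220 Hz and A4 is 440 Hz):

import numpy as np

f1, f2 = 220.0, 440.0         # A3 and A4 at concert pitch (Hz)
print(12 * np.log2(f2 / f1))  # 12.0: an octave is 12 semitones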
Thanks for pointing this out Ross! I’ve updated the notebook on github to link to the git download page.
You might need to check whether Anaconda has been added to your path environment variable. There are some instructions here:
https://www.datacamp.com/community/tutorials/installing-anaconda-windows
Let me know if that helps!