Finish – Page 2

Module 3 gave an introduction to digital signal processing. We hope you can now see the connection between articulatory and acoustic phonetics, and how we might use our knowledge of this to start building computational models for the analysis of speech. Now is a good time to think about what computational models would need to capture in order to characterise different speech sounds, e.g. properties of vowels (like formants), and consonants (like frication or bursts).

The fact that we have to digitize the speech signal and do short-term analysis on it forces specific design decisions, e.g. the sampling rate and the window size. We can attempt to get a ‘better’ view of the spectral characteristics of speech by engineering the size and type of windows we use as inputs to the Discrete Fourier Transform (DFT).

Both signal processing and acoustic phonetics are massive fields. We do not expect you to show mastery of both in only a few weeks! For the purposes of this course, you should be able to:

Explain how sampling rate determines which frequencies can be captured in a digital speech signal, i.e., What is aliasing? What is the Nyquist Frequency?
Explain how the DFT is used to generate a spectrogram
Describe what the output of the DFT is, what the magnitude and phase spectrums are, and why we only visualize the first half of the magnitude spectrum in a spectrogram.
Describe how input size and sampling rate determine which frequencies can be analysed in a spectrogram
Describe what spectral leakage is and when it occurs
Describe how window shape can affect the shape of the magnitude spectrum
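As a quick check on the points about the DFT output, here is a minimal sketch using NumPy. The sampling rate, input length and test tone are values we have assumed for illustration; they are not taken from the lab notebooks.

```python
import numpy as np

fs = 8000   # sampling rate in Hz (assumed for illustration)
N = 64      # number of input samples (assumed)
n = np.arange(N)

# A 1 kHz cosine: exactly 8 cycles fit into 64 samples at 8 kHz.
x = np.cos(2 * np.pi * 1000 * n / fs)

X = np.fft.fft(x)        # N complex outputs for N input samples
magnitude = np.abs(X)    # how much of each frequency is present
phase = np.angle(X)      # where each component starts in its cycle (radians)

# For real-valued input, output k and output N-k are complex conjugates,
# so the magnitude spectrum is mirrored around the Nyquist frequency.
mirrored = np.allclose(magnitude[1:], magnitude[1:][::-1])

# Only outputs 0..N/2 carry distinct magnitudes; output k corresponds to k * fs / N Hz.
peak_bin = int(np.argmax(magnitude[: N // 2 + 1]))
peak_hz = peak_bin * fs / N

print(len(X), mirrored, peak_bin, peak_hz)
```

The mirrored second half adds no new information about a real-valued signal, which is why a spectrogram only shows frequencies from 0 Hz up to the Nyquist frequency.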

It’s out of scope for this class, but for real applications we also have to consider how fast our algorithms are. In general, you will be using the Fast Fourier Transform (FFT), an implementation of the DFT that exploits certain repetitions/overlaps in how we calculate the separate DFT outputs to get the results faster. The Python NumPy FFT function is used in the Module 3 lab, notebook 1, to show you what to expect if you try an off-the-shelf DFT implementation. The time gains aren’t really noticeable for the small examples used in the lab notebooks, but when dealing with real-world data, the optimizations of the FFT make a big difference. In general, you will see that there are many ways to solve the same problem. We’ll come back to this later in the course, when we look at an algorithmic method called dynamic programming.
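To see that the FFT computes the same thing as the DFT definition, only faster, a minimal sketch is given here. The function `naive_dft` is a hypothetical helper written for this illustration (it is not the lab's code); `np.fft.fft` is the NumPy function.

```python
import numpy as np

def naive_dft(x):
    """Evaluate the DFT definition directly: N outputs, each a sum over N inputs."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # Row k of `basis` is the complex sinusoid that output k compares the input against.
    basis = np.exp(-2j * np.pi * k * n / N)
    return basis @ x

rng = np.random.default_rng(0)      # fixed seed so the example is repeatable
x = rng.standard_normal(256)        # arbitrary test input (assumed)

same = np.allclose(naive_dft(x), np.fft.fft(x))
print(same)
```

The direct version needs on the order of N² multiplications, while the FFT needs on the order of N log N, so the gap grows quickly as the input gets longer.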

What you should know

Some more detailed notes on what’s examinable from this module:

Digital Signals:

Explain how bit depth (i.e. quantisation) affects the quality of a digitized speech signal
Sampling rate: Explain how sampling rate determines which frequencies can be captured in a digital speech signal, i.e. how this relates to:
- Aliasing
- Nyquist Frequency
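A minimal sketch of aliasing, assuming an 8 kHz sampling rate and two test tones chosen for illustration: a cosine above the Nyquist frequency produces exactly the same samples as a lower-frequency cosine, so after sampling the two cannot be told apart.

```python
import numpy as np

fs = 8000            # sampling rate in Hz (assumed)
nyquist = fs / 2     # highest frequency that can be captured: 4000 Hz
n = np.arange(32)    # sample indices
t = n / fs           # sample times in seconds

above = np.cos(2 * np.pi * 7000 * t)   # 7 kHz tone, above the Nyquist frequency
alias = np.cos(2 * np.pi * 1000 * t)   # 1 kHz tone, i.e. fs - 7 kHz

# The sampled values coincide: the 7 kHz tone "folds" down to 1 kHz.
indistinguishable = np.allclose(above, alias)
print(nyquist, indistinguishable)
```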

Short-term analysis: Why do we do short-term analysis on speech (i.e. windowing)?
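One way to picture short-term analysis is to cut the signal into short, overlapping frames and analyse each frame separately. The sketch below assumes a 16 kHz sampling rate, 25 ms frames and a 10 ms step, which are common choices rather than values prescribed by this module; `frame_signal` is a hypothetical helper.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Assumes len(x) >= frame_len; trailing samples that do not fill
    a whole frame are dropped.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

fs = 16000                      # sampling rate in Hz (assumed)
frame_len = fs * 25 // 1000     # 25 ms -> 400 samples
hop = fs * 10 // 1000           # 10 ms -> 160 samples

t = np.arange(fs) / fs                  # one second of sample times
x = np.sin(2 * np.pi * 220 * t)         # placeholder signal standing in for speech

frames = frame_signal(x, frame_len, hop)
print(frames.shape)
```

Each row of `frames` is short enough that the speech within it can be treated as roughly stationary, so it makes sense to compute one spectrum per row.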

Series expansion, Fourier analysis, frequency domain:

What do we use the Discrete Fourier Transform for?
- i.e. mapping from the time domain to the frequency domain
- Interpret the DFT as a series expansion of a complex waveform
Describe what the output of the DFT is:
- If the input is a sequence length N, how many outputs are there?
- What do the magnitude and phase spectrums represent? What’s on the x and y axes?
- Why do we only visualize the first half of the magnitude spectrum in a spectrogram? (i.e. link to aliasing)
Describe how input size and sampling rate determine which frequencies can be analysed in a spectrogram:
- Calculate what frequencies are represented in the DFT output
- When does spectral leakage occur? (see Lecture, lab notebook)
Describe how window shape can affect the shape of the magnitude spectrum, i.e. why do we want a tapered window?
What’s the relationship between the DFT and what we actually see in a spectrogram?
What’s the difference between a narrow-band and a wide-band spectrogram?
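The points above about DFT output frequencies, spectral leakage and window shape can be sketched as follows. All values (8 kHz sampling rate, 64-sample input, the two test tones) and the helper names `magnitude_spectrum` and `far_leakage` are assumptions made for illustration. The Hann window from `np.hanning` stands in for "a tapered window"; other tapered windows behave similarly.

```python
import numpy as np

fs = 8000   # sampling rate in Hz (assumed)
N = 64      # input length in samples (assumed)
n = np.arange(N)

# Frequencies of the DFT outputs we visualize: k * fs / N for k = 0..N/2.
bin_freqs = np.fft.rfftfreq(N, d=1 / fs)

def magnitude_spectrum(x):
    """Magnitudes of DFT outputs 0..N/2 for a real-valued input."""
    return np.abs(np.fft.rfft(x))

on_bin = np.cos(2 * np.pi * 1000 * n / fs)    # 8 whole cycles: lands exactly on output 8
off_bin = np.cos(2 * np.pi * 1060 * n / fs)   # 8.48 cycles: falls between outputs 8 and 9

rect_on = magnitude_spectrum(on_bin)                     # rectangular window, whole cycles
rect_off = magnitude_spectrum(off_bin)                   # rectangular window, partial cycle
hann_off = magnitude_spectrum(off_bin * np.hanning(N))   # tapered window, partial cycle

def far_leakage(mag, peak_bin=8, guard=4):
    """Fraction of spectral energy more than `guard` outputs away from the peak."""
    k = np.arange(len(mag))
    far = np.abs(k - peak_bin) > guard
    return np.sum(mag[far] ** 2) / np.sum(mag ** 2)

print(bin_freqs[:3])
print(far_leakage(rect_on), far_leakage(rect_off), far_leakage(hann_off))
```

When the tone completes a whole number of cycles in the window, essentially all the energy sits in one output. When it does not, the rectangular window smears energy across distant outputs, and the tapered window reduces that smearing at the cost of a wider peak.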