The source-filter model of speech gives us a way of writing down the generation of speech sounds in the vocal tract in mathematical terms. Here we modelled the source as either a series of impulses or as white noise, and the filter in terms of Infinite Impulse Response (IIR) filters.
You will have seen in the labs that interpreting specific IIR filter coefficients in terms of resonances is not at all obvious. This is a case where viewing the filter in a transformed representation (here, the z-domain) makes things more interpretable. Doing so requires quite a lot more mathematical scaffolding, which is why we don’t go into the details in this class. In general, signal processing is an area with well-developed mathematical tools that make a filter’s effect on a signal much more transparent. Understanding this can be very important for improving signal quality in real-life situations, but that is beyond the scope of this course.
The important point to take away is that once we know the frequency response of a filter, we can easily determine what its effect on a signal will be. In the case of an impulse train, the filter’s frequency response (its magnitude response) is imposed on the spectrum of the impulse train: the two are multiplied together, which changes the spectral envelope. The fine detail of the spectrum is determined by the fundamental frequency of the impulse train and integer multiples of it (the harmonics).
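As a rough illustration of this point, the sketch below passes an impulse train through a single two-pole IIR resonator and measures the magnitude spectrum at a few frequencies. Everything specific here is an assumption made for the example rather than something taken from the labs: the 16 kHz sample rate, the 100 Hz fundamental, the 500 Hz resonance with a 100 Hz bandwidth, the function names, and the standard textbook formula used to turn a centre frequency and bandwidth into filter coefficients.

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz (assumed for this example)

def impulse_train(f0_hz, n_samples):
    """1.0 once per pitch period, 0.0 everywhere else."""
    period = round(SAMPLE_RATE / f0_hz)  # samples between impulses
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, centre_hz, bandwidth_hz):
    """Two-pole IIR filter: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius
    theta = 2 * math.pi * centre_hz / SAMPLE_RATE        # pole angle
    a1, a2 = -2 * r * math.cos(theta), r * r
    y = []
    for n, x_n in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x_n - a1 * y1 - a2 * y2)
    return y

def magnitude_at(signal, freq_hz):
    """Magnitude of the DFT of `signal` evaluated at a single frequency."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(signal)))

source = impulse_train(f0_hz=100, n_samples=1600)
filtered = resonator(source, centre_hz=500, bandwidth_hz=100)

# Compare source and filtered magnitudes at three harmonics (100, 500, 1500 Hz)
# and at one frequency that lies between harmonics (150 Hz).
for freq in (100, 150, 500, 1500):
    print(freq, round(magnitude_at(source, freq), 3),
          round(magnitude_at(filtered, freq), 3))
```

In the source, every harmonic has the same magnitude and there is essentially no energy between harmonics. After filtering, the harmonics are still at multiples of 100 Hz, but the one nearest the resonance is the strongest: the filter has set the spectral envelope while F0 still sets the fine detail.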
The idea of extracting well-behaved features that compactly represent the spectral envelope of speech sounds through time will lead us to the concept of Mel-Frequency Cepstral Coefficients in Module 8. These sorts of features are the backbone of most Automatic Speech Recognition systems and statistical Text-to-Speech systems. Over the next few weeks, we’ll turn our attention to concatenative speech synthesis, where we’ll look at generating speech from text inputs based on the acoustic properties of speech units in a speech database.
What you should know
Spectral envelope, Resonance, Vocal Tract Resonance:
- Describe how we can approximate the vocal tract in terms of a series of tubes (i.e. the physical source-filter model)
- What’s the source? How does it relate to F0?
- What determines the filter properties? How does this relate to resonance?
- You won’t be expected to derive any actual tube models for specific vowels or do resonance calculations.
Harmonics, Impulse train:
- Explain what an impulse and an impulse train are
- Explain how we can determine/change what the fundamental frequency and harmonics of an impulse train will be
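As a small sketch of the last two points: in a sampled impulse train, the fundamental frequency is set by the spacing between impulses (F0 = sample rate / period in samples), and the harmonics sit at integer multiples of F0. The function names and the 16 kHz sample rate below are assumptions made for illustration only.

```python
def impulse_train(f0_hz, sample_rate_hz, duration_s):
    """Samples of an impulse train: 1.0 once per pitch period, 0.0 elsewhere."""
    period = round(sample_rate_hz / f0_hz)        # samples between impulses
    n_samples = round(sample_rate_hz * duration_s)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def harmonics(f0_hz, max_freq_hz):
    """Harmonic frequencies (integer multiples of F0) up to max_freq_hz."""
    return [k * f0_hz for k in range(1, int(max_freq_hz // f0_hz) + 1)]

# 50 ms of a 100 Hz impulse train at 16 kHz: one impulse every 160 samples.
train = impulse_train(f0_hz=100, sample_rate_hz=16000, duration_s=0.05)
print(sum(train))           # how many impulses fall in 50 ms
print(harmonics(100, 500))  # harmonic frequencies up to 500 Hz
```

Changing the spacing between impulses changes F0, and with it the spacing of the harmonics: halving the period to 80 samples would double F0 to 200 Hz.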
Filter, Impulse response, Source-Filter model:
- Describe the difference between a Finite Impulse Response (FIR) and an Infinite Impulse Response (IIR) filter
- Describe the relationship between resonances, the spectral envelope of a sound, and the frequency response of a filter
- Explain what represents the vocal source in the source-filter model, and how this can be varied to represent different classes of speech sounds
- Explain how the vocal tract resonances are represented in the source-filter model, how they can be varied, and how they relate to our perception of speech sounds, e.g. in terms of formant structure
- Describe the difference between low-pass, band-pass, and high-pass filters in frequency domain terms.
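To make the FIR/IIR distinction concrete, here is a minimal sketch that feeds a single impulse into two simple filters written directly as difference equations. Both filters are illustrative choices rather than ones taken from the labs: a 3-point moving average (FIR, output depends only on inputs) and a one-pole filter with an assumed feedback coefficient of 0.5 (IIR, output also depends on the previous output).

```python
def fir_moving_average(x, taps=3):
    """FIR: y[n] = (x[n] + x[n-1] + ... + x[n-taps+1]) / taps."""
    return [sum(x[n - k] for k in range(taps) if n - k >= 0) / taps
            for n in range(len(x))]

def iir_one_pole(x, feedback=0.5):
    """IIR: y[n] = x[n] + feedback * y[n-1]."""
    y = []
    for n, x_n in enumerate(x):
        previous = y[n - 1] if n >= 1 else 0.0
        y.append(x_n + feedback * previous)
    return y

impulse = [1.0] + [0.0] * 7  # a single impulse followed by silence

fir_response = fir_moving_average(impulse)
iir_response = iir_one_pole(impulse)

print(fir_response)  # falls to exactly zero once the impulse leaves the window
print(iir_response)  # keeps decaying: each value feeds into the next
```

The FIR impulse response is exactly zero after `taps` samples, whereas the IIR response halves at every step and in principle never reaches zero. Both of these particular filters smooth the signal, so they behave as low-pass filters.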
Extensions
The version of the source-filter model presented in this course works pretty well for a broad characterisation of some phones, but you’d be well justified in wondering whether it’s an oversimplification of the vocal tract (e.g. where’s the nasal cavity in this model?). A classic text on acoustic phonetics is Ken Stevens’ Acoustic Phonetics. This is well out of scope for this course, but a quick browse of chapter 3 (‘Basic Acoustics of Vocal Tract Resonators’) will give you an idea of the complexity involved in modelling speech production in more detail.
If you want to try to build your own physical tube model, Mark Huckvale has some practical instructions here.
If you’re interested in learning more about filters or signal processing in general, I recommend Rick Lyons’ Understanding Digital Signal Processing as the most accessible yet mathematically clear textbook on this area I’ve come across (unfortunately, the University of Edinburgh library only has paper copies on short-term loan). Previous SLP students have also recommended The Scientist and Engineer’s Guide to Digital Signal Processing by Steven Smith, which is available for free online. Again, going further into Digital Signal Processing is out of scope for this course, but you may well find a bit of extension useful if you intend to pursue this line of study in the future.
Key Terms
- Spectral envelope
- Resonance
- Vocal Tract
- Formant
- Source
- Filter
- Difference equation
- Impulse, Impulse Train
- Impulse response
- Finite Impulse Response
- Infinite Impulse Response
- Low-pass, high-pass, band-pass