Acoustic characteristics of consonants

Consonants are a diverse set of speech sounds ranging from vowel-like approximants to complete closure of the vocal tract with silence.

By utilizing the detail available to us in spectrographic displays of speech sounds, we are able to categorize and describe speech with far more accuracy than we could with waveforms alone. This video will present the acoustic characteristics of the parameters that define consonant sounds (voice, place, and manner).
First we’ll begin with voicing. All speech sounds can be categorized as voiced or voiceless. The presence or absence of voicing is visible in both the waveform and the spectrogram, though it has a few different appearances.
Some sounds are typically produced with voicing. Examples of these sounds are vowels, nasals, and approximants. Together we can refer to them as sonorants, although sometimes the vowels are left out of this category. In sonorants, voicing is apparent in the periodic structure of the waveform, as vertical striations in the spectrogram, or as clearly defined harmonics in the spectrum. In the waveform of “scan it” for example, we can see that voicing is present from the start of the [a] vowel through the end of the [ɪ] vowel. The particulars of the waveform change with each phone, but the presence of voicing is continuously evident throughout these three sonorant sounds.
We can also see voicing in the spectrogram and spectrum. In the spectrogram, voicing is apparent as vertical striations at more or less regularly spaced intervals. These striations may be closer together or farther apart, depending on the fundamental frequency. Close striations indicate a higher F0, and wider striations indicate a lower F0.
If we look at the spectrum, we can see the harmonic structure of the voiced wave. Here again we have an indication of F0. When F0 is low, the harmonics are closely spaced. If the F0 is high, the harmonics will be spread further apart.
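The relationship between F0, harmonic spacing, and striation spacing can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and the example F0 values of 100 and 200 Hz are made up for the sketch, and the voice is treated as perfectly periodic, which real speech is not.

```python
def harmonic_frequencies(f0_hz, max_hz):
    """Return the harmonics (integer multiples of F0) up to max_hz."""
    count = int(max_hz // f0_hz)
    return [f0_hz * k for k in range(1, count + 1)]


def striation_spacing_ms(f0_hz):
    """Time between glottal pulses (one vertical striation each), in ms."""
    return 1000.0 / f0_hz


# Hypothetical example values: a lower-pitched and a higher-pitched voice.
low = harmonic_frequencies(100.0, 1000.0)   # harmonics 100 Hz apart
high = harmonic_frequencies(200.0, 1000.0)  # harmonics 200 Hz apart

print(len(low), len(high))
print(striation_spacing_ms(100.0), striation_spacing_ms(200.0))
```

With the lower F0, more harmonics fit below 1000 Hz (they are closely spaced) and the striations are farther apart in time; with the higher F0 the reverse holds.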
Other sounds, such as plosives and fricatives, may be produced either with voicing or without it. These sounds are called obstruents and have a different appearance from the sonorants. Here we have a broadband spectrogram of a voiced sound between two vowels. In this case, voicing appears in the spectrogram as shading in the low frequency range at the bottom of the spectrogram. This is called the voice bar and is present when the vocal folds are vibrating.
We’ll now move on to manners of articulation, considering their appearance in acoustic representations. We’ll start with plosives, which are perhaps the most straightforward to identify in both the waveform and the spectrogram.
Here we have a waveform and spectrogram of a voiceless plosive. Like all stops, it is made up of two parts: a closure, reflecting the constriction of the vocal tract, and a noise burst reflecting release of that constriction.
Here we can see a region of low energy in the spectrogram. This is an indication of closure and is also visible in the waveform as a region of low or zero amplitude. Here we can also see that this is a voiceless stop due to the lack of a voice bar near the end of closure. At the release of the closure we can see a spike in the waveform and a vertical band of energy across the entire frequency range in the spectrogram.
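The idea that a stop closure shows up as a stretch of low or zero amplitude can be illustrated with a rough sketch. Everything here is an assumption made for the example: the signal is synthetic (a 120 Hz sine standing in for a vowel, with 50 ms of silence standing in for the closure), and the 10 ms frame length and the 0.05 threshold are arbitrary choices, not values from the video.

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def low_energy_frames(rms, threshold):
    """Indices of frames whose RMS falls below the threshold."""
    return [i for i, value in enumerate(rms) if value < threshold]


rate = 16000  # samples per second (assumed)
# Synthetic stand-ins: 100 ms "vowel", 50 ms silent "closure", 100 ms "vowel".
vowel = [0.5 * math.sin(2 * math.pi * 120 * n / rate) for n in range(1600)]
closure = [0.0] * 800
signal = vowel + closure + vowel

rms = frame_rms(signal, 160)          # 160 samples = 10 ms frames
quiet = low_energy_frames(rms, 0.05)  # arbitrary threshold
print(quiet)
```

The frames flagged as quiet are the ones covering the silent gap. In real recordings, background noise and the voice bar of voiced stops make a fixed threshold much less reliable than it is here.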
Voiced plosives are similar to voiceless ones in that they involve a closure of the vocal tract; however, we can see that they are voiced due to the presence of the voice bar during that closure. We also notice that the burst release tends to be less clearly visible in the spectrogram than it was in the voiceless stop.
Voiceless stops are sometimes aspirated, that is, accompanied by a strong puff of air after the release of the closure. This aspirated release is visible in both the waveform and spectrogram. In aspirated stops, the release portion tends to be longer and stronger than in either voiced or unaspirated voiceless plosives. The release burst is also accompanied by a bit of turbulent noise; this is the “aspiration” that we speak of. Here again we see virtually no activity in the waveform or the spectrogram during the stop closure, and a strong burst release accompanied by aspiration.
Fricatives are sounds that are produced with frication, or turbulent airflow. In general, this turbulent airflow generates high frequency noise. Fricatives are often very loud, or high amplitude, and their energy will be dispersed over a broad frequency range in the spectrogram.
Here we can see voiceless labiodental fricative [f], voiceless interdental fricative [θ], voiceless alveolar fricative [s], and voiceless postalveolar fricative [ʃ] preceding vowels. In each case, we can see some diffuse noise spread across the frequency range, indicating frication. However, we can also see that not all fricatives are the same. The amplitude of the noise in [s] and [ʃ] is much higher than that of [f] and [θ]. This makes them more salient to the ear and easier to hear. In voiced fricatives the high frequency noise is sometimes less apparent, and we can also see evidence of voicing as striations in the spectrogram or sometimes as periodicity in the waveform.
Nasals are sounds that are produced similarly to stops in that the mouth is closed at some place of articulation, but air is allowed to pass through and resonate in the nasal cavity at the same time. As voiced sounds, nasals will feature the striations indicative of vocal fold vibration. Here we can see the vertical lines in the spectrogram indicating voicing. Nasals will also have low energy in the spectrogram compared to the surrounding vowels. Here we see spectrograms of three nasals: bilabial [m], alveolar [n], and velar [ŋ]. In each case we can see that the amplitude drops off sharply after the vowel as soon as the nasal closure begins.
Approximants are sounds that are acoustically very similar to vowels. They show visible voicing and formants in the spectrogram; however, they typically have lower amplitude than vowels, which is visible in the waveform and in the shading of the spectrogram. They can also be recognized by their continuous transitions from approximant to vowel, and may look similar to diphthongs.
Here we have an example of a labiovelar approximant [w], and we can see that the formants start off in a very low position and then transition steeply into the vowel [ɛ]. This steep rise from the approximant into the vowel is quite typical, though sometimes the transition can be more gradual. An example of this is visible on the right where we see the word “yell”. Here we have a palatal approximant [j] transitioning smoothly from an articulation quite like [i] to that of [ɛ].
The alveolar approximants [l] and [ɹ] are often difficult to identify in a spectrogram. They will be characterized by lower amplitude than the adjacent vowels, and most of their energy will be low in the spectrum, that is, in the low frequency range. In some cases there may be an abrupt boundary between the alveolar lateral approximant [l] and the adjacent vowels, much like a nasal, though this is not always reliable. In the alveolar approximant [ɹ] we often see a steep rise of the third formant out of the approximant into the vowel. When the alveolar approximant [ɹ] occurs at the end of a word, we often get something called “r-coloring”, which affects the formant structure of the vowel but does not appear as a distinct segment in the spectrogram.
The acoustic signal also gives us clues to the place of articulation of stops and fricatives. In plosives, both the release burst and the vowel formant transitions will offer indications of the place of the stop closure. Here we have spectrograms of bilabial, alveolar, and velar stops. In each case the plosives are followed by the same vowel [a], and we can see changes in the formant structure as a result of the stop closure. In the bilabial plosive the first three formants all start at lower frequencies than would be expected for the vowel quality itself. They will rise out of the stop closure until they reach their steady state.
In the alveolar stop here the second and third formants remain steady. This is a change in structure from the bilabial stop though it is subtle and sometimes may be difficult to spot.
Velar stops are often characterized by formant movement that brings the second and third formants together. The second formant will be quite high and the third formant may move down to meet it. This formant structure is known as the “velar pinch” and is a dead giveaway for a velar closure if you see it.
We can also distinguish place of articulation in fricatives based on the acoustic information. A common way to do this is to identify where the frication noise is concentrated in the frequency range. The alveolar fricative [s] tends to have energy concentrated between five and ten thousand hertz and to have a very high amplitude. The postalveolar fricative [ʃ] will tend to have its energy concentrated between three and five thousand hertz and also have rather high amplitude. In contrast, the labiodental fricative [f] has very weak energy, or low amplitude, and this energy is centered between three and four thousand hertz. The interdental fricative [θ] similarly has weak energy, but its energy concentration is around eight thousand hertz. We can also sometimes see vowel formant transitions in relation to fricatives, though these may be less reliable than in stops.
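To make the frequency bands just described a little more concrete, here is a toy lookup written in Python. It is only a sketch of the rule of thumb, not a real classifier: the function name, the yes/no treatment of amplitude, and the plus or minus 1000 Hz tolerance used for “around eight thousand hertz” are all assumptions made for the example, and real fricatives vary a good deal between speakers and contexts.

```python
def guess_fricative(peak_hz, is_strong):
    """Toy lookup using the rough bands quoted in the transcript.

    peak_hz:   frequency where the frication noise is concentrated
    is_strong: True for high-amplitude noise, False for weak noise
    Returns an IPA symbol, or None when no band matches.
    """
    if is_strong:
        if 5000 <= peak_hz <= 10000:
            return "s"   # alveolar: 5-10 kHz, very high amplitude
        if 3000 <= peak_hz < 5000:
            return "ʃ"   # postalveolar: 3-5 kHz, rather high amplitude
    else:
        if 3000 <= peak_hz <= 4000:
            return "f"   # labiodental: weak, 3-4 kHz
        if abs(peak_hz - 8000) <= 1000:  # tolerance is an assumption
            return "θ"   # interdental: weak, around 8 kHz
    return None


print(guess_fricative(7000, True))
print(guess_fricative(3500, False))
```

Note that the bands for [ʃ] and [f] overlap in frequency, so in this sketch it is the amplitude, not the frequency, that separates them.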
