Prosodic Structure

Prosody is the combination of speech properties that break speech into units of time, indicate the boundaries of those units, and highlight certain constituents.

This video will introduce the notion of prosodic structure and relate it to the acoustic phonetic dimensions that we have been learning about throughout the course.
Prosody is the combination of speech properties that break up speech into units of time (phrases, sentences, paragraphs), indicate the boundaries of those units (into statements, questions, internal or terminal phrases), and highlight or emphasize certain constituents within that domain.
Prosody is often portrayed as the rhythm and melody of speech.
These aspects of linguistic structure are conveyed using various combinations of duration, fundamental frequency, and intensity. That is, we can measure acoustic characteristics of spoken language, and use that phonetic detail to describe the hierarchical structure that we have traditionally observed impressionistically. These acoustic dimensions contribute in various ways to the prosodic structure of an utterance.
As we might expect, phonetic duration is very important to descriptions of time units in language, but it can also contribute to phrase boundaries and emphasis. Similarly, speakers use fundamental frequency and intensity to indicate the prosodic structure of phrases, as well as to emphasize one or more constituents within those phrases.
The remainder of this video will focus on phrasing and prominence, and the ways we can describe them using acoustics.
The following slides will illustrate some of the ways that acoustics can reveal constituent structure in spoken language. In particular, we’ll see examples of how duration of various elements, movements in the F0 pitch trace and glottalization can indicate the location of phrase boundaries in English.
Let’s start with some examples of the ways phonetic output can vary with changes in constituent structure.
Here I have an example of a string of words that can be grouped into phrases to form a sentence in (at least) two ways. The first way is to make a phrase of the words “when you make hollandaise slowly”, while the second way groups the word “slowly” with the phrase that comes after it. Listen to these two sentences, and begin to think about how you might measure the differences between them using the acoustic tools we have at our disposal.
1. [When you make hollandaise slowly,] it curdles.
2. [When you make hollandaise,] slowly it curdles.
One thing you might notice here is that there is an appreciable interval of silence at the end of the phrase in each case. In the first example, the pause occurs after the word slowly, because the phrase ends after slowly. In the second example, the pause occurs before the word slowly, again because that is where the phrase ends. So, we can see that pauses, or silent intervals, can be an acoustic indication of phrase boundaries.
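To make the idea of measuring a pause concrete, here is a minimal sketch of how silent intervals might be located in an amplitude envelope. Everything in it is an assumption made for illustration: the function name `find_pauses`, the 10 ms frame size, the amplitude threshold, and the 150 ms minimum duration (chosen so that short silences such as stop closures are not counted as pauses). It is not the procedure used to analyze the recordings in this video.

```python
def find_pauses(envelope, frame_ms=10, threshold=0.02, min_pause_ms=150):
    """Return (start_ms, end_ms) spans where the envelope stays below threshold.

    envelope: one amplitude value per frame (hypothetical input format).
    """
    pauses = []
    run_start = None  # index of the first frame in the current quiet run
    # A sentinel at the threshold closes a quiet run that reaches the end.
    for i, amp in enumerate(list(envelope) + [threshold]):
        if amp < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            duration_ms = (i - run_start) * frame_ms
            if duration_ms >= min_pause_ms:
                pauses.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    return pauses


# Toy envelope: 300 ms of speech, 200 ms of silence, 100 ms of speech.
toy_envelope = [0.5] * 30 + [0.0] * 20 + [0.4] * 10
print(find_pauses(toy_envelope))  # [(300, 500)]
```

In practice the threshold and minimum duration would need to be tuned to the recording, since background noise and speaking rate both affect what counts as silence.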
Now let’s look a bit more closely at the speech inside these phrases. Let’s consider just the word hollandaise in both sentences. In the first example, hollandaise appears within a phrase, while in the second example, it appears at the end of a phrase. This difference in phrasal position is reflected in the acoustic duration of the final vowel in the word.
In example 1, the final [ei] vowel is 117 ms long, while in the second example, the vowel is 234 ms, nearly 120 ms longer than the first instance and twice its duration. This phenomenon is known as final lengthening and affects words that appear at the end of phrases, especially before a pause.
1. [When you make hollandaise slowly,] it curdles. – [ei] in hollandaise = 117 ms
2. [When you make hollandaise,] slowly it curdles. – [ei] in hollandaise = 234 ms
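The comparison of the two vowel durations can be written out as a short calculation. This is only a sketch: the function name `lengthening` is hypothetical, and the two durations are the ones reported above.

```python
def lengthening(phrase_medial_ms, phrase_final_ms):
    """Return (difference in ms, ratio) of phrase-final to phrase-medial duration."""
    difference_ms = phrase_final_ms - phrase_medial_ms
    ratio = phrase_final_ms / phrase_medial_ms
    return difference_ms, ratio


# [ei] in hollandaise: 117 ms phrase-medially, 234 ms phrase-finally.
difference_ms, ratio = lengthening(117, 234)
print(f"{difference_ms} ms longer, {ratio:.1f}x the phrase-medial duration")
# prints "117 ms longer, 2.0x the phrase-medial duration"
```

Reporting a ratio alongside the raw difference can be useful when comparing vowels or speakers whose baseline durations differ.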
Another acoustic indicator of phrase boundaries is a sudden or drastic change in F0. The spectrograms shown here now include a line indicating the fundamental frequency aligned with the speech output. In each of these sentences, the F0 drops at the end of the phrase before starting to rise up again to start the next phrase. In the first sentence, this drop in F0 is aligned with the end of the word slowly, while in the second sentence, it drops off at the end of hollandaise. In each case, the sudden drop in F0 is aligned with the phrase boundary.
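One very rough way to look for such a drop in a pitch track is to find the largest fall in F0 between neighbouring voiced frames. The sketch below assumes a hypothetical input format (one F0 value per frame, with `None` for unvoiced frames), and the numbers in the toy track are invented for illustration; they are not measurements from the sentences in this video.

```python
def largest_f0_fall(times_ms, f0_hz):
    """Return (time_ms, fall_hz) for the biggest fall between neighbouring voiced frames.

    Unvoiced frames (None) are skipped. Returns None if there are fewer
    than two voiced frames.
    """
    best = None
    previous = None  # (time_ms, f0_hz) of the most recent voiced frame
    for time_ms, f0 in zip(times_ms, f0_hz):
        if f0 is None:
            continue
        if previous is not None:
            fall_hz = previous[1] - f0
            if best is None or fall_hz > best[1]:
                best = (time_ms, fall_hz)
        previous = (time_ms, f0)
    return best


# Invented track: fairly level F0, a sharp fall, then a rise into the next phrase.
toy_times = [0, 100, 200, 300, 400, 500]
toy_f0 = [210, 205, 200, 140, 150, 190]
print(largest_f0_fall(toy_times, toy_f0))  # (300, 60)
```

A fall found this way is only a candidate boundary. Pitch trackers also produce abrupt jumps for other reasons, so the location would need to be checked against the other cues discussed here.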
Finally, let’s look at the ends of both sentences. In this case, I am more interested in what is the same in the acoustic outputs than what is different. In both of the examples here, the sentences end with it curdles. The spectrograms shown here are limited to the word curdles only. Unsurprisingly, there are many similarities between these two utterances. Not only are the same words being spoken, meaning that we should expect the phones to be roughly the same, but they are also occupying more or less the same position in their respective phrases.
Notice that the second vocalic interval in each case is produced with glottalization, or a slowing of the vocal fold vibrations accompanied by irregularity of the wave cycle.
Now compare this phrase-final utterance of curdles (on the left) to a production that appears at the beginning of a phrase (on the right). Notice that when the word appears near the beginning of a phrase, instead of at the end, there is no glottalization in the second half of the word. This is because glottalization is another cue to a terminal phrase boundary, indicating the end of an utterance.
In the preceding slides we have seen that pauses, final lengthening, movement in F0, and glottalization can all indicate the end of a spoken phrase. It’s important to note that although these cues can indicate phrase boundaries, they are not present in every case. It is also possible for these acoustic phenomena to indicate something other than a phrase boundary, such as glottalization as an allophone of /t/.
Now let’s consider the acoustic correlates of the second function of prosody: to make words more prominent. Prominence in language is sometimes also referred to as stress or emphasis.
Here we have two sentences that differ only in the word that is the most prominent. As a result, the meanings are quite different. The sentence in parentheses indicates what is being contrasted. In the first sentence the emphasis is on the word “A”: She didn’t earn an A.
First let’s compare the prominent “A” in the first sentence with the non-prominent “A” in the second sentence. We saw in the previous examples that sounds are lengthened at the ends of phrases, and we see the same here. Both instances are quite long at around 400 ms. But although they are similar in duration, they are also different in a number of respects. On the left, the phonation is regular throughout, while on the right, the vowel becomes creaky toward the end. On the left, the pitch trace shows a rise-fall-rise pattern in F0, while on the right, the pitch is level and then drops off.
In the second sentence, the emphasis is on the word “earn”: She didn’t earn an A.
Again, we see that the emphasized production of earn has longer duration than the unemphasized version, and a rise in F0.
Prominence and constituent structure together are called prosodic structure and may reflect the relative predictability of elements in speech. Predictability is an indication of how easily a word can be guessed given its linguistic and real-world context, and is affected by word frequency, both in overall usage and in specific contexts. In both cases, the more frequent a word is, the more predictable it is.
For example, consider the following sentence with a word missing:
The children went outside to ______.
There are many words that could fill in the blank, but the word play is more predictable than the word bark, due to the relative frequency of each of these words overall and in this particular lexical context.
The more predictable a word is, the less likely it is to bear prominence in a sentence, and the less acoustic information is needed for it to be understood. For example, a phrase like I don’t know is very frequent and so requires relatively little acoustic information to be understood. In fact, it can even be understood with nothing but an intonational contour.
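The link between frequency and predictability in the fill-in-the-blank example can be sketched numerically. The measure used here, surprisal (the negative log probability of a word), is a standard information-theoretic quantity that this video does not itself introduce, so treat it as an assumption. The counts are invented for illustration and do not come from any corpus.

```python
import math


def surprisal_bits(word, counts):
    """Surprisal -log2 p(word), with p estimated as the word's relative frequency."""
    total = sum(counts.values())
    return -math.log2(counts[word] / total)


# Invented counts of completions for "The children went outside to ______."
toy_counts = {"play": 80, "run": 15, "bark": 5}

for word in ("play", "bark"):
    print(f"{word}: {surprisal_bits(word, toy_counts):.2f} bits")
```

With these toy counts, play receives a lower surprisal than bark, which matches the intuition that the more frequent completion is the more predictable one.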
