Module 1 – Phonetics and Representations of Speech

An introduction to phonetics and how we can visualise speech

Welcome to Speech Processing!

In this first module/week, we will give an overview of the course with a view to establishing the relevance of phonetics to speech technology (i.e., text-to-speech and automatic speech recognition). We’ll start to touch upon the following foundational questions in spoken language processing: What is text? What does it represent? How can you describe speech to a computer? How does that relate to phonetics?

After the course overview, we will start to make the connection between text and speech by looking at some visual representations of speech and relating them to the articulatory changes that take place in your mouth to create various speech sounds. We’ll also begin working with the speech annotation software Praat to annotate and analyse speech sound waves. We’ll briefly introduce the IPA and the concepts that relate the grid structure of the chart to the anatomical structures of human vocal tracts.

Since people are still choosing courses, we won’t assume that you will have watched this week’s videos before the Thursday lecture. But if you are certain you are taking this course, you may want to get ahead on that (and on next week’s content).

Please note there is no lab in week 1! The first lab session will be in week 2 and will cover material from module 1. In general, labs for a module are in the week after the lecture.

Lecture Slides

Lecture 1 slides (google slides) [updated 19/9/2023]

Lecture 1 slides (pdf)

Click on the pop-out icon (red square with arrow in top right corner for each video) to view the video with speed controls and transcript.

Total video to watch in this section: 21 minutes

We use a lot more than just our mouth to produce speech

This video just has a plain transcript, not time-aligned to the video.
Although we think about using our mouths when we talk, speech production requires the use of anatomical structures from the diaphragm through the nose. Taken together, all of the anatomy that is required for speech is known as the vocal tract.
At the center of the vocal tract is the larynx, commonly known as the “voice box”. The larynx sits in the front center of the neck and is the source of the vibrations that are known as the voice.
The anatomical structures below the larynx make up the sublaryngeal vocal tract and include the lungs, diaphragm and trachea, while everything above the larynx is known as the supralaryngeal vocal tract.
The rest of this video will focus on the names of the anatomical structures and articulators above the larynx.
The supralaryngeal vocal tract can be divided into two main regions, the oral cavity and the nasal cavity. The oral cavity is where most of the articulatory action takes place.
The articulators in the supralaryngeal vocal tract are the anatomical structures that shape and influence the quality of sound that emerges from the mouth during speech.
These articulators can be divided into two major types: Active and Passive.
The active articulators are those that move during speech production including the lips, tongue, epiglottis, velum and larynx.
In most speech sounds, the tongue is the active articulator. Because it is so mobile and active during speech, it is further subdivided into regions that can move somewhat independently of each other: the tip, blade, body, and root.
The Passive articulators remain stationary during speech production. These are the teeth, the alveolar ridge, the hard palate, the soft palate (or velum), the uvula, and the pharyngeal wall.
In some cases, certain articulators are classified as both passive and active, depending on how they are used for a given sound.

Voice, place, manner

Consonants are speech sounds that are made with some degree of constriction in the vocal tract.
Phoneticians define consonants according to three articulatory dimensions:
Voice,
Place, and
Manner
This video will introduce you to each of these terms and provide some examples of sounds that differ along each dimension.
VOICE
The first parameter, voicing, is presented as a binary option in the IPA chart. It is either on or it’s off; voiced or voiceless. If a sound is voiced, that means that it is produced with vibrations of the vocal folds. A voiceless sound, on the other hand, will not be accompanied by such vibration.
Here is an example of a pair of sounds you are probably familiar with that differ only in voice: [s]/[z] from Seeing Speech
It is difficult to see voicing differences in the video because the vocal folds are very small and hidden away inside the larynx. You can experiment with voicing in your own vocal tract in the following way:
Place your hand on your throat and make a prolonged [ssssssss] sound.
Now, keeping your hand on your throat, switch to a [zzzzzzzz] sound.
(!) If you’ve never done that before, you might be a bit surprised at what you just felt. That buzz in your throat is your vocal folds giving you a voice.
The IPA only distinguishes between two voicing modes in the main consonant chart, but phoneticians recognize at least 2 further modes of phonation:
Breathy
Creak
In order to indicate Breathy voice or Creak the IPA provides diacritics that can be applied to a symbol to modify its usual interpretation. (Here, a voiced b becomes breathy voiced b, and a vowel with regular phonation becomes creaky voiced just with the addition of the diacritic.)
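If you want to see how these diacritics behave in digital text, here is a minimal sketch. It assumes the standard Unicode encodings of the two IPA diacritics (U+0324 for breathy voice, U+0330 for creaky voice). The function names are made up for this example and are not part of any course tool.

```python
import unicodedata

# Unicode combining characters for the IPA phonation diacritics.
BREATHY = "\u0324"  # COMBINING DIAERESIS BELOW
CREAKY = "\u0330"   # COMBINING TILDE BELOW


def breathy(symbol: str) -> str:
    """Return the symbol marked as breathy voiced (hypothetical helper)."""
    return symbol + BREATHY


def creaky(symbol: str) -> str:
    """Return the symbol marked as creaky voiced (hypothetical helper)."""
    return symbol + CREAKY


if __name__ == "__main__":
    for marked in (breathy("b"), creaky("a")):
        # Each result is two code points: the base symbol plus the diacritic.
        names = [unicodedata.name(ch) for ch in marked]
        print(marked, len(marked), names)
```

The point to notice is that the diacritic is a separate character added after the base symbol, so a transcription that looks like one symbol on screen may be two characters to a computer.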
PLACE
The next parameter is place of articulation, often referred to simply as “place” for short. Place of articulation tells us where in the vocal tract the constriction is taking place, and is named for the passive articulator at that place.
For example, you can make a closure between your tongue and your upper teeth, or the alveolar ridge behind your teeth, or the hard palate, or the soft palate, just to name a few.
Places of articulation are given adjectival names which are taken from the nouns that name the articulator. So if the place of articulation is at the lips, the sounds would be called labial. A closure at the alveolar ridge would be called alveolar, and a closure at the velum, or soft palate, would be called velar.
Knowing the names of anatomical structures in the vocal tract will help you to remember the names of places of articulation as well.
MANNER
The third parameter for describing consonant articulation is manner, which describes how much constriction is present in the vocal tract during production of that consonant. The IPA lists 8 manners of articulation in the main consonant chart, ranging from complete closure to mere approximation.
In most cases, the tongue is the active articulator whose actions are being described by the manner of articulation, though there are some exceptions to this.
A complete closure of the vocal tract is known as a plosive or stop because the closure completely stops the airflow.
It is also possible to have a complete closure in the mouth, while the velum is lowered to allow air to pass through the nasal cavity. These sounds are called nasals or sometimes nasal stops.
Another manner that is similar to a stop is a tap or flap. In this case, the active articulator, usually the tongue, makes a quick contact at the place of articulation without maintaining the closure for any appreciable period of time.
A trill is similar to a tap in that it involves quick contact with the place of articulation. However, the contact is repeated many times in rapid succession as the result of the articulator vibrating in the force of the airstream, much like a flag waving in the wind.
Another way to constrict the airflow in the vocal tract is to create a narrow passage between active and passive articulators. By forcing air through this passage, a noisy, turbulent, hissing sound is produced that is known as a fricative.
In most cases, the narrow passage that is formed for fricatives is located along the center line of the mouth and tongue, but the tongue can also be held with a closure at the alveolar ridge and passages along one or both sides of the tongue. Such sounds are called lateral fricatives because they are produced with airflow out of the sides rather than down the middle of the vocal tract.
Finally, the most open consonant articulation is known as an approximant, because the articulators do not make direct contact with one another, nor do they cause turbulent airflow. As with fricatives, approximants may be central with airflow proceeding out through the center of the mouth, or lateral with the airflow passing out around the sides of the tongue.
NAMING CONSONANTS
The three parameters voice, place and manner also provide us with a means of naming consonant sounds.
Each of the parameters is independent of the others, so in principle, you can combine them in any permutation you like. However, as we will see when we look at the IPA consonant chart in detail, not all combinations of these parameters are useful for describing the sounds of actual speech.
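One way to describe these sounds to a computer is to store the three parameters as a small record per symbol. The sketch below covers only a handful of consonants, using their standard IPA descriptions. The names `Consonant`, `name_consonant` and `differ_only_in_voice` are invented for this illustration.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    voice: str
    place: str
    manner: str


# A handful of pulmonic consonants (far from the full chart).
CONSONANTS = {
    "p": Consonant("voiceless", "bilabial", "plosive"),
    "b": Consonant("voiced", "bilabial", "plosive"),
    "m": Consonant("voiced", "bilabial", "nasal"),
    "s": Consonant("voiceless", "alveolar", "fricative"),
    "z": Consonant("voiced", "alveolar", "fricative"),
    "l": Consonant("voiced", "alveolar", "lateral approximant"),
    "k": Consonant("voiceless", "velar", "plosive"),
    "ŋ": Consonant("voiced", "velar", "nasal"),
}


def name_consonant(symbol: str) -> str:
    """Build the conventional voice-place-manner name for a symbol."""
    c = CONSONANTS[symbol]
    return f"{c.voice} {c.place} {c.manner}"


def differ_only_in_voice(a: str, b: str) -> bool:
    """True if two consonants share place and manner but not voicing."""
    ca, cb = CONSONANTS[a], CONSONANTS[b]
    return ca.voice != cb.voice and (ca.place, ca.manner) == (cb.place, cb.manner)


if __name__ == "__main__":
    print(name_consonant("s"))             # voiceless alveolar fricative
    print(differ_only_in_voice("s", "z"))  # True
    print(differ_only_in_voice("p", "z"))  # False
```

Storing the parameters separately, rather than as one label, makes it easy to ask questions such as which sounds differ along just one dimension.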

Cardinal vowels, height, advancement, rounding

Vowels are speech sounds that are produced with an open vocal tract, meaning that the articulators (that is, the lips, teeth and tongue) do not obstruct airflow coming out of the mouth in any way. This video describes vowel production with a focus on the vowel qualities that are represented by symbols in the IPA chart.
While the positions of the lips and tongue have an effect on the vowels that are produced, the vowels of the IPA are also defined in part by the auditory impressions, and their qualities relative to one another.
The vowels of the IPA are therefore abstract and do not represent the vowels of any particular language. Instead, they are intended to serve as reference points that we can use to describe vowels in actual languages.
The articulatory basis of vowel sounds relates primarily to extreme tongue positions. These tongue positions were first described by Daniel Jones using x-ray and lead chains placed along the surface of his tongue – not a method that would be allowed today!
Here we can see four vowels that were documented using this method. In each of these images, the upper and lower jaw are visible, as well as the lead chain, which appears as a dotted line following the curve of the tongue.
In the upper left, we can see that the body of the tongue is raised toward the hard palate, and the highest part of the tongue is relatively far forward. This is a production of the cardinal vowel [i].
In the upper right, the tongue dorsum is raised toward the soft palate, with the highest point being relatively far back. This is a production of the cardinal vowel [u].
The lower right image shows a vowel where the dorsum of the tongue is very low and pulled back toward the throat. This is a production of the cardinal vowel [ɑ].
Finally, the lower left image shows a vowel where the tongue body is pulled down away from the palate, with the highest point of the tongue more forward than the [ɑ] vowel. This is a production of the cardinal vowel [a].
If we compare these tongue shapes to each other in a single image, the relative positions of the tongue body become more obvious. These positions represent extreme articulations, which we can therefore use to define an abstract vowel space that encompasses all possible vowel sounds.
This space is represented in the IPA as a stylized quadrilateral with the four corners defined by the vowels [i], [a], [ɑ], and [u].
The following MRI videos show an interior view of the vocal tract during the production of these four vowels.
The cardinal vowel space is then divided into four evenly spaced vowel heights by placing two vowel qualities between the corner vowels from high to low. Together, the resulting 8 vowels are known as the primary cardinal vowels.
The following clips provide a demonstration of all 8 primary cardinal vowels. As you watch, take notice of the position of the tongue during production of the vowel. Compare it to the previous production and those that come after it.
One further dimension of vowel quality remains to be described – that of lip rounding. The lips and tongue function independently of each other to affect vowel quality. The IPA vowel chart captures this by providing symbols for rounded and unrounded vowels at all combinations of height and advancement.
The set of secondary cardinal vowels differs from the primary set only in the parameter of lip rounding. For the most part this means that front vowels in the secondary set will be rounded, while back vowels will be unrounded. However, the low back vowel is an exception to this pattern where the rounded vowel occurs in the secondary set, and the unrounded vowel in the primary set.
As you listen to the following pairs of vowels, notice how the lips of the speaker protrude as she produces a rounded vowel, and remain relatively neutral when she produces an unrounded vowel. Try to see if you can make vowels that differ only in lip rounding with the same height and advancement.
Two further vowels at the top of the chart complete the cardinal vowel set. These are the high central rounded and unrounded vowels.
The remainder of the vowel chart represents vowel qualities that are more or less intermediate between the cardinal vowels that have been established. These include central vowels at the close-mid and open-mid heights, vowels at the near-close height, front and back, one at the near-open height at the front, and schwa at the center. In each case the same principle of rounding and unrounding holds. Specific vowels of specific languages are often transcribed using the nearest IPA vowel symbol, though these may also be modified for more precision with diacritics.
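The relationship between the primary and secondary cardinal vowels can be sketched as a small lookup table. The symbols are the standard IPA ones for the front and back vowels at the four cardinal heights. The table layout and the function names are assumptions made for this example.

```python
# (height, advancement) -> (unrounded symbol, rounded symbol)
VOWEL_PAIRS = {
    ("close", "front"): ("i", "y"),
    ("close-mid", "front"): ("e", "ø"),
    ("open-mid", "front"): ("ɛ", "œ"),
    ("open", "front"): ("a", "ɶ"),
    ("open", "back"): ("ɑ", "ɒ"),
    ("open-mid", "back"): ("ʌ", "ɔ"),
    ("close-mid", "back"): ("ɤ", "o"),
    ("close", "back"): ("ɯ", "u"),
}

# The 8 primary cardinal vowels, in their traditional order.
PRIMARY = ["i", "e", "ɛ", "a", "ɑ", "ɔ", "o", "u"]


def rounding_partner(symbol: str) -> str:
    """Return the vowel with the same height and advancement but opposite rounding."""
    for unrounded, rounded in VOWEL_PAIRS.values():
        if symbol == unrounded:
            return rounded
        if symbol == rounded:
            return unrounded
    raise KeyError(symbol)


def describe(symbol: str) -> str:
    """Name a vowel by height, advancement and rounding."""
    for (height, advancement), (unrounded, rounded) in VOWEL_PAIRS.items():
        if symbol in (unrounded, rounded):
            rounding = "rounded" if symbol == rounded else "unrounded"
            return f"{height} {advancement} {rounding}"
    raise KeyError(symbol)


# The secondary set differs from the primary set only in lip rounding.
SECONDARY = [rounding_partner(v) for v in PRIMARY]

if __name__ == "__main__":
    for p, s in zip(PRIMARY, SECONDARY):
        print(f"[{p}] {describe(p):28} [{s}] {describe(s)}")
```

Running this shows the pattern described above: the primary front vowels and the primary open back vowel are unrounded, so their secondary partners are rounded, while the remaining primary back vowels are rounded and their secondary partners are unrounded.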

A set of symbols with which any language can be transcribed.

1. Full IPA chart
a. The International Phonetic Alphabet is a tool designed to allow for the transcription of all languages using a standard set of phonetic symbols.
b. The chart can be divided into 4 main sections:
i. Consonants,
ii. Vowels,
iii. Diacritics, and
iv. Suprasegmentals
This video will focus specifically on, first, the structure of the pulmonic consonant chart, and, second, the structure of the vowel quadrilateral. Understanding the structure of both charts will allow you to identify and select phonetic symbols based on consonant articulation and vowel quality, and to describe sounds that are represented by those phonetic symbols.
2. Refresher
a. The pulmonic consonant chart appears at the top of the IPA, and represents speech sounds that are made by the usual process of pushing air out of the lungs and through the vocal tract. (note: see articulatory videos for more)
b. Here, we’re going to focus specifically on understanding how the chart represents the articulatory parameters that we use to describe consonant sounds, namely Voice, Place and Manner
3. Pulmonic Consonants
a. Symbols in the pulmonic consonant chart are arranged in a grid layout.
b. The columns of the grid represent places of articulation. These places are arranged to correspond with the relative positions of the articulators in the vocal tract, starting with the lips on the left and moving steadily backward through the vocal tract until we finally reach the glottis over here on the right.
c. The rows represent manners of articulation. These manners are arranged in order from complete closure at the top to a more open stricture (known as approximation) at the bottom. (note: see video on articulation for more details about place and manner)
d. The third parameter of voicing is represented within each cell of the grid. Where symbols appear in pairs within a cell, the symbol to the right is voiced, while the symbol on the left is voiceless.
e. In addition to these three articulatory parameters, the consonant chart also provides a bit more information about possible and impossible sounds based on human vocal anatomy.
f. A number of cells in the grid are shaded gray to indicate that these sounds are impossible to articulate. For example, a voiced glottal plosive is not possible because it would require the glottis to be closed, and therefore motionless, while also allowing air to flow through in order to produce voicing. So, cells that are shaded gray indicate sounds that are not possible and will not appear in any language.
g. However, there are also a number of cells that do not contain any symbols, yet are not shaded gray. These cells have been left blank to indicate that, although it is possible for a human vocal tract to make such a sound, we do not currently know of any language that uses such sounds to convey a phonological contrast. Perhaps someday in the future, with further documentation of understudied languages, symbols will need to be developed to fill some of these cells.
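The grid layout described in this section maps naturally onto a nested lookup table. The sketch below encodes only a small excerpt of the chart (three manners and three places). The `IMPOSSIBLE` marker, the use of `None` for a slot with no symbol, and the function name are conventions chosen for this example.

```python
IMPOSSIBLE = "impossible"  # stands in for a gray-shaded area of the chart

# Rows are manners, columns are places. Each cell is a (voiceless, voiced)
# pair, mirroring the left/right layout of the chart. None marks a slot
# that is not shaded but has no symbol of its own.
CHART = {
    "plosive": {
        "bilabial": ("p", "b"),
        "alveolar": ("t", "d"),
        "glottal": ("ʔ", IMPOSSIBLE),
    },
    "nasal": {
        "bilabial": (None, "m"),
        "alveolar": (None, "n"),
        "glottal": (IMPOSSIBLE, IMPOSSIBLE),
    },
    "fricative": {
        "bilabial": ("ɸ", "β"),
        "alveolar": ("s", "z"),
        "glottal": ("h", "ɦ"),
    },
}


def describe_cell(manner: str, place: str, voiced: bool) -> str:
    """Report what the chart excerpt says about one combination of parameters."""
    voiceless_entry, voiced_entry = CHART[manner][place]
    entry = voiced_entry if voiced else voiceless_entry
    if entry is None:
        return "no dedicated symbol in the chart"
    if entry == IMPOSSIBLE:
        return "judged impossible to articulate"
    return f"[{entry}]"


if __name__ == "__main__":
    print(describe_cell("fricative", "alveolar", voiced=True))
    print(describe_cell("plosive", "glottal", voiced=True))
    print(describe_cell("nasal", "bilabial", voiced=False))
```

Keeping "impossible" and "no symbol" as different values matters, because the chart itself treats them differently: one is a claim about anatomy, the other only says that the chart has no symbol for that slot.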
4. Other symbols
a. In addition to the main consonant chart, there is also a section of Other Symbols located a bit further down the IPA.
b. This section presents symbols for pulmonic sounds that do not fit into the main chart for one of two reasons. First, we have sounds that are produced with a place or manner of articulation that does not appear in the pulmonic chart, mainly for reasons of space efficiency and tidy formatting. For example, here we see some epiglottal sounds, and one produced in the manner of a lateral flap.
c. Second, we have sounds that are produced with two simultaneous constrictions at different places of articulation, which we refer to as doubly articulated consonants. This section also notes that affricates, like double articulations, can be written as two symbols joined by a tie bar.
d. If we take another look at the pulmonic consonant chart, we can see where the additional places and manners could be added to the chart, but it is harder to imagine how doubly articulated consonants could be included. For example, should the labio-velar approximant [w] be added to the bilabial column, or to the velar column? Or both?
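The IPA chart notes that affricates and double articulations can be written as two symbols joined by a tie bar. In Unicode text, the tie bar above is the combining character U+0361, placed between the two symbols. Here is a minimal sketch of that encoding, with a helper name invented for this example.

```python
import unicodedata

TIE_BAR = "\u0361"  # COMBINING DOUBLE INVERTED BREVE


def tie(first: str, second: str) -> str:
    """Join two IPA symbols with a tie bar (hypothetical helper)."""
    return first + TIE_BAR + second


if __name__ == "__main__":
    # A doubly articulated labial-velar plosive, and a postalveolar affricate.
    for joined in (tie("k", "p"), tie("t", "ʃ")):
        print(joined, [unicodedata.name(ch) for ch in joined])
```

As with diacritics, what is displayed as one unit is stored as several characters, which is worth remembering when you count or compare symbols in a transcription.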
5. Vowels
a. Similar to consonants, vowels in the IPA are described according to three parameters: Height, Advancement and Rounding
b. Although the vowel chart is a quadrilateral (rather than a rectangle) we can see that it is still arranged roughly in columns, rows, and symbol pairs to represent each of these parameters.
c. Height is represented vertically in the vowel chart. The IPA uses the terms “close” and “open” to refer to vowel height, but many sources will use the terms “high” and “low” to mean the same thing.
d. Advancement is represented horizontally in the vowel chart. Just like the consonant chart, the front of the vocal tract appears on the left side here, with the back of the mouth represented on the right.
e. Rounding is represented by the position of symbols within a vowel pair. Symbols on the right are rounded, while symbols on the left are unrounded.
6. An important thing to remember about the vowel chart is that it is not intended to represent actual vowel contrasts in any language. Instead, it uses extreme vowel articulations as reference points, relative to which other vowel sounds can be described.


Since this is the first module, here’s a reminder that the readings (or sometimes other media) in each module are categorised as

  • Essential (read all of these)
  • Recommended (read if you want to go deeper)
  • Extra (only read if you’re interested/have previous background; these readings may be challenging and can be considered beyond the scope of the course)

Only material in the essential readings is directly examinable, but the other readings may help you make the connections that get higher marks.

Reading

Practical Phonetics

Videos for the course Practical Phonetics

Module 1 labs (to be held in week 2, 2023-24) are cancelled due to the UCU strike.

Exploring Speech Acoustics

This is your first PHON lab! This lab will use Praat to explore speech acoustics through visualisations.

Follow the links to find:

Remember, the labs for each module are in the week following the lecture. So, this module’s lab will be on Wednesday in week 2 (there’s no lab in week 1).

Maths refresher

If you’re already quite familiar with Praat and phonetics, but not so much with maths, now might be a good time to check out some of Sharon Goldwater’s maths tutorials here:
https://homepages.inf.ed.ac.uk/sgwater/math_tutorials.html

If you’re looking for a more comprehensive textbook that goes back to basics, you might find Foundation Maths by Croft and Davidson helpful (available through the University of Edinburgh library).

For the signal processing bit of the course (coming up next), you might also want to brush up on some trigonometry and vectors (the trig stuff is mostly relevant for the signal processing modules, the vectors/linear algebra is relevant to machine learning more generally):

There are more maths prep resources here: https://speech.zone/courses/prepare-for-study-in-speech-and-language-processing/brush-up-your-mathematics/

Or you could just get ahead with the assigned reading!

Lab Commentary

This lab was about exploring speech representations in Praat, so the “solutions” are more of a commentary.

Rebekka’s video commentary for the Module 1 lab

 

Once you’ve attended the first lecture, it’s time to start watching the associated videos on phonetics and doing the readings listed in module 2. If you’re not enrolled and you want to take this course, please make sure that you enrol as soon as possible!

Remember: The lab for module 1 will be held in week 2. 

What you should know from Module 1

Students often ask if they need to memorise the IPA chart and all the physiological terms and symbols mentioned in this module. The answer is no! The main skill you need is to be able to read and interpret the IPA chart. For this course, you’ll always be able to refer to the IPA chart in the tests (as in most real-life IPA-related scenarios!).

We don’t go that deep into articulatory phonetics in this course, but you should make sure that you have an understanding of the foundational concepts:

  • Basic vocal anatomy:
    • What are active versus passive articulators?
    • Systems in Speech Production: the respiratory system, the phonation system, and the articulation system
  • Consonants:
    • Voicing: what is the difference between voiced and voiceless sounds?
    • Place: You should be able to relate place columns in the IPA chart to points of constriction in the vocal tract.
    • Manner: You should know the general differences between stops, taps, trills, fricatives, lateral fricatives and approximants in terms of how they are articulated.
    • Why are pulmonic and non-pulmonic consonants listed separately on the IPA chart?
  • Vowels:
    • Height
    • Frontness
    • Rounding
    • What are the cardinal vowels? How do they relate to the representation of vowels on the IPA chart?
    • What’s the difference between monophthongs and diphthongs?
    • What’s the difference between oral and nasal vowels?
You can do the exercises at the end of Wayland Chapter 1 if you want some more practice.

Key Terms

  • Articulation
  • Articulators
  • Manner (of articulation)
  • Place (of articulation)
  • Voicing
  • Consonant
  • Vowel
  • Vowel quality
  • Phonetic
  • Acoustic
  • Phone
  • Contrast

You may find some helpful discussion on the topics covered in this module on the Foundations of Speech part of the speech.zone forum.