Sound source

Air flow from the lungs is the power source for generating a basic source of sound either using the vocal folds or at a constriction made anywhere in the vocal tract.

We've seen speech already in the time domain by looking at the waveform.
But how is that speech made?
Well, we need some basic sound source and some way to modify that basic sound source.
The modification, for example, might make one vowel sound different from another vowel sound.
Here we're just going to look at the source of sound, and we'll see two possible sources that can make speech.
Here's someone talking.
He has a vocal tract; that also happens to be useful for breathing and eating, but here we're talking about speaking.
That's just a tube.
For our purposes, it doesn't matter that that's curved.
That's just to fit in our body.
We can think of it as a simple tube, like this.
So here it is, a simplified vocal tract.
At the top here, the lips; at the bottom, the lungs.
The lungs are going to power our sound source.
Airflow from the lungs comes into the vocal tract.
We can block the flow of air with a special piece of our anatomy called the vocal folds.
There they are.
As air keeps flowing from the lungs, the pressure will increase below the vocal folds.
We will get more and more air molecules packed into this tight space.
More tightly packed molecules in the same volume means an increase in pressure.
That's what pressure is: it's the force molecules exert on each other and on their container.
Eventually, the pressure is enough to force its way through the blockage, and the vocal folds burst open.
The higher pressure air from below moves up.
So we get a pulse of higher pressure air bursting through the vocal folds.
That releases the pressure below the vocal folds and they will close again.
Now we have a situation where there is a small region of higher pressure air just here, surrounded by lower pressure air everywhere else.
That's obviously not a stable situation.
This higher pressure air exerts a force on the neighbouring air and a wave of pressure moves up through the vocal tract.
It's important to understand that this wave of pressure is moving at the speed of sound, and that's quite different from the gentle air flow from your lungs: your breathing out.
You don't breathe out at the speed of sound!
Breathing is just the power source for the vocal folds.
The air flow in the vocal tract is much, much slower than the propagation of the pressure wave.
So we can neglect the airflow and just think about this pressure wave moving through air.
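To get a feel for how fast the pressure wave is, here is a rough back-of-the-envelope calculation. The two constants are assumed round figures (roughly the speed of sound in warm, moist air and a typical adult vocal tract length), not values given in the video.

```python
# Assumed round figures for illustration only; not taken from the video.
SPEED_OF_SOUND_M_PER_S = 350.0  # approximate speed of sound in warm, moist air
VOCAL_TRACT_LENGTH_M = 0.17     # approximate adult vocal tract length


def travel_time_ms(length_m, speed_m_per_s):
    """Time in milliseconds for a pressure wave to cover length_m."""
    return 1000.0 * length_m / speed_m_per_s


# Time for the pressure wave to get from the vocal folds to the lips.
wave_time_ms = travel_time_ms(VOCAL_TRACT_LENGTH_M, SPEED_OF_SOUND_M_PER_S)

# Time between pulses if the vocal folds open 100 times per second.
pulse_period_ms = 1000.0 / 100

# How many tract lengths the wave could cover between two pulses.
trips_per_period = pulse_period_ms / wave_time_ms

print(f"Wave travel time: {wave_time_ms:.2f} ms")
print(f"Pulse period at 100 pulses per second: {pulse_period_ms:.1f} ms")
print(f"Tract lengths covered per period: {trips_per_period:.1f}")
```

Under these assumptions the wave crosses the whole tract in about half a millisecond, which is why the slow outward flow of breath can be left out of the picture.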
A pulse of high pressure has just been released by the vocal folds.
Let's make a measurement of that.
Imagine we could put a microphone just above the vocal folds and measure the pressure there.
The plot might look something like this: an increase in pressure as the pulse escapes, a dip as the pulse moves away, and then a gradual settling back to the ambient pressure.
We've created sound!
Sound is a variation in the pressure of air.
Let's listen to that one pulse - that glottal pulse.
Listen carefully because it's going to be very short.
Just sounds like a click.
Let's do that again.
That's the sound of a glottal pulse created in the glottis.
The glottis is a funny thing.
It's the anatomical name for the gap between the vocal folds.
Of course, if the lungs keep pushing air, the pressure will build up again.
After some short period of time, the vocal folds will burst open again and we'll get another pulse.
That will repeat for as long as the air is being pushed by the lungs.
Remember, the lungs are the power source of the system.
The actual signal will be a repeating sequence of pulses.
I'm going to play this pulse now, not in isolation, but I'm going to play it 100 times per second.
It sounds like this.
Well, it's not speech, but it's a start.
For our purposes, which eventually are going to be to build a model of speech production that we can use for various things, the actual shape of the pulse turns out to be not very important.
Let's try simplifying that down to the simplest possible pulse.
That's this signal here, that is zero everywhere and goes up to a maximum value instantaneously and then back down again.
Let's listen to that.
Again, listen carefully.
It sounds pretty similar to the other pulse, just like a click.
We can play a rapid succession of such clicks.
Let's start with a very slow rate of just 10 per second.
Perceptually that's still just a sequence of individual clicks, so I'll increase the rate now to 40 per second.
I can't quite make out individual clicks now.
It's starting to sound like a continuous sound.
If we go up to 100 per second, it's definitely a continuous buzzing sound.
So, although we're talking about speech production here, we've learned something interesting about speech perception already: that once the rate of these pulses is high enough, we no longer hear individual clicks but we integrate that into a continuous sound.
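The simplified click sequences described above can be sketched in a few lines. This is only an illustration: the sample rate, the function names `impulse_train` and `write_wav`, and the choice of 16-bit mono WAV output are assumptions, not details from the video.

```python
import struct
import wave

SAMPLE_RATE = 16000  # samples per second; an assumed value


def impulse_train(pulses_per_second, duration_s, sample_rate=SAMPLE_RATE):
    """Return samples that are 1.0 at the start of each period and 0.0 elsewhere."""
    n_samples = int(duration_s * sample_rate)
    period = sample_rate // pulses_per_second  # samples between pulses
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write samples in the range [-1, 1] as a 16-bit mono WAV file."""
    frames = b"".join(struct.pack("<h", int(32767 * s)) for s in samples)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(frames)


# One second of clicks at each of the three rates from the demonstration.
trains = {rate: impulse_train(rate, 1.0) for rate in (10, 40, 100)}
```

Calling, for example, `write_wav("pulses_100.wav", trains[100])` would save the 100-per-second version so that you can listen to the buzz for yourself.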
This pulse train signal is going to be a key building block for us.
It's going to be initially just for understanding speech.
That's what we're doing at the moment.
Later, we're actually going to use it as the starting point for generating synthetic speech.
There are other sources of sound.
We will just cover the second most important one, after voicing.
Again, here airflow from the lungs is the power source.
But this time, instead of completely blocking the flow at the vocal folds (which are at the bottom of the vocal tract) we'll force the airflow through a narrow gap somewhere in the vocal tract.
So let's make that constriction.
Air flows up from the lungs, and it's forced through this narrow gap.
When we force air through a narrow gap, it becomes turbulent.
The airflow becomes chaotic and random, and that means that the air pressure is varying chaotically and randomly.
And since sound is nothing more than pressure variation, that means we've generated sound!
So again, if we put a microphone just after that constriction and recorded the signal created by that chaotic, turbulent airflow, it would look something like this: random and without any discernible structure.
Certainly no repeating pattern.
That signal would sound like this.
It's noise.
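A signal with no repeating pattern can be imitated with random numbers. The sketch below uses uniformly distributed random samples as a stand-in for the turbulent source; that is an assumption made for simplicity, and real frication noise need not have this distribution. The function names and the fixed seed are also just for illustration.

```python
import random


def white_noise(n_samples, seed=0):
    """Return n_samples random values in [-1, 1], with no built-in structure."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def repeats_with_period(samples, period):
    """True if every sample equals the sample exactly one period earlier."""
    return all(samples[n] == samples[n - period]
               for n in range(period, len(samples)))


noise = white_noise(16000)

# Unlike a regular train of pulses, the noise does not repeat.
noise_is_periodic = repeats_with_period(noise, 160)
print("Noise repeats every 160 samples:", noise_is_periodic)
```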
Why don't you try making a narrow constriction somewhere in your vocal tract and push air through it from your lungs? I wonder how many different sounds you could make that way.
You can change the sound by putting the constriction in a different place.
I'll give you a few to start with.
I'm sure you can come up with many more.
These then are the two principal sources of sound in speech.
On the left, voicing.
That's the regular vibration of the vocal folds.
On the right, frication, which is the sound caused by turbulent airflow at a narrow constriction somewhere in the vocal tract.
There are a few other ways of making sound, but we don't really need them at this point.
These are going to be enough for a model of speech that will be able to generate any speech sound.
We saw everything in the time domain here.
We plotted lots of waveforms.
We've been talking about the sound source, and we now know why a speech waveform sometimes has a repeating pattern.
It's because the sound source itself was repeating.
We call such signals 'periodic', and you'll find that whenever there's a repeating pattern in the waveform, that can only be caused by voicing: the periodic vibration of the vocal folds.
Whenever there is voicing, you will also perceive a pitch.
Perhaps you could call that a musical note or a tone.
Pitch is controlled by the rate at which the speaker's vocal folds vibrate.
So we could use pitch to help convey a message in speech.
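The link between a repeating pattern and the rate of vibration can be made concrete: if we know how far apart the pulses are, we know how many there are per second. The following is a minimal sketch that works only on an idealised train of single-sample pulses; the function name `pulse_rate_hz` and the 16000 samples-per-second rate are assumptions for illustration, and this is not a general pitch tracker for real speech.

```python
def pulse_rate_hz(samples, sample_rate):
    """Estimate pulses per second from the average spacing of non-zero samples."""
    positions = [n for n, s in enumerate(samples) if s != 0.0]
    if len(positions) < 2:
        return None  # fewer than two pulses: no spacing to measure
    mean_period = (positions[-1] - positions[0]) / (len(positions) - 1)
    return sample_rate / mean_period


SAMPLE_RATE = 16000  # samples per second; an assumed value

# One second of an idealised pulse train with a pulse every 160 samples.
train = [1.0 if n % 160 == 0 else 0.0 for n in range(SAMPLE_RATE)]

rate = pulse_rate_hz(train, SAMPLE_RATE)
print("Estimated pulse rate:", rate, "pulses per second")
```

A shorter spacing between pulses gives a higher rate, which a listener would hear as a higher pitch.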