in Peter Ladefoged “Elements of acoustic phonetics”, 1996, University of Chicago Press, Chicago, Second edition, ISBN 0226467635, 0226467643
Forum for discussing this reading
August 3, 2016 at 17:16 #4117
Wave analysis
-
October 2, 2017 at 17:17 #7813
On page 52, it is noted that “a non-repetitive waveform with a rapid rate of decay is represented by a much flatter curve, indicating that it has energy spread over a wider range of frequencies”. I understand why this should be the case, but then on the opposite page, in figure 4.13, wave (d) is just that, rapidly decaying and non-repetitive. Yet the spectrum has the same curve as the spectra for the other waves, which are repetitive. Why is this?
I can see why (d) is not represented with lines, since the sound wave would be the sum of an extremely large number of sine waves, but I don’t understand the shape, or envelope, here.
-
October 2, 2017 at 18:00 #7815
Ladefoged is being a little sloppy with his use of the term “repetitive” because there are several sorts of repetition going on in this figure, including one at 100Hz in 4.13(a) and another at about 700Hz (in all subfigures).
Let’s imagine that 4.13(d) is the impulse response of a vocal tract which has a single resonant frequency at 700Hz – it’s the “ringing” of that vocal tract after a single impulse excitation.
The waveform in Figure 4.13(d) is very similar to a single pitch period of the waveform in 4.13(a). In that case, the waveform in Figure 4.13(a) must be the output of the same vocal tract, but this time excited with a sequence of impulses (i.e., an “impulse train”) at 100Hz.
The reason that the spectrum in 4.13(a) has a line structure (i.e., with harmonics at multiples of a fundamental frequency) is because of the repetitive pattern at 100Hz: it’s the evidence of the excitation signal.
The reason that the spectrum in 4.13(d) does not have a line structure is because only a single pitch period is being analysed, and so there is no periodic excitation – just a single impulse.
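To see the line structure appear and disappear, here is a minimal sketch in plain Python. It is only an illustration: the two-pole resonator, the 8 kHz sample rate and the 100 Hz bandwidth are my own assumptions, not values taken from Ladefoged's figure. The same filter is excited once with a single impulse and once with a 100Hz impulse train, and the spectrum is probed at harmonics (600, 700, 800Hz) and between them (650, 750Hz).

```python
import cmath
import math

SAMPLE_RATE = 8000    # Hz (assumed)
RESONANCE_HZ = 700    # single resonance, as in the explanation above
BANDWIDTH_HZ = 100    # assumed for illustration
PERIOD = 80           # samples per pitch period: 8000 / 100 Hz
ANALYSIS = 800        # analyse 0.1 s, i.e. exactly 10 pitch periods

# Two-pole resonator: y[n] = x[n] + B1*y[n-1] + B2*y[n-2]
R = math.exp(-math.pi * BANDWIDTH_HZ / SAMPLE_RATE)
B1 = 2 * R * math.cos(2 * math.pi * RESONANCE_HZ / SAMPLE_RATE)
B2 = -R * R

def resonator(signal):
    """Pass a list of samples through the resonator."""
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + B1 * y1 + B2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def dft_magnitude(signal, freq_hz):
    """Magnitude of the signal's spectrum probed at a single frequency."""
    step = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(signal)))

# One impulse: the filter simply "rings" once.
single = resonator([1.0] + [0.0] * (ANALYSIS - 1))

# Impulse train at 100 Hz: run for 0.3 s, keep the last 0.1 s so that the
# start-up has died away and the analysed stretch is 10 identical periods.
train_in = [1.0 if n % PERIOD == 0 else 0.0 for n in range(3 * ANALYSIS)]
train_out = resonator(train_in)[-ANALYSIS:]

for f in (600, 650, 700, 750, 800):
    print(f"{f} Hz  single impulse: {dft_magnitude(single, f):8.3f}"
          f"   impulse train: {dft_magnitude(train_out, f):8.3f}")
```

With the impulse train, the values between the harmonics should come out as essentially zero (lines), whereas the single impulse gives a smooth curve with energy at every frequency you probe.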
Damping
The waveform in Figure 4.13(d) is quite like a sine wave at a frequency of 700Hz, except that it has decaying amplitude. If it was simply a 700Hz sine wave of constant amplitude, then its spectrum would be a single vertical line at 700Hz with “no width”. But the decay means that it’s not quite a sine wave. And “not quite” means something very specific: that it must contain other frequencies. That’s why the spectrum isn’t just a single line, but has a width. This is called the bandwidth and is related to the rate of decay.
The decay is a consequence of a physical process called damping: the vocal tract gradually absorbs energy from the signal and so the signal’s amplitude decays over time. More damping (due to softer, fleshier vocal tract walls!) would make the decay faster.
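The link between decay rate and bandwidth can be checked numerically. The sketch below is an illustration under assumed values (8 kHz sample rate, exponential decay, two arbitrarily chosen decay rates); it builds two decaying 700Hz sine waves and measures how wide each spectral peak is at the half-power level.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed)
N_SAMPLES = 800      # 0.1 s, long enough for both waves to die away

def damped_sine(freq_hz, decay_per_sec):
    """A sine wave whose amplitude decays as exp(-decay_per_sec * t)."""
    return [math.exp(-decay_per_sec * n / SAMPLE_RATE)
            * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(N_SAMPLES)]

def dft_magnitude(signal, freq_hz):
    """Magnitude of the signal's spectrum probed at a single frequency."""
    step = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(signal)))

def half_power_bandwidth(signal, step_hz=5, max_hz=1500):
    """Width in Hz of the region within 3 dB of the spectral peak."""
    freqs = range(0, max_hz + 1, step_hz)
    mags = [dft_magnitude(signal, f) for f in freqs]
    threshold = max(mags) / math.sqrt(2)
    above = [f for f, m in zip(freqs, mags) if m >= threshold]
    return above[-1] - above[0]

# Decay rates are arbitrary illustrative choices: light vs heavy damping.
slow = damped_sine(700, decay_per_sec=100 * math.pi)
fast = damped_sine(700, decay_per_sec=400 * math.pi)

bw_slow = half_power_bandwidth(slow)
bw_fast = half_power_bandwidth(fast)
print(f"slow decay: bandwidth about {bw_slow} Hz")
print(f"fast decay: bandwidth about {bw_fast} Hz")
```

The faster-decaying wave should show a bandwidth several times wider than the slowly decaying one, which is the "flatter curve" Ladefoged describes on page 52.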
The take-home message
What we are seeing in 4.13(d) is an important property of a linear system, which we’ll be mentioning in the lecture about TD-PSOLA. The Fourier transform (i.e., spectrum) of the impulse response is exactly the same thing as the frequency response of the system (e.g., the filter).
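Here is a small numerical check of that property, as a sketch rather than anything definitive. The filter is a simple two-pole resonator that I am assuming for illustration (8 kHz sample rate, resonance at 700Hz, 100 Hz bandwidth). The spectrum of its impulse response is compared with the frequency response worked out directly from the filter equation.

```python
import cmath
import math

SAMPLE_RATE = 8000    # Hz (assumed)
RESONANCE_HZ = 700    # as in the discussion above
BANDWIDTH_HZ = 100    # assumed for illustration

# Two-pole resonator: y[n] = x[n] + B1*y[n-1] + B2*y[n-2]
R = math.exp(-math.pi * BANDWIDTH_HZ / SAMPLE_RATE)
B1 = 2 * R * math.cos(2 * math.pi * RESONANCE_HZ / SAMPLE_RATE)
B2 = -R * R

def resonator(signal):
    """Pass a list of samples through the resonator."""
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + B1 * y1 + B2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def spectrum_at(signal, freq_hz):
    """Complex spectrum (Fourier transform) of the signal at one frequency."""
    step = -2j * math.pi * freq_hz / SAMPLE_RATE
    return sum(x * cmath.exp(step * n) for n, x in enumerate(signal))

def frequency_response(freq_hz):
    """Gain and phase shift the filter applies to a steady sine wave,
    computed from the filter coefficients alone."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / SAMPLE_RATE)
    return 1 / (1 - B1 * z_inv - B2 * z_inv ** 2)

# A single impulse in, the "ringing" of the filter out.
impulse_response = resonator([1.0] + [0.0] * 799)

for f in (100, 400, 700, 1500):
    from_ringing = abs(spectrum_at(impulse_response, f))
    from_filter = abs(frequency_response(f))
    print(f"{f} Hz  from impulse response: {from_ringing:.4f}"
          f"   from filter equation: {from_filter:.4f}")
```

The two columns should agree at every frequency, because the impulse response has decayed to practically nothing within the 0.1 s that is analysed.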
-
October 5, 2017 at 08:01 #7854
-
October 3, 2019 at 22:18 #9948
Does the fundamental frequency of a complex wave (provided this frequency is present as one of the component waves) necessarily have the largest amplitude on the spectrum?
On page 54, it is noted that ‘the wave at the top left of figure 4.13 has a fundamental frequency of 100 Hz…Within each cycle there are six peaks, corresponding to a wave with six times the fundamental frequency. We may therefore expect the 600 Hz to have a relatively high amplitude.’
I don’t really get why the 600 Hz component should have a high amplitude in this case. Could you please explain it? Cheers!
-
October 4, 2019 at 08:35 #9950
The fundamental frequency of a complex wave does not necessarily have the largest amplitude in the spectrum.
We can use the source-filter model to understand how that is possible, using figure 4.13 from Ladefoged that you attached. These are idealised speech waveforms, made by passing an impulse train through a filter.
The filter is particularly simple in this example: it has a single resonance at 600 Hz.
Energy at or close to the resonant frequency is amplified by the filter, whereas energy at frequencies far away from the resonant frequency is attenuated. To convince yourself that a filter can do that, think about a brass instrument like a trumpet: the input is generated by vibrating lips at the mouthpiece, which is not very loud, yet the output can be very loud indeed.
The input impulse train in Ladefoged’s example has a fundamental frequency of 100 Hz in the uppermost plot. This contains equal amounts of energy at 100 Hz, 200 Hz, 300 Hz, 400 Hz, 500 Hz, 600 Hz, 700 Hz, and so on.
Thinking in the frequency domain will be easier than the time domain. The spectrum of that impulse train tells us that the waveform is equivalent to a sine wave at 100 Hz, added to one at 200 Hz, another one at 300 Hz, and so on.
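The claim about equal energy at every harmonic is easy to verify. This is a minimal sketch in plain Python, assuming an 8 kHz sample rate and an analysis stretch of exactly 10 pitch periods (both are my choices for the illustration).

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed)
F0_HZ = 100          # fundamental frequency of the impulse train
N_SAMPLES = 800      # 0.1 s, exactly 10 pitch periods

period = SAMPLE_RATE // F0_HZ   # 80 samples between impulses
impulse_train = [1.0 if n % period == 0 else 0.0 for n in range(N_SAMPLES)]

def dft_magnitude(signal, freq_hz):
    """Magnitude of the signal's spectrum probed at a single frequency."""
    step = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(signal)))

# Probe the spectrum at the first eight harmonics...
harmonics = {f: dft_magnitude(impulse_train, f) for f in range(100, 801, 100)}
# ...and at a frequency halfway between two harmonics.
between = dft_magnitude(impulse_train, 150)

for f, magnitude in harmonics.items():
    print(f"{f} Hz: {magnitude:.3f}")
print(f"150 Hz (between harmonics): {between:.3f}")
```

Every harmonic should come out with the same magnitude, and the in-between frequency with essentially none, so any differences between harmonics in the output must come from the filter.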
All that a (linear) filter can do to a sine wave is change its amplitude (increase or decrease it) and shift its phase, which we can ignore here. The amount of increase or decrease plotted against frequency is called the frequency response of the filter. The filter in the example has a peak in its frequency response at 600 Hz, meaning that any input at that frequency (e.g., the 600 Hz sine wave component of the impulse train) will be amplified.
Try this yourself in the lab: take an impulse train and pass it through a filter that has a single resonance (in Praat you can use “Filter (one formant)...”), then inspect the waveform and the spectrum. With appropriate filter settings you can almost entirely attenuate the fundamental. But listen to the resulting signal and you will perceive the same pitch as the original impulse train.
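If you want to try the same experiment outside Praat, here is a rough sketch in plain Python. It is not Praat's implementation: the two-pole resonator, the 8 kHz sample rate and the 100 Hz bandwidth are assumptions made just for this illustration. It passes a 100 Hz impulse train through a filter with a single resonance at 600 Hz and measures the amplitude of each harmonic in the output.

```python
import cmath
import math

SAMPLE_RATE = 8000    # Hz (assumed)
F0_HZ = 100           # fundamental frequency of the impulse train
RESONANCE_HZ = 600    # the single resonance of the filter
BANDWIDTH_HZ = 100    # assumed for illustration
PERIOD = SAMPLE_RATE // F0_HZ   # 80 samples per pitch period
ANALYSIS = 800                  # analyse 0.1 s = 10 pitch periods

# Two-pole resonator: y[n] = x[n] + B1*y[n-1] + B2*y[n-2]
R = math.exp(-math.pi * BANDWIDTH_HZ / SAMPLE_RATE)
B1 = 2 * R * math.cos(2 * math.pi * RESONANCE_HZ / SAMPLE_RATE)
B2 = -R * R

def resonator(signal):
    """Pass a list of samples through the resonator."""
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + B1 * y1 + B2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def dft_magnitude(signal, freq_hz):
    """Magnitude of the signal's spectrum probed at a single frequency."""
    step = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(signal)))

# Source: impulse train. Filter it, then keep the last 0.1 s so the
# start-up has died away before we analyse.
source = [1.0 if n % PERIOD == 0 else 0.0 for n in range(3 * ANALYSIS)]
output = resonator(source)[-ANALYSIS:]

amplitudes = {f: dft_magnitude(output, f) for f in range(100, 1001, 100)}
strongest = max(amplitudes, key=amplitudes.get)

for f, a in amplitudes.items():
    level_db = 20 * math.log10(a / amplitudes[strongest])
    print(f"{f:5d} Hz: {level_db:6.1f} dB relative to the strongest harmonic")
print("strongest harmonic:", strongest, "Hz")
```

The harmonics are still spaced 100 Hz apart, so the fundamental frequency is unchanged even though the 100 Hz component is well below the 600 Hz one. A narrower assumed bandwidth would push the fundamental down further.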
-
October 2, 2020 at 18:36 #12170
My copy (edition 2) seems to have a different Fig. 4.13 from the attachment above (#9948). It has 4 cycles per 0.02s on the top graph. But the text says it is 100Hz (top p54) and shows ‘a pair of cycles’ (bottom p52). It is a mistake in the book, right?
And I can only see 5 peaks, not 6 as stated at top of p54.
-
October 5, 2020 at 18:28 #12231
Yes, the figure is different in the 2nd edition: there are 4 sub-plots, all with the same spectral envelope but different F0. (The attachment in #7813 is from one version of that edition, but it has serious errors in it.)
In the top sub-figure, the waveform makes 6 cycles within the first pitch period and that’s what the “6 peaks” is referring to. This is the resonance of the vocal tract – the “ringing” of the filter in response to an input impulse. It corresponds to the peak in the spectral envelope at 600 Hz.
All the waveforms in the other sub-figures have the same ‘ringing’ behaviour; it’s just that the input impulses are spaced at different fundamental periods.
Attached is a correct Figure 4.13 from my hardcopy 2nd edition.