Module 6 – Clarification for pitch period, impulse response, fundamental period
- This topic has 2 replies, 2 voices, and was last updated 1 year, 1 month ago by Simon.
October 31, 2023 at 14:13 #17055
Hello, I’m confused about the “difference” between a pitch period and an impulse response. I describe my understanding of them below, and would be grateful if you could confirm or correct it.
1. In the Module 4 video “Impulse response”, at 2:35, the left graph in “Module 4-1.png” is described as one fundamental period of the filter’s output waveform, and also as an “impulse response”.
2. However, in the Module 4 video “Source-filter model”, the output speech signal is described as a sequence of overlaid impulse responses. Does this mean that the previous graph shows a complete fundamental period, but not a complete impulse response? In other words, can we say that, on the waveform of a speech signal, each complete fundamental period corresponds to one incomplete impulse response (because the impulse responses overlap), and that the duration of an impulse response on the waveform = a complete fundamental period < the duration of a complete impulse response?
3. In the Module 4 video “Source-filter model”, at 5:05, it is stated that “The impulse response of the vocal tract filter is given a special name. We call it a ‘pitch period’.” Based on this sentence, what I think is: a complete impulse response = a pitch period, so the duration of a complete impulse response = the duration of a pitch period.
4. The duration between two consecutive pitch marks is an approximation of T0.
5. For the Module 6 video “Pitch period”, I also want to confirm that “2 x T0” is just the duration of the analysis frame used to extract a pitch period, not the duration of a real pitch period. Therefore, the waveform fragment inside a triangular window (analysis frame) is not a “real” pitch period, only an approximation, and the real pitch period looks more like the waveform between two consecutive pitch marks (though that does not capture a complete pitch period either, because of the overlap).
October 31, 2023 at 16:55 #17063
A pitch period is the period between two consecutive glottal pulses. Its duration is denoted T0 (measured in seconds). The term ‘pitch period’ is used to refer both to the duration (T0) and to the portion of the speech waveform spanning that duration.
The term ‘fundamental period’ is another way of saying ‘pitch period’ (and is more technically correct, of course, because ‘fundamental frequency’ is more correct than ‘pitch’ when talking about a speech waveform).
A ‘pitch mark’ is a label we might place on a speech waveform to mark the position of a glottal pulse. Assuming the pitch marks are accurate, then the duration between two consecutive pitch marks is T0, by definition.
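As a small illustration of that definition, here is a minimal sketch that turns a list of pitch mark times into T0 values, and into F0 values using F0 = 1 / T0. The pitch mark times are made-up numbers chosen for the example, not measurements from a real recording.

```python
# Hypothetical pitch mark times in seconds (illustrative values only).
pitch_marks = [0.100, 0.108, 0.116, 0.125, 0.134]

# T0 for each period is the gap between two consecutive pitch marks.
t0_values = [later - earlier for earlier, later in zip(pitch_marks, pitch_marks[1:])]

# The fundamental frequency is the reciprocal of the fundamental period.
f0_values = [1.0 / t0 for t0 in t0_values]

for t0, f0 in zip(t0_values, f0_values):
    print(f"T0 = {t0 * 1000:.1f} ms, F0 = {f0:.1f} Hz")
```

Note that T0 is allowed to change from one period to the next, which is why there is one value per pair of pitch marks rather than a single number for the whole utterance.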
If the glottal pulses are sufficiently far apart in time (a large T0), then the impulse response of the vocal tract will decay away to zero before the next glottal pulse. In this case, each pitch period is equal to one impulse response of the vocal tract. This is almost the case in the (synthetic) speech waveform in the video Impulse Response where the waveform has decayed almost to zero before the next period starts.
So, a simple way to understand voiced speech is as a sequence of impulse responses of the vocal tract. This is a useful and helpful simplification for developing our understanding of speech signals. The video Source-filter model also makes this simplifying assumption (and all the speech signals used as examples are synthetic, to make things clearer).
However, in natural speech, the waveform generally does not decay all the way to zero before the next glottal pulse. Therefore, the impulse responses overlap (and we can assume they are simply summed, using our simplified model of the vocal tract).
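To make the two cases concrete, here is a rough sketch under the simplified model, in which each glottal pulse is an impulse and the output is the sum of one impulse response per pulse. The impulse response is a toy one (a single decaying sinusoid, truncated to 200 samples), and the function names and parameter values are invented for the example.

```python
import math

def impulse_response(n_samples, decay=0.97, freq=0.1):
    """Toy vocal tract impulse response: one exponentially decaying sinusoid."""
    return [(decay ** n) * math.cos(2 * math.pi * freq * n) for n in range(n_samples)]

def synthesise(h, t0_samples, n_pulses):
    """Sum one copy of h starting at each glottal pulse (every t0_samples samples)."""
    out = [0.0] * (t0_samples * (n_pulses - 1) + len(h))
    for k in range(n_pulses):
        start = k * t0_samples
        for n, value in enumerate(h):
            out[start + n] += value
    return out

h = impulse_response(200)  # has decayed close to zero by sample 200

# Large T0: each impulse response finishes before the next pulse arrives.
far = synthesise(h, t0_samples=200, n_pulses=3)
second_period_far = far[200:400]

# Small T0: the tail of the previous impulse response leaks into the next period.
near = synthesise(h, t0_samples=80, n_pulses=3)
second_period_near = near[80:160]

print("large T0, period equals impulse response:", second_period_far == h)
print("small T0, period equals start of impulse response:", second_period_near == h[:80])
```

With the large T0, one pitch period of the output is exactly one impulse response. With the small T0, each pitch period is the start of one impulse response plus the tail of the previous one, so no single period of the waveform shows a complete impulse response on its own.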
October 31, 2023 at 17:01 #17064
Moving on to Module 6 and the video Pitch period, we are now looking at how to extract the vocal tract’s impulse response from a natural speech waveform.
If the impulse response actually did decay all the way to zero before the next glottal pulse, this would be easy for the reason stated above: one pitch period of the speech waveform would be exactly the impulse response we want.
Unfortunately, in natural speech, things are not that simple: the impulse responses overlap. So all we can do is deal in terms of pitch periods. We extract overlapping frames from the waveform so that we can reconstruct the waveform later using overlap-and-add. Since the analysis frames overlap, they will contain more than one pitch period. A good choice is an analysis frame capturing two pitch periods.
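The frame extraction and overlap-and-add steps could be sketched as follows. This assumes a constant T0 (in samples) and a triangular window of length 2 x T0 centred on each pitch mark. In real speech T0 varies from period to period, and other tapered windows can be used. The signal and the function names are invented for the example.

```python
import math

def triangular_window(t0):
    """Triangular window of length 2*t0, peaking at its centre (sample t0)."""
    return [1.0 - abs(n - t0) / t0 for n in range(2 * t0)]

def extract_frames(signal, pitch_marks, t0):
    """Cut one windowed frame of length 2*t0 centred on each pitch mark."""
    window = triangular_window(t0)
    frames = []
    for mark in pitch_marks:
        segment = signal[mark - t0 : mark + t0]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames

def overlap_add(frames, pitch_marks, t0, length):
    """Add each frame back at its original position, summing where frames overlap."""
    out = [0.0] * length
    for frame, mark in zip(frames, pitch_marks):
        for n, value in enumerate(frame):
            out[mark - t0 + n] += value
    return out

t0 = 50  # hypothetical constant pitch period, in samples
signal = [math.sin(2 * math.pi * n / t0) + 0.5 * math.sin(6 * math.pi * n / t0 + 1.0)
          for n in range(6 * t0)]
pitch_marks = [t0 * k for k in range(1, 6)]  # one mark per period

frames = extract_frames(signal, pitch_marks, t0)
rebuilt = overlap_add(frames, pitch_marks, t0, len(signal))

# Between the first and last pitch marks, every sample is covered by two frames.
max_error = max(abs(a - b) for a, b in zip(signal[t0:5 * t0], rebuilt[t0:5 * t0]))
print("max reconstruction error:", max_error)
```

Each frame spans two pitch periods, and neighbouring windows sum to one wherever they overlap, so adding the frames back at their original positions recovers the waveform between the first and last pitch marks. Placing the frames at different spacings before adding them is what would change T0 in the output.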