Module 6 – window width in signal processing
- This topic has 4 replies, 3 voices, and was last updated 2 years, 1 month ago by Simon.
October 31, 2022 at 10:01 #16180
Simon mentions in one of the videos that the width of the analysis window is twice the fundamental period when performing diphone concatenation. Is this because of the sampling theorem (i.e., if our window is 2T0, we ensure that the full impulse response for that wave fits in the window, both the max and min)?
Or is it because each impulse response represents one phone and by making the window encompass two epochs we are ensuring that a diphone fits in the frame?
Additionally, I’m wondering whether we are altering the window width in a subsequent step: is the width 2T0 for concatenation and then, after joining the diphones, we extract frames which are one per pitch period in order to manipulate these for the purpose of prosody/changing the duration and pitch of the output?
-
November 3, 2022 at 13:21 #16226
Can you point at the exact video and timestamp for context, so I can give a precise answer?
-
November 5, 2022 at 23:08 #16310
I’m curious about this too. Not sure which video Rebecka meant, but it is mentioned in the TD-PSOLA video from 5:50 onwards.
-
November 6, 2022 at 08:56 #16312
In TD-PSOLA, the width of the analysis window is typically twice the fundamental period so that each window contains two pitch periods. This is so that, if we need to space the pitch periods further apart (i.e., to reduce F0), there is some waveform to ‘fill the gap’.
The nicely-plotted waveform shown in the video at 5:50 is for one of the diphones in the sequence. At 5:45 my hand-drawn waveform for one diphone was unfortunately only two periods long – that was sloppy of me and potentially confusable with a TD-PSOLA analysis frame, which it is not. A diphone would generally be longer than that.
Each impulse response does not represent one phone. A phone is generally much longer than T0: the vocal tract shape changes much more slowly than the vocal folds vibrate.
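To make the 'fill the gap' point concrete, here is a minimal sketch in plain Python, not the implementation used in the course. It assumes a constant pitch period `t0` in samples, epochs that are already known, and a Hann taper; the function names (`hann`, `extract_frames`, `overlap_add`) are made up for illustration. Each analysis frame is 2 × T0 wide and centred on an epoch, so adjacent frames overlap by one period. For simplicity the sketch only re-spaces the frames, so the duration changes too; a real TD-PSOLA implementation would also duplicate or drop frames to control duration independently.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (zero at both ends)."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def extract_frames(signal, epochs, t0):
    """Cut one Hann-windowed frame of width 2*t0, centred on each epoch."""
    win = hann(2 * t0 + 1)
    frames = []
    for e in epochs:
        frame = []
        for k in range(-t0, t0 + 1):
            idx = e + k
            # Treat samples outside the signal as silence.
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            frame.append(sample * win[k + t0])
        frames.append(frame)
    return frames

def overlap_add(frames, t0, new_t0):
    """Place the frame centres new_t0 samples apart and sum the frames."""
    out = [0.0] * (new_t0 * (len(frames) - 1) + 2 * t0 + 1)
    for i, frame in enumerate(frames):
        start = i * new_t0
        for k, value in enumerate(frame):
            out[start + k] += value
    return out

# Toy periodic signal (assumed values): period of 80 samples, epochs one per period.
t0 = 80
signal = [math.sin(2 * math.pi * n / t0) + 0.5 * math.sin(4 * math.pi * n / t0)
          for n in range(8 * t0)]
epochs = [t0 * (i + 1) for i in range(6)]

frames = extract_frames(signal, epochs, t0)
same = overlap_add(frames, t0, t0)     # original spacing: F0 unchanged
lower = overlap_add(frames, t0, 120)   # wider spacing: lower F0
```

With the original spacing, the overlapping Hann tapers sum to one, so `same` matches the input between the first and last epoch. With the wider spacing, neighbouring frames still overlap because each is two periods wide. Frames only one period wide would leave stretches of silence between them as soon as the spacing exceeded T0.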
-
November 6, 2022 at 09:00 #16313
Diphone concatenation and TD-PSOLA could be implemented as a single process during waveform generation. We can simultaneously modify the F0 and duration inside each individual diphone and overlap-add the last pitch period of each diphone with the first pitch period of the next diphone to concatenate them.
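The join itself could be sketched as follows. This is a simplified illustration, not the course's implementation: it assumes both diphones are already pitch-synchronously aligned, share the same pitch period `t0` (in samples) at the join, and are each at least one period long. The function name `concatenate` and the raised-cosine cross-fade are choices made here for the example.

```python
import math

def concatenate(left, right, t0):
    """Join two waveforms by overlap-adding the last t0 samples of
    `left` with the first t0 samples of `right`."""
    if t0 <= 0 or t0 > min(len(left), len(right)):
        raise ValueError("t0 must be positive and no longer than either waveform")
    # Raised-cosine fade-in for `right`; `left` gets the complementary fade-out,
    # so the two weights always sum to one.
    fade_in = [0.5 - 0.5 * math.cos(math.pi * (k + 0.5) / t0) for k in range(t0)]
    tail_start = len(left) - t0
    overlap = [left[tail_start + k] * (1.0 - fade_in[k]) + right[k] * fade_in[k]
               for k in range(t0)]
    return left[:tail_start] + overlap + right[t0:]
```

The result is one period shorter than the two inputs laid end to end, because the final period of the first diphone and the initial period of the second occupy the same stretch of output.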