Module 6 – window width in signal processing

This topic has 4 replies, 3 voices, and was last updated 2 years, 6 months ago by Simon.

- October 31, 2022 at 10:01 #16180
  Rebecka N
  Student
  Simon mentions in one of the videos that the width of the analysis window is twice the fundamental period when performing diphone concatenation. Is this because of the sampling theorem (i.e., if our window is 2T0, we ensure that the full impulse response for that wave, both its maximum and its minimum, fits in the window)?
  Or is it because each impulse response represents one phone and by making the window encompass two epochs we are ensuring that a diphone fits in the frame?
  Additionally, I’m wondering whether we are altering the window width in a subsequent step: is the width 2T0 for concatenation and then, after joining the diphones, we extract frames which are one per pitch period in order to manipulate these for the purpose of prosody/changing the duration and pitch of the output?
- November 3, 2022 at 13:21 #16226
  Simon
  Professor
  Can you point at the exact video and timestamp for context, so I can give a precise answer?
- November 5, 2022 at 23:08 #16310
  Rebecca O
  Student
  I’m curious about this too. Not sure which video Rebecka meant, but it is mentioned in the TD-PSOLA video from 5:50 onwards.
- November 6, 2022 at 08:56 #16312
  Simon
  Professor
  In TD-PSOLA, the width of the analysis window is typically twice the fundamental period so that each window contains two pitch periods. This is so that, if we need to space the pitch periods further apart (i.e., to reduce F0), there is some waveform to ‘fill the gap’.
  
  The nicely-plotted waveform shown in the video at 5:50 is for one of the diphones in the sequence. At 5:45 my hand-drawn waveform for one diphone was unfortunately only two periods long – that was sloppy of me and potentially confusable with a TD-PSOLA analysis frame, which it is not. A diphone would generally be longer than that.
  
  Each impulse response does not represent one phone. A phone is generally much longer than T0: the vocal tract shape changes much more slowly than the vocal folds vibrate.
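
To make the "fill the gap" point concrete, here is a minimal sketch in Python with NumPy. It is not taken from the videos. It assumes a constant T0, epochs that are already known, and a Hann window; the names `extract_frames`, `overlap_add`, and `half_width` are hypothetical. Frames are cut pitch-synchronously, then overlap-added at a chosen spacing: a spacing equal to T0 reproduces the input, and a larger spacing lowers F0.

```python
import numpy as np

def extract_frames(signal, epochs, half_width):
    """Cut one Hann-windowed frame centred on each epoch.

    With half_width == T0 each frame is 2*T0 wide (two pitch periods),
    which is the typical TD-PSOLA choice described in the post above.
    """
    window = np.hanning(2 * half_width + 1)
    return [signal[e - half_width : e + half_width + 1] * window for e in epochs]

def overlap_add(frames, half_width, spacing):
    """Place frame centres `spacing` samples apart and sum the overlaps.

    spacing == original T0 keeps F0 unchanged; a larger spacing lowers F0.
    This sketch does no amplitude normalisation.
    """
    length = 2 * half_width + 1 + (len(frames) - 1) * spacing
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * spacing
        out[start : start + len(frame)] += frame
    return out

# Toy input: a sinusoid with a period of 100 samples, epochs once per period.
T0 = 100
signal = np.sin(2 * np.pi * np.arange(1000) / T0)
epochs = np.arange(100, 900, T0)

frames = extract_frames(signal, epochs, half_width=T0)
unchanged = overlap_add(frames, half_width=T0, spacing=T0)   # same F0
lowered = overlap_add(frames, half_width=T0, spacing=150)    # lower F0
```

In this sketch, frames that are only one period wide (`half_width = T0 // 2`) and spaced 150 samples apart would leave stretches of silence between consecutive frames, whereas the two-period frames still overlap at that spacing.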
- November 6, 2022 at 09:00 #16313
  Simon
  Professor
  Diphone concatenation and TD-PSOLA could be implemented as a single process during waveform generation. We can simultaneously modify the F0 and duration inside each individual diphone and overlap-add the last pitch period of each diphone with the first pitch period of the next diphone to concatenate them.
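
The join described in the last post can be sketched as follows, again in Python with NumPy. This is only an illustration under strong assumptions: both units are cut at pitch marks, they share the same T0 at the join, and a raised-cosine cross-fade stands in for the overlap-add. The function name `join_units` is hypothetical, and the sketch covers only the concatenation, not the F0 or duration modification inside each diphone.

```python
import numpy as np

def join_units(left, right, period):
    """Concatenate two waveforms by overlap-adding one pitch period.

    The last `period` samples of `left` are faded out while the first
    `period` samples of `right` are faded in; the two weights sum to 1.
    """
    n = np.arange(period)
    fade_in = 0.5 - 0.5 * np.cos(np.pi * n / period)
    fade_out = 1.0 - fade_in
    overlap = left[-period:] * fade_out + right[:period] * fade_in
    return np.concatenate([left[:-period], overlap, right[period:]])

# Toy units: two in-phase sinusoids with a period of 100 samples.
T0 = 100
left = np.sin(2 * np.pi * np.arange(500) / T0)
right = np.sin(2 * np.pi * np.arange(400) / T0)
joined = join_units(left, right, T0)
```

Because the overlapped region counts once, the result is one period shorter than the two units laid end to end.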