Fun with TD-PSOLA

For diphone synthesis, systems like Festival need to manipulate the fundamental frequency and duration of recorded speech. TD-PSOLA is a popular way to do that.

In diphone speech synthesis, the fundamental frequency and duration of the diphones are manipulated to match the values predicted from text. Praat lets us experiment with one common technique for doing this: TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add).

Start Praat and load any natural speech waveform. Select the Sound object in the object list, click the Manipulate button, choose To Manipulation..., and click OK in the pop-up window that appears.

Select the new object and then click Edit. You should get a window showing the waveform and the pitch contour.

What are the vertical blue lines overlaid on the waveform? Zoom in to find out.

In the new window select Pitch / Stylise pitch (2 st). The pitch contour should become a few points joined by lines.
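The "2 st" in the menu name stands for 2 semitones. Semitones are a logarithmic unit (12 per octave), so the same drag distance in semitones corresponds to a larger change in Hz at higher pitches. The small Python sketch below shows the conversion; the function names are made up for this illustration and are not part of Praat.

```python
import math

def semitones_between(f1_hz, f2_hz):
    """Interval from f1_hz up to f2_hz in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f2_hz / f1_hz)

def shift_by_semitones(f_hz, st):
    """Frequency reached by moving f_hz up (st > 0) or down (st < 0) by st semitones."""
    return f_hz * 2.0 ** (st / 12.0)

print(round(shift_by_semitones(100.0, 2), 2))     # 112.25 (2 st above 100 Hz)
print(round(semitones_between(100.0, 200.0), 2))  # 12.0   (one octave)
```

So at 100 Hz a 2 semitone step is roughly 12 Hz, while at 200 Hz it is roughly 24 Hz.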

[Figure: the Praat Manipulation editor]

Play the waveform, and then drag the points around and play the waveform again. Repeat until bored…

Try to make the sentence sound like a question. Try to place the emphasis on different words.

If you really want to get clever, add several duration points to the duration tier: select Dur and then Add duration point at cursor, move the cursor, and repeat a few times. Then move a few of these points around and play the file.
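To get a feel for what the duration points do, it may help to think of the duration tier as a curve of relative duration (1.0 means unchanged, 2.0 means twice as slow). The sketch below assumes the curve is linear between points and flat outside them, and that the new total duration is the area under it; that is my understanding of how Praat treats the tier, so treat it as an approximation. The function names and the example points are invented for illustration.

```python
def relative_duration(points, t):
    """Duration factor at time t.

    points: list of (time, factor) pairs, sorted by time, with distinct times.
    Linear interpolation between points, constant before the first and after the last.
    """
    if not points:
        return 1.0  # empty tier: nothing is stretched
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]

def new_duration(points, total, step=0.001):
    """Approximate the area under the duration curve on [0, total] (midpoint rule)."""
    n = max(1, round(total / step))
    h = total / n
    return sum(relative_duration(points, (i + 0.5) * h) for i in range(n)) * h

# Hypothetical tier: stretch the middle of a 2-second utterance to twice its length.
points = [(0.5, 1.0), (1.0, 2.0), (1.5, 1.0)]
print(round(new_duration(points, 2.0), 3))  # 2.5 seconds after manipulation
```

Note that a single point changes the factor everywhere, because the curve is flat on both sides of it; you need at least two points with different values to stretch only part of the utterance.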

When you make extreme changes to pitch and duration, can you hear any signal processing artefacts?
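If you are curious where such artefacts might come from, here is a toy pitch-modification sketch in the spirit of TD-PSOLA. It is not Praat's implementation. It assumes the pitch marks are already known (one per period), that the whole signal is voiced, and that duration is kept fixed; the function names and the synthetic test signal are invented for this example. Each output pitch mark takes a Hann-windowed, two-period segment centred on the nearest input pitch mark and overlap-adds it at the new position.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def td_psola(x, marks, pitch_factor):
    """Toy TD-PSOLA: scale pitch by pitch_factor while keeping duration.

    x:     list of samples
    marks: ascending sample indices of pitch marks (at least two)
    """
    y = [0.0] * len(x)
    t = float(marks[0])  # position of the next synthesis (output) pitch mark
    k = 0                # index of the analysis (input) mark nearest to t
    while t <= marks[-1]:
        # Move k forward to the analysis mark closest to t.
        while k + 1 < len(marks) and abs(marks[k + 1] - t) <= abs(marks[k] - t):
            k += 1
        # Local pitch period, in samples.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        w = hann(2 * period + 1)
        centre = round(t)
        # Copy the windowed two-period segment to its new position.
        for j in range(-period, period + 1):
            src, dst = marks[k] + j, centre + j
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += x[src] * w[j + period]
        # Closer marks give higher pitch, wider spacing gives lower pitch.
        t += period / pitch_factor
    return y

# Synthetic "voiced" signal with a period of 80 samples, and its pitch marks.
x = [math.sin(2 * math.pi * n / 80) + 0.5 * math.sin(4 * math.pi * n / 80)
     for n in range(1600)]
marks = list(range(40, 1600, 80))

y_same = td_psola(x, marks, 1.0)  # reproduces x between the first and last mark
y_up = td_psola(x, marks, 2.0)    # one octave up: output period is 40 samples
```

In this sketch, raising the pitch reuses each segment several times and piles the windows on top of each other (so the output gets louder unless you normalise), while lowering it by more than an octave spaces the windows so far apart that gaps open up between them. Real speech adds further trouble, such as misplaced pitch marks and unvoiced stretches.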