Module 5 – speech synthesis – waveform generation

Manipulating recorded speech signals to create new utterances.

Start Videos Finish

First, remind yourself about the architecture of a text-to-speech system: a front-end linguistic processor, followed by a waveform generator.

CUI 2024 video available

Pipeline architecture for TTS

The first method we’ll use for waveform generation is Time-Domain Pitch Synchronous Overlap and Add (TD-PSOLA), so let’s see first how that works in practice.

CUI 2024 video available

TD-PSOLA …the hard way

If we wish to modify the spectral envelope, then we need a more powerful technique than TD-PSOLA, and so we’ll then look at linear prediction, which can manipulate source and filter separately.

Download the slides for the module 3, 4 and 5 videos

Download the additional slides for the class on 2019-10-24 : Module 5

Total video to watch in this module: 63 minutes

This concludes the speech synthesis material for this course. You should have been doing all of the essential readings as you went along. Now is a good time to go back and complete as many of the recommended readings as you can, and perhaps some of the extra readings on topics that are of particular interest to you.

CUI 2024 video available

Interactive unit selection