Start

First, remind yourself about the architecture of a text-to-speech system: a front-end linguistic processor, followed by a waveform generator.

CUI 2024 video available

Pipeline architecture for TTS

The first method we’ll use for waveform generation is Time-Domain Pitch Synchronous Overlap and Add (TD-PSOLA), so let’s first see how that works in practice.

CUI 2024 video available

TD-PSOLA …the hard way
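
As a rough companion to the video, here is a minimal sketch of the core TD-PSOLA idea in Python with NumPy. It is not the course's implementation. The function name `td_psola` and its parameters are hypothetical. It assumes the pitch marks (one sample index per pitch period) are already known, that the whole signal is voiced, and that we only want to change F0 while keeping the duration roughly the same. Each analysis frame is two pitch periods long, centred on a pitch mark and Hann-windowed; the frames are then overlap-added at new, more closely or more widely spaced positions.

```python
import numpy as np

def td_psola(x, marks, pitch_factor):
    """Scale F0 by pitch_factor, keeping duration roughly constant.

    x            : 1-D waveform (assumed fully voiced)
    marks        : increasing sample indices of pitch marks (assumed given)
    pitch_factor : > 1 raises F0, < 1 lowers it
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)              # local pitch period at each mark
    y = np.zeros_like(x)

    t = float(marks[0])                   # current synthesis pitch mark
    while t <= marks[-1]:
        # Pick the analysis frame whose pitch mark is closest in time.
        i = int(np.argmin(np.abs(marks - t)))
        p = int(periods[min(i, len(periods) - 1)])
        centre = int(marks[i])

        # Two-period analysis frame, centred on the pitch mark.
        lo, hi = centre - p, centre + p + 1
        s = int(round(t))
        slo, shi = s - p, s + p + 1
        if lo >= 0 and hi <= len(x) and slo >= 0 and shi <= len(y):
            frame = x[lo:hi] * np.hanning(2 * p + 1)
            y[slo:shi] += frame           # overlap-add at the new position

        # Synthesis marks are spaced by the modified pitch period.
        t += p / pitch_factor
    return y
```

With `pitch_factor` greater than 1 some analysis frames are reused, and with a value less than 1 some are skipped, which is how the duration stays about the same while the spacing between pitch periods changes. Note that the frames themselves are copied unchanged, so the spectral envelope is not modified.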

If we wish to modify the spectral envelope, we need a more powerful technique than TD-PSOLA, so we’ll then look at linear prediction, which lets us manipulate the source and the filter separately.
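
To make the idea of separating source and filter concrete, here is a small sketch in Python with NumPy. It is one common way to do linear prediction (the autocorrelation method solved with the Levinson-Durbin recursion) and is not necessarily the formulation used in the videos. The function names `lpc`, `inverse_filter` and `synthesis_filter` are hypothetical. The filter is represented by the predictor polynomial `a` (with `a[0] = 1`), and the source is the residual obtained by inverse filtering.

```python
import numpy as np

def lpc(frame, order):
    """Estimate predictor polynomial a (a[0] = 1) for one frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                            # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = -np.dot(a[:i], r[i:0:-1]) / err
        # Levinson-Durbin update of the coefficients.
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Remove the filter: e[n] = sum_k a[k] * x[n-k] (the 'source')."""
    return np.convolve(x, a)[:len(x)]

def synthesis_filter(e, a):
    """Apply the all-pole filter 1/A(z) to an excitation signal e."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = sum(a[k] * y[n - k] for k in range(1, min(n, len(a) - 1) + 1))
        y[n] = e[n] - past
    return y
```

Passing the unmodified residual back through `synthesis_filter` with the same `a` reconstructs the original frame. Changing either the residual (for example, its pitch) or the coefficients (the spectral envelope) before resynthesis is what gives independent control over source and filter.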


Download the slides for the Module 3, 4 and 5 videos

Download the additional slides for the class on 2019-10-24: Module 5