Start

First, remind yourself about the architecture of a text-to-speech system: a front-end linguistic processor, followed by a waveform generator.

The first method we’ll use for waveform generation is Time-Domain Pitch Synchronous Overlap and Add (TD-PSOLA), so let’s see first how that works in practice.

If we wish to modify the spectral envelope, then we need a more powerful technique than TD-PSOLA, and so we’ll then look at linear prediction, which can manipulate source and filter separately.

Download the slides for the module 3, 4 and 5 videos

Download the additional slides for the class on 2019-10-24 : Module 5