First, remind yourself about the architecture of a text-to-speech system: a front-end linguistic processor, followed by a waveform generator.
The first method we’ll use for waveform generation is Time-Domain Pitch Synchronous Overlap and Add (TD-PSOLA), so let’s see first how that works in practice.
If we wish to modify the spectral envelope, then we need a more powerful technique than TD-PSOLA, and so we’ll then look at linear prediction, which can manipulate source and filter separately.
Download the slides for the module 3, 4 and 5 videos
Download the additional slides for the class on 2019-10-24 : Module 5
Total video to watch in this module: 63 minutes
In this video, I occasionally refer to a second screen displaying a waveform; that screen was not recorded. I think most of the content of this video still makes sense.
The notation used in the linear prediction equation is slightly different from that used in class. It’s only a change of notation: the equation is otherwise exactly the same.
This concludes the speech synthesis material for this course. You should have been doing all of the essential readings as you went along. Now is a good time to go back and complete as many of the recommended readings as you can, and perhaps some of the extra readings on topics that are of particular interest to you.