This course is taught at the University of Edinburgh as the Speech Synthesis course, at advanced undergraduate and Masters levels. Students should normally have completed the Speech Processing course first, which includes material on the Text-to-Speech front end. In this Speech Synthesis course, the focus is mostly on waveform generation.
Weekly schedule
The calendar shows which module you need to complete before each week's lecture. It also lists lab times and specifies the coursework deadline.
Readings
You will find reading lists within each module. Here, the same readings are arranged into alphabetically sorted lists, broken down by module or by importance.
Module 1 - introduction
This module contains introductory material and speech samples to accompany the first lecture, which introduces the course.
Module 2 - unit selection
Concatenating fragments of recorded natural speech can produce extremely natural-sounding synthetic speech. The core problem is how to select the most appropriate waveform fragments to concatenate.
Module 3 - unit selection target cost functions
The target cost is critical to choosing an appropriate unit sequence. Several different forms are possible, using linguistic features, or acoustic properties, or a combination of both.
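To make the selection problem concrete, here is a minimal sketch in Python of a dynamic-programming search over candidate units. It assumes a target cost that simply counts mismatched linguistic features, and it adds a join cost based on the F0 difference across each concatenation point, which is an assumption of this sketch rather than something defined above. The feature names, the dictionary layout and the scaling of the join cost are all hypothetical; a real system would use richer features and acoustic distances.

```python
def target_cost(target, candidate):
    # Count linguistic features of the target that the candidate does not match.
    # The feature names are hypothetical.
    return sum(1.0 for k in target if target[k] != candidate["features"].get(k))


def join_cost(left, right):
    # Illustrative stand-in for an acoustic distance: the F0 mismatch (in Hz)
    # across the concatenation point, with an arbitrary scaling.
    return abs(left["end_f0"] - right["start_f0"]) / 100.0


def select_units(targets, candidates):
    """Dynamic-programming search for the lowest-cost unit sequence.

    candidates[i] is the list of database units available for targets[i].
    Returns (list of selected unit ids, total cost).
    """
    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            # Cheapest way to arrive at this candidate from the previous position.
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c), prev))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j]["id"])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

The search keeps only the cheapest path into each candidate, so its cost grows with the number of candidates per position squared rather than with the number of possible sequences.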
Module 4 - the database
The quality of unit selection synthesis depends on good-quality recorded speech with accurate labels.
Module 5 - evaluation
How do we know how good our synthesiser is? Can we use formal evaluation to decide how to improve it?
Module 6 - speech signal analysis & modelling
We cover epoch detection, F0 estimation and the spectral envelope, and how to represent these for modelling. We also consider aperiodic energy. With these in place, we can analyse and reconstruct speech: this is called vocoding.
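As one illustration of F0 estimation, the sketch below picks the peak of the autocorrelation function of a single frame. This is only one simple approach, chosen here for brevity and not necessarily the method taught in the module. The function name, the search range of 50 to 500 Hz and the use of a biased autocorrelation estimate (which favours shorter lags and so reduces octave errors) are assumptions of this sketch. It makes no voicing decision beyond returning None when no positive peak is found.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate F0 in Hz for one frame from the autocorrelation peak.

    Returns None if no lag in the search range has positive autocorrelation.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset

    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Biased estimate: divide by n rather than by the overlap length.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        if r > best_value:
            best_lag, best_value = lag, r

    return sample_rate / best_lag if best_lag else None
```

For example, a 40 ms frame of a 200 Hz sine wave sampled at 16 kHz can be built with `[math.sin(2 * math.pi * 200 * i / 16000) for i in range(640)]` and passed to `estimate_f0` along with the sample rate.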
Module 7 - statistical parametric speech synthesis
After establishing the key concepts and motivating this way of doing speech synthesis, we cover the Hidden Markov Model approach.
Module 8 - deep neural networks
Neural networks are motivated as a replacement for the regression tree used in the HMM approach: they provide a more powerful regression model.
Module 9 - sequence-to-sequence models
True sequence-to-sequence models improve over frame-by-frame models by encoding the entire input sequence and then generating the entire output sequence.
The state of the art
The content of this part of the course is updated each year. We will cover the latest developments.