Welcome

Welcome to the course! The first lecture will provide a detailed overview of this course, including:

  • who the course is designed for
  • the textbook
  • scope and structure
  • a brief history of speech synthesis
  • teaching mode and how to get the most out of this course, including using this website

Lectures

Lectures (actually a varied mixture of lecture material and in-class activities) will be held on campus. Simon King will give the lectures this year, with Korin Richmond leading the lab sessions.

Practical assignment

The assignment for this course involves recording speech data and building a unit selection voice. Labs will be held on campus in the PPLS Computing Lab (Appleton Tower 4.02) on Linux desktop computers with all necessary software already installed. Remote access is possible, but attendance at the scheduled lab sessions is expected and is vital for success on this course.

Assumed background from the Speech Processing course

The Speech Synthesis course assumes you have previously taken Speech Processing. If you have not, first talk to the lecturer to obtain permission, then revise the following material from Speech Processing (items in bold are the most important):

Module 1: Introduction to the International Phonetic Alphabet

Module 2: Waveform; Spectrum; Spectrogram

Module 3: Time Domain; Sound Source; Periodic Signal; Pitch; Digital Signal; Short-term analysis; Series Expansion; Fourier Analysis; Frequency domain

Module 4: Harmonics; Impulse train; Spectral envelope; Filter; Impulse response; Source-filter model

Module 5: Tokenisation & normalisation; Handwritten rules; Phonemes and allophones; Pronunciation; Prosody

Module 6: Diphone; Waveform concatenation; Overlap-add; Pitch period; TD-PSOLA

Module 7: Feature vectors, sequences, and sequences of feature vectors; Pattern matching; Alignment; Dynamic Time Warping

Modules 8, 9, 10: ideally, try to get some understanding of what a Hidden Markov Model is, but don’t worry if you don’t fully understand this material. You do not need to understand how Automatic Speech Recognition works.