This page archives material from when this course ran in a hybrid format. Some components were dropped when classes returned to “normal” in-person teaching.
This course is taught at the University of Edinburgh at advanced undergraduate and Masters levels.
Module 0 - Getting started
Start here! Gives an introduction to the course, explains how the course is delivered, and describes the computing environment you will need.
Module 1 - Phonetics and Visual Representations of Speech
An introduction to phonetics and how we can visualise speech.
Module 3 - Digital Speech Signals
What are spectrograms really? An introduction to Digital Signal Processing and the Discrete Fourier Transform.
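As a rough illustration of the idea behind this module, the sketch below computes a naive Discrete Fourier Transform of one short frame of samples and finds the strongest frequency bin. A spectrogram can be thought of as a stack of such magnitude spectra taken over successive frames. The function names and the test signal are hypothetical, chosen only for this example; a real system would use a fast Fourier transform and apply a window to each frame.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) Discrete Fourier Transform of a sequence of samples."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def magnitude_spectrum(frame):
    """Magnitude of each DFT bin for one frame of samples."""
    return [abs(c) for c in dft(frame)]


# Hypothetical test frame: a sine wave completing exactly 3 cycles in 16 samples.
N = 16
frame = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]

mags = magnitude_spectrum(frame)

# Only bins 0..N/2 carry distinct information for a real-valued signal.
peak_bin = max(range(N // 2 + 1), key=lambda k: mags[k])
print(peak_bin)
```

Because the sine wave fits a whole number of cycles into the frame, all of its energy lands in a single bin; with a non-integer number of cycles the energy would spread across neighbouring bins.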
Module 4 - the Source-Filter Model
Building on our understanding of digital signal processing, we look at the source-filter model from more of an engineering perspective.
Module 5 - Speech Synthesis - phonemes and the front end
Pronunciation, including letter-to-sound models, and predicting prosody. All these tasks can be done with Classification And Regression Trees (CARTs).
Module 6 - Speech Synthesis - waveform generation and connected speech
Manipulating recorded speech signals to create new utterances.
Module 7 - Speech Recognition - Pattern matching
The most basic way to recognise speech is by comparing the speech to be recognised with stored reference examples.
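One common way to compare an unknown utterance with stored references is dynamic time warping (DTW), which aligns two sequences that differ in duration. The description above does not name a technique, so DTW here is an assumption, and the sketch uses toy one-dimensional "features" and hypothetical template labels rather than real speech features.

```python
def dtw_distance(a, b):
    """DTW distance between two number sequences, using absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognise(unknown, templates):
    """Return the label of the stored template closest to the unknown sequence."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


# Hypothetical stored reference examples, one per word.
templates = {"rise": [1, 2, 3, 4], "fall": [4, 3, 2, 1]}

# A slower version of "rise": same shape, different duration.
unknown = [1, 1, 2, 3, 3, 4]

label = recognise(unknown, templates)
print(label)
```

The unknown sequence is longer than either template, yet it aligns perfectly with the "rise" template because DTW allows a template frame to be matched against several input frames.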
Module 8 - Speech Recognition - Feature engineering
To get the best out of machine learning, we can prepare features that reflect our knowledge of the problem, and suit our chosen model.
Module 9 - Speech Recognition - the Hidden Markov Model
We now replace pattern matching with a generative model that is learned from data.
Module 10 - Speech Recognition - Connected speech & HMM training
HMMs extend easily to connected speech, so finally we put everything together to make a complete speech recognition system. We'll also learn how to train an HMM from data.
The Festival text-to-speech system (pre-2023/24)
Festival is a widely used research toolkit for Text-To-Speech. It is not perfect, and your goal is to discover various types of errors it makes, then understand why they occur.