Young et al: Token Passing

My favourite way of understanding how the Viterbi algorithm is applied to HMMs. Can also be helpful in understanding search for unit selection speech synthesis.
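
As a rough illustration of the idea (not code from the paper), token passing for a single small HMM can be sketched as follows. The function name, the toy model and all of its probabilities are assumptions made up for this example. Each state holds one token carrying a log probability and a state history; at every frame, tokens are copied along transitions and each state keeps only the best arrival.

```python
import math

def token_passing(obs, states, log_init, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for one observation sequence."""
    # One token per state: (log probability so far, state history).
    tokens = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_tokens = {}
        for dst in states:
            # Every state passes a copy of its token to dst;
            # dst keeps only the most probable one (the Viterbi max).
            src = max(states, key=lambda s: tokens[s][0] + log_trans[s][dst])
            score, history = tokens[src]
            new_tokens[dst] = (
                score + log_trans[src][dst] + log_emit[dst][o],
                history + [dst],
            )
        tokens = new_tokens
    # The winning token is the best one left in any state at the end.
    return max(tokens.values(), key=lambda t: t[0])

# Hypothetical two-state model with discrete observations "x" and "y".
states = ["s1", "s2"]
log_init = {"s1": math.log(0.9), "s2": math.log(0.1)}
log_trans = {
    "s1": {"s1": math.log(0.7), "s2": math.log(0.3)},
    "s2": {"s1": math.log(0.1), "s2": math.log(0.9)},
}
log_emit = {
    "s1": {"x": math.log(0.9), "y": math.log(0.1)},
    "s2": {"x": math.log(0.2), "y": math.log(0.8)},
}

best_log_prob, best_path = token_passing(
    ["x", "x", "y", "y"], states, log_init, log_trans, log_emit
)
print(best_path, math.exp(best_log_prob))
```

This sketch allows the path to end in any state and ignores word boundaries and language models, which the paper's connected-word formulation handles.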

Holmes & Holmes – Chapter 10 – Front-end analysis for ASR

Covers filterbank and MFCC features. The material on linear prediction is out of scope.

Sharon Goldwater: Basic probability theory

An essential primer on this topic. You should consider this reading ESSENTIAL if you haven’t studied probability before or it’s been a while. We’re adding this to the readings in Module 7 to give you some time to look at it before we really need it in Module 9 – mostly we need the concepts of conditional probability and conditional independence.
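
For orientation, the two concepts named above have the following standard textbook definitions. These are not quoted from the reading, and the event names A, B and C are just placeholders.

```latex
% Conditional probability of A given B (defined when P(B) > 0):
P(A \mid B) = \frac{P(A, B)}{P(B)}

% A and B are conditionally independent given C when:
P(A, B \mid C) = P(A \mid C)\, P(B \mid C)
```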

Sharon Goldwater: Vectors and their uses

A nice, self-contained introduction to vectors and why they are a useful mathematical concept. You should consider this reading ESSENTIAL if you haven’t studied vectors before (or it’s been a while).

Taylor – Chapter 3 – The text-to-speech problem

Discusses the differences between spoken and written forms of language, and describes the structure of a typical TTS system.

Taylor – Chapter 8 – Pronunciation

Including how the lexicon is stored, letter-to-sound, and compressing the lexicon.

Holmes & Holmes – Chapter 5 – Message synthesis from stored human speech components

Pitch-synchronous overlap-and-add (PSOLA) remains a key technique in speech signal processing.

Handbook of phonetic sciences – Ch 20 – Intro to Signal Processing for Speech

Written for a non-technical audience, this gently introduces some key concepts in speech signal processing.

Ladefoged (Elements) – Chapter 8 – Resonances of the vocal tract

A more detailed look at how resonance occurs in the vocal tract, and how that can be used to explain the sound quality (e.g. formant frequencies) of vowel sounds.

Ladefoged (Elements) – Chapter 7 – The production of speech

Some phonetics at last, and a first encounter with the source-filter model of speech.

Ladefoged (Elements) – Chapter 6 – Hearing

Some understanding of human hearing will be helpful for engineering suitable features to extract from the waveform for automatic speech recognition.

Ladefoged (Elements) – Chapter 5 – Resonance

Now we turn to speech production. Resonance is how the vocal tract shapes the quality of sound, for example to produce different vowels.