Taylor – Section 12.4 – Linear-Prediction Analysis

An overview of the background and maths behind linear-prediction methods for modelling the vocal tract as a filter.

Jurafsky & Martin (3rd Ed) – Hidden Markov models

An overview of Hidden Markov Models, the Viterbi algorithm, and the Baum-Welch algorithm.

Plag (2003) – Word formation in English: Chapter 1 Basic Concepts

An introductory text on word structure (morphology) in English. Useful to read if you come from a non-linguistic background.

Johnson (Phonetics) – Chapter 6.1 – Tube models of vowel production

Deriving the resonances and formant structures of vowels using two- and three-tube models of the vocal tract.

Johnson (Phonetics) – Chapter 2 – The Acoustic Theory of Speech Production: Deriving Schwa

Derives the acoustic features of the vocal tract in terms of the source-filter model.
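As a rough illustration of the kind of result this derivation leads to, the sketch below treats the vocal tract for schwa as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of the quarter-wavelength frequency, F_n = (2n - 1)c / 4L. The function name, the tube length of 17.5 cm, and the speed of sound of 35,000 cm/s are illustrative assumptions, not values taken from this listing.

```python
def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    The default speed of sound is an assumed textbook-style value.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Assumed vocal tract length of 17.5 cm (hypothetical adult speaker).
formants = tube_resonances(17.5)
print(formants)
```

With these assumed values the first three resonances are evenly spaced at 500, 1500 and 2500 Hz, which is the formant pattern usually associated with a neutral, schwa-like vowel.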

Cho & Ladefoged – Variation and universals in VOT: evidence from 18 languages

Voice onset time (VOT) is known to vary with place of articulation.

Practical Phonetics

Videos for the course Practical Phonetics

Taylor – Section 12.7 – Pitch and epoch detection

Only an outline of the main approaches, with little technical detail. Useful as a summary of why these tasks are harder than you might think.

Furui et al. – Fundamental Technologies in Modern Speech Recognition

A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.

Holmes & Holmes – Chapter 9 – Stochastic Modelling

May be helpful as a complement to the essential readings.

Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance

We mitigate the over-simplifications of the model using ever-more-complex algorithms.

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system. We simply measure how well it predicts some unseen test data.
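To make the idea concrete, here is a minimal sketch of perplexity for a bigram model. It is not the textbook's own code: the add-one smoothing, the `<unk>` token for unseen words, and the names `train_bigram` and `perplexity` are all assumptions chosen to keep the example short. Perplexity is computed as 2 raised to the negative average log2 probability of the test bigrams, so a lower value means the model predicts the test data better.

```python
import math
from collections import Counter

UNK = "<unk>"  # assumed placeholder for words not seen in training


def train_bigram(tokens):
    """Collect the vocabulary, context counts and bigram counts from training tokens."""
    vocab = set(tokens) | {UNK}
    contexts = Counter(tokens[:-1])              # how often each word precedes another
    bigrams = Counter(zip(tokens, tokens[1:]))   # counts of adjacent word pairs
    return vocab, contexts, bigrams


def perplexity(test_tokens, model):
    """Perplexity of a test sequence under an add-one-smoothed bigram model."""
    vocab, contexts, bigrams = model
    mapped = [t if t in vocab else UNK for t in test_tokens]
    log_prob = 0.0
    n = 0
    for prev, word in zip(mapped, mapped[1:]):
        # Add-one (Laplace) smoothing so unseen bigrams get non-zero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        log_prob += math.log2(p)
        n += 1
    return 2 ** (-log_prob / n)


model = train_bigram("the cat sat on the mat".split())
print(perplexity("the cat sat on the mat".split(), model))
print(perplexity("mat the on sat cat the".split(), model))
```

The second test sequence uses the same words in an order the model has never seen, so its perplexity comes out higher than that of the first. In practice the test data would be genuinely held out rather than a copy of the training text.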