An overview of the background and maths behind linear-prediction methods for modelling the vocal tract as a filter.
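The core computation can be shown as a minimal sketch of the autocorrelation method: estimate autocorrelation values, then solve the normal equations with the Levinson-Durbin recursion. The resulting coefficients define an all-pole filter 1 / (1 - sum_k a[k] z^-k) that stands in for the vocal tract. This assumes the predictor convention x[n] ≈ sum_k a[k] x[n-k]. It also assumes a synthetic test signal rather than windowed speech frames, which a real analysis would use. The function names are illustrative only.

```python
import math
import random

def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (unnormalised, 'biased')."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for LPC coefficients.

    Assumed convention: x[n] is predicted as sum_{k=1}^{order} a[k] * x[n - k].
    Returns (a[1..order], final prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # error energy shrinks at each stage
    return a[1:], err

# Demo: synthesise a 2nd-order autoregressive signal and recover its coefficients.
random.seed(0)
true_a = [1.3, -0.4]
x = [0.0, 0.0]
for _ in range(20000):
    x.append(true_a[0] * x[-1] + true_a[1] * x[-2] + random.gauss(0.0, 1.0))
coeffs, residual = levinson_durbin(autocorrelation(x, 2), order=2)
print(coeffs)  # approximately [1.3, -0.4]
```

The recursion solves the order-p system in O(p²) operations by exploiting the Toeplitz structure of the autocorrelation matrix, rather than using a general matrix solver.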
Jurafsky & Martin (3rd Ed) – Hidden Markov models
An overview of Hidden Markov Models, the Viterbi algorithm, and the Baum-Welch algorithm.
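As a companion to the reading, here is a minimal sketch of Viterbi decoding. It uses dictionaries for the HMM parameters and works in log space to avoid numerical underflow on long sequences. The two-state model and its probabilities are illustrative toy values, not taken from any particular dataset. Baum-Welch training is not covered.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for `obs`, computed in log space to avoid underflow."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # v[t][s]: best log-probability of any state path ending in s at time t
    v = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        v.append({})
        backptr.append({})
        for s in states:
            # Choose the best predecessor state for s at time t
            best_prev = max(states, key=lambda r: v[t - 1][r] + log(trans_p[r][s]))
            v[t][s] = v[t - 1][best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][obs[t]])
            backptr[t][s] = best_prev
    # Trace back from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

# Demo with illustrative toy parameters
states = ["HOT", "COLD"]
start_p = {"HOT": 0.8, "COLD": 0.2}
trans_p = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit_p = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}
path, log_p = viterbi([3, 1, 3], states, start_p, trans_p, emit_p)
print(path, math.exp(log_p))  # path ['HOT', 'COLD', 'HOT'], probability ≈ 0.0128
```

Replacing `max` with a sum over predecessor states turns the same trellis into the forward algorithm. That sum gives the total probability of the observations, which Baum-Welch needs.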
Plag (2003) – Word formation in English: Chapter 1 Basic Concepts
An introductory text on word structure (morphology) in English. Useful to read if you come from a non-linguistics background.
Johnson (Phonetics) – Chapter 6.1 – Tube models of vowel production
Deriving the resonances and formant structures of vowels using two- and three-tube models of the vocal tract.
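A minimal sketch of the decoupled two-tube approximation follows. It treats each tube as an independent resonator and pools their resonances. This assumes the tubes' cross-sectional areas differ enough that coupling can be ignored, and assumes a speed of sound of 35,000 cm/s. The tube lengths and function names are hypothetical. The demo uses an /a/-like configuration, with a narrow back tube and a wide front tube. Under the usual idealisation, both tubes then behave as quarter-wave resonators (closed at one end, open at the other).

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, moist air (cm/s)

def quarter_wave(length_cm, n_formants, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a tube closed at one end and open at the other: (2n-1)c / 4L."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, n_formants + 1)]

def half_wave(length_cm, n_formants, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a tube closed at both ends, or open at both ends: n c / 2L."""
    return [n * c / (2 * length_cm) for n in range(1, n_formants + 1)]

def two_tube_formants(back, front, max_hz=4000.0):
    """Decoupled approximation: pool each tube's own resonances below max_hz."""
    return sorted(f for f in back + front if f <= max_hz)

back = quarter_wave(8.0, 3)   # hypothetical narrow back (pharynx) tube, 8 cm
front = quarter_wave(9.0, 3)  # hypothetical wide front (oral) tube, 9 cm
print([round(f) for f in two_tube_formants(back, front)])  # [972, 1094, 2917, 3281]
```

The model's main decision is which end conditions apply to each tube, since that choice selects `quarter_wave` or `half_wave`. It depends on the relative areas of neighbouring sections, which is the modelling step the reading works through.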
Johnson (Phonetics) – Chapter 2 – The Acoustic Theory of Speech Production: Deriving Schwa
Derives the acoustic features of the vocal tract in terms of the source-filter model.
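The central result of the schwa derivation can be summarised as a worked equation. This assumes the standard idealisation of a uniform tube closed at the glottis and open at the lips (a quarter-wave resonator). It also assumes a speed of sound c = 35,000 cm/s and a vocal-tract length L = 17.5 cm:

```latex
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \ldots

F_1 = \frac{35\,000~\text{cm/s}}{4 \times 17.5~\text{cm}} = 500~\text{Hz}, \quad
F_2 = 3F_1 = 1500~\text{Hz}, \quad
F_3 = 5F_1 = 2500~\text{Hz}
```

Formant frequencies scale inversely with L, so a shorter vocal tract shifts all formants of the neutral vowel upwards.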
Cho & Ladefoged – Variation and universals in VOT: evidence from 18 languages
Voice onset time (VOT) is known to vary with place of articulation.
Practical Phonetics
Videos for the Practical Phonetics course.
Taylor – Section 12.7 – Pitch and epoch detection
Only an outline of the main approaches, with little technical detail. Useful as a summary of why these tasks are harder than you might think.
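To make one of the main approaches concrete, here is a minimal sketch of autocorrelation-based F0 estimation for a single frame. It picks the lag with the largest autocorrelation within an assumed 60 to 400 Hz search range. The range and the function name are illustrative assumptions. The sketch does not attempt epoch detection.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 as the lag with the largest autocorrelation in a plausible range."""
    min_lag = int(sample_rate / f0_max)  # shortest period considered
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)  # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

# Demo: a 100 ms frame of a pure 100 Hz tone sampled at 8 kHz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]
print(estimate_f0(frame, sample_rate))  # 100.0
```

On a clean sine this works perfectly. On real speech, strong harmonics, formant effects and unvoiced frames can make a multiple or submultiple of the true period win (octave errors). That is part of why the reading calls these tasks harder than they look.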
Furui et al. – Fundamental Technologies in Modern Speech Recognition
A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.
Holmes & Holmes – Chapter 9 – Stochastic Modelling
May be helpful as a complement to the essential readings.
Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance
We mitigate the over-simplifications of the model using ever-more-complex algorithms.
Jurafsky & Martin – Section 4.4 – Perplexity
It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system. We simply measure how well it predicts some unseen test data.
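This can be illustrated with a minimal sketch that computes the perplexity of a bigram model on held-out sentences. Perplexity is the exponential of the average negative log-probability per predicted token, so lower values mean the model predicts the test data better. The sketch assumes add-one (Laplace) smoothing, `<s>`/`</s>` sentence padding, a closed vocabulary, and a hypothetical toy corpus. A real evaluation would need a proper treatment of unknown words and a better smoothing method.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count context and bigram frequencies, padding each sentence with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {"</s>"}  # every word the model can predict (<s> is never predicted)
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """Add-one smoothed P(word | prev), so unseen bigrams get non-zero probability."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

def perplexity(sentences, context_counts, bigram_counts, vocab):
    """exp of the average negative log-probability per predicted token (incl. </s>)."""
    total_log_prob, n_predictions = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            total_log_prob += math.log(bigram_prob(prev, word, context_counts, bigram_counts, vocab))
            n_predictions += 1
    return math.exp(-total_log_prob / n_predictions)

# Demo on a hypothetical toy corpus (all test words appear in training)
train = [["i", "like", "speech"], ["i", "like", "phonetics"], ["you", "like", "speech"]]
model = train_bigram(train)
print(perplexity([["you", "like", "phonetics"]], *model))
```

Perplexity values are only comparable between models that share the same vocabulary and test set, since both affect the per-token probabilities.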