Interactive IPA chart
Taylor – Section 12.7 – Pitch and epoch detection
Only an outline of the main approaches, with little technical detail. Useful as a summary of why these tasks are harder than you might think.
Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis
A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).
Furui et al: Fundamental Technologies in Modern Speech Recognition
A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.
Holmes & Holmes – Chapter 9 – Stochastic Modelling
May be helpful as a complement to the essential readings.
Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance
We mitigate the over-simplifications of the model using ever-more-complex algorithms.
Jurafsky & Martin – Section 4.4 – Perplexity
It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system. We simply measure how well it predicts some unseen test data.
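A minimal sketch of that measurement. Perplexity is the inverse probability of the test data normalised by the number of words, PP = P(w_1 ... w_N)^(-1/N), and it is computed here as a sum of log probabilities to avoid numerical underflow. The `prob(word, history)` interface and the `"<s>"` start padding are assumptions made for illustration, not an API from the reading.

```python
import math
from typing import Callable, Sequence, Tuple

def perplexity(prob: Callable[[str, Tuple[str, ...]], float],
               words: Sequence[str], order: int = 2) -> float:
    """Perplexity of `words` under an N-gram model of the given order.

    `prob(word, history)` is a hypothetical interface returning P(word | history),
    where `history` holds the preceding order-1 words (padded with "<s>").
    """
    if not words:
        raise ValueError("test data must contain at least one word")
    history = ("<s>",) * (order - 1)
    log_prob_sum = 0.0
    for w in words:
        p = prob(w, history)
        if p <= 0.0:
            return math.inf  # an unseen event gives zero probability, so infinite perplexity
        log_prob_sum += math.log(p)
        history = (history + (w,))[1:]  # keep only the most recent order-1 words
    return math.exp(-log_prob_sum / len(words))
```

As a sanity check, a model that assigns probability 1/V to every word has perplexity V, so `perplexity(lambda w, h: 0.1, ["a", "b", "c"])` is 10 (up to floating-point rounding). Lower perplexity on unseen data means the model predicts it better.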
Jurafsky & Martin – Section 4.3 – Training and Test Sets
As we should already know: in machine learning it is essential to evaluate a model on data that it was not learned from.
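One way to respect this when building a language model is to hold out whole sentences before counting anything. The sketch below assumes the corpus is a list of sentence strings; `train_test_split`, `test_fraction` and the fixed `seed` are illustrative choices, not something prescribed by the reading.

```python
import random

def train_test_split(sentences, test_fraction=0.1, seed=0):
    """Reproducibly shuffle whole sentences and hold out a fraction for testing.

    Returns (train, test); no sentence appears in both.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

All counts and probability estimates are then computed from `train` only, and `test` is used solely for evaluation (for example, to measure perplexity).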
Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams
We can just use raw counts to estimate probabilities directly.
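For a bigram model, the raw-count (maximum-likelihood) estimate is P(w | prev) = C(prev, w) / C(prev). A minimal sketch, assuming whitespace tokenisation and `"<s>"`/`"</s>"` sentence boundary markers; the function name is hypothetical.

```python
from collections import Counter

def mle_bigram_model(sentences):
    """Unsmoothed bigram probabilities P(w | prev) = C(prev, w) / C(prev).

    Returns a dict mapping (prev, w) to its probability; any bigram absent from
    the dict has probability zero under this model.
    """
    bigram_counts = Counter()
    history_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            bigram_counts[(prev, w)] += 1
            history_counts[prev] += 1  # times `prev` is followed by some word
    return {(prev, w): c / history_counts[prev]
            for (prev, w), c in bigram_counts.items()}

model = mle_bigram_model(["I am Sam", "Sam I am", "I do not like green eggs and ham"])
```

On this toy corpus, `model[("<s>", "I")]` is 2/3 and `model[("am", "Sam")]` is 1/2, while a perfectly plausible bigram such as ("I", "Sam") never occurred and so gets probability zero. That zero is exactly what makes unsmoothed estimates risky on unseen test data.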
Jurafsky & Martin – Section 4.1 – Word Counting in Corpora
The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.
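Counting N-grams amounts to sliding a window of N words along the training text. A minimal sketch, assuming the corpus has already been split into a list of tokens; `count_ngrams` and `relative_frequencies` are hypothetical helper names.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relative_frequencies(counts):
    """Turn counts into relative frequencies that sum to one."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

tokens = "the cat sat on the mat".split()
unigrams = count_ngrams(tokens, 1)
bigrams = count_ngrams(tokens, 2)
```

Here `unigrams[("the",)]` is 2, so its relative frequency is 2/6, and `bigrams` holds five bigram tokens. Conditional N-gram probabilities are then formed by dividing an N-gram count by the count of its (N-1)-word history.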
Jurafsky & Martin – Chapter 4 – N-Grams
A simple and effective way to model language is as a sequence of words. We assume that the probability of each word depends only on the identity of the preceding N-1 words.
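Under this assumption the probability of a word sequence factorises as P(w_1 ... w_T) ≈ ∏_t P(w_t | w_{t-N+1} ... w_{t-1}). The sketch below makes the truncation of the history explicit. The `cond_prob(word, history)` interface and the `"<s>"` padding at the start of the sentence are illustrative assumptions.

```python
import math

def sentence_log_prob(words, cond_prob, n):
    """log P(w_1..w_T) under the N-gram (Markov) approximation.

    Each word is conditioned only on the preceding n-1 words. The sentence is
    padded at the start with n-1 "<s>" tokens (an assumed convention) so that
    every word has a full-length history.
    """
    padded = ["<s>"] * (n - 1) + list(words)
    total = 0.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])  # exactly n-1 preceding tokens
        total += math.log(cond_prob(padded[i], history))
    return total
```

With `n=3`, the three words of `["a", "b", "c"]` are scored with histories `("<s>", "<s>")`, `("<s>", "a")` and `("a", "b")`: anything earlier than two words back is simply ignored.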
Jurafsky & Martin – Section 9.6 – Search and Decoding
Important material on efficiently computing the combined score of a hypothesis: the acoustic model likelihood multiplied by the language model probability.
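The combination itself is simple once it is done in the log domain, where the product becomes a sum. The sketch below only scores an explicit, hypothetical list of hypotheses; it does not implement the efficient search that the reading is actually about. The `lm_weight` parameter is an assumption, included because practical decoders commonly scale the language model term; with `lm_weight=1.0` it is the plain product described above.

```python
import math

def combined_log_score(acoustic_log_lik, lm_prob, lm_weight=1.0):
    """log[ P(O|W) * P(W)**lm_weight ], computed as a sum of logs to avoid underflow."""
    return acoustic_log_lik + lm_weight * math.log(lm_prob)

def best_hypothesis(hypotheses, lm_weight=1.0):
    """Return the word sequence with the highest combined score.

    `hypotheses` is a list of (words, acoustic_log_likelihood, lm_probability)
    tuples; all values used below are made up for illustration.
    """
    return max(hypotheses,
               key=lambda h: combined_log_score(h[1], h[2], lm_weight))[0]

hyps = [("recognise speech", -120.0, 1e-4),
        ("wreck a nice beach", -118.0, 1e-7)]
```

With the language model included, "recognise speech" wins (about -129.2 versus -134.1), even though "wreck a nice beach" has the higher acoustic log likelihood and would be chosen with `lm_weight=0.0`. A real decoder cannot enumerate every word sequence like this, which is why the search techniques in this section matter.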