Jurafsky & Martin – Section 9.6 – Search and Decoding
Important material on efficiently computing the combined score: the likelihood from the acoustic model multiplied by the probability from the language model.
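
In practice, decoders work with log probabilities, so the multiplication becomes an addition; decoders also commonly apply a language model scale factor to balance the two models. A minimal sketch of that combination (the hypotheses, scores, and scale factor value below are all made up for illustration):

```python
import math

def combined_log_score(acoustic_log_likelihood, lm_log_prob, lm_scale=1.0):
    """Combine the two model scores for one hypothesis.

    Multiplying probabilities is equivalent to adding log probabilities,
    which avoids numerical underflow. The language model scale factor is
    a common practical refinement, not part of the basic equation.
    """
    return acoustic_log_likelihood + lm_scale * lm_log_prob

# Pick the best of two hypothetical hypotheses:
# (acoustic log likelihood, language model log probability)
hypotheses = {
    "recognise speech": (-420.0, math.log(1e-4)),
    "wreck a nice beach": (-415.0, math.log(1e-7)),
}
best = max(hypotheses,
           key=lambda w: combined_log_score(*hypotheses[w], lm_scale=10.0))
print(best)  # -> "recognise speech"
```
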
Jurafsky & Martin – Section 9.5 – The lexicon and language model
Simply mentions the lexicon and language model and refers the reader to other chapters.
Jurafsky & Martin – Section 9.4 – Acoustic Likelihood Computation
Performing speech recognition with HMMs involves calculating the likelihood that each model emitted the observed speech. You can skip 9.4.1 Vector Quantization.
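
As a concrete (and much simplified) illustration of the quantity being computed, here is the log likelihood of one observation vector under a single diagonal-covariance Gaussian; the section itself builds up to mixtures of Gaussians per HMM state, and the values below are made up:

```python
import numpy as np

def log_gaussian_likelihood(o, mean, var):
    """Log likelihood of one observation vector o under a single
    diagonal-covariance Gaussian: a minimal stand-in for an HMM state's
    output distribution b_j(o_t). Real systems use a mixture of
    Gaussians per state."""
    o, mean, var = map(np.asarray, (o, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mean) ** 2 / var)

# Toy 3-dimensional "feature vector" and state parameters.
print(log_gaussian_likelihood([1.0, 0.5, -0.2],
                              mean=[0.9, 0.4, 0.0],
                              var=[0.5, 0.5, 0.5]))
```
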
Jurafsky & Martin – Section 9.3 – Feature Extraction: MFCCs
Mel-frequency cepstral coefficients (MFCCs) are a widely-used feature with HMM acoustic models. They are a classic example of feature engineering: manipulating the extracted features to suit the properties and limitations of the statistical model. Please note: Jurafsky & Martin's description of the MFCC extraction steps differs somewhat from the standard definition of MFCCs and from what is actually implemented in HTK. For the assignment, you should follow the description of MFCC extraction steps from the videos here on speech zone and in the lectures.
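
A rough numpy sketch of one common version of the pipeline may help make the steps concrete. This is a toy illustration, not HTK's implementation: frame sizes, the filterbank construction, liftering, energy and delta coefficients all vary between descriptions, so treat the speech zone videos as definitive for the assignment.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=12):
    # 1. Pre-emphasis: boost the high frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Slice into overlapping frames; apply a tapered (Hamming) window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)
    frames = signal[idx] * np.hamming(frame_len)
    # 3. Magnitude spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, n_fft))
    # 4. Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    # 5. Log filterbank energies, then 6. DCT to decorrelate
    #    (the "cepstral" step). Which coefficients to keep varies.
    log_energies = np.log(np.maximum(spec @ fbank.T, 1e-10))
    return dct(log_energies, type=2, axis=1, norm="ortho")[:, :n_ceps]

features = mfcc(np.random.randn(16000))  # one second of noise at 16 kHz
print(features.shape)                    # (number of frames, 12)
```
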
Jurafsky & Martin – Section 9.2 – The HMM Applied to Speech
Introduces some notation and the basic concepts of HMMs.
Jurafsky & Martin – Section 9.1 – Speech Recognition Architecture
Most modern methods of ASR can be described as a combination of two models: the acoustic model and the language model. The two are combined simply by multiplying their probabilities.
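
In the noisy-channel formulation this is the familiar equation, where O is the sequence of acoustic observations and W a candidate word string:

```latex
\hat{W} = \operatorname*{arg\,max}_{W}\;
          \underbrace{P(O \mid W)}_{\text{acoustic model}}\;
          \underbrace{P(W)}_{\text{language model}}
```
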
Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis
A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).
Jurafsky & Martin – Section 8.4 – Diphone Waveform Synthesis
A simple way to generate a waveform is to concatenate speech units from a pre-recorded database. In diphone synthesis, the database contains exactly one recording of each required speech unit.
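
A minimal sketch of the lookup-and-concatenate idea, with a hypothetical diphone database of placeholder "recordings"; real diphone synthesis also needs signal processing (e.g. TD-PSOLA) to impose the target prosody and smooth the joins:

```python
import numpy as np

# Hypothetical database: exactly one recording per diphone,
# stored here as silent numpy arrays of different lengths.
database = {
    "sil-h": np.zeros(800), "h-e": np.zeros(900), "e-l": np.zeros(700),
    "l-ou": np.zeros(1000), "ou-sil": np.zeros(850),
}

def synthesise(diphone_sequence):
    """Look up each required diphone and join the recordings end to end."""
    return np.concatenate([database[d] for d in diphone_sequence])

waveform = synthesise(["sil-h", "h-e", "e-l", "l-ou", "ou-sil"])
print(len(waveform))  # total number of samples
```
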
Jurafsky & Martin – Section 4.4 – Perplexity
It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system: we simply measure how well it predicts some unseen test data.
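
Concretely, perplexity is the inverse probability of the test set, normalised by the number of words, and is easiest to compute from log probabilities. A minimal sketch with a made-up toy model:

```python
import math

def perplexity(log_probs):
    """Perplexity from the per-word log probabilities a model assigns to
    unseen test data: the exponential of the average negative log
    probability, i.e. the geometric mean of the inverse probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A model that gives every test word probability 0.1 has perplexity 10,
# as if it were choosing uniformly among 10 equally likely words.
print(perplexity([math.log(0.1)] * 20))  # -> 10.0
```
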
Jurafsky & Martin – Section 4.3 – Training and Test Sets
As we should already know, in machine learning it is essential to evaluate a model on data that it was not trained on.
Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams
We can just use raw counts to estimate probabilities directly.
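
A minimal sketch of that maximum-likelihood estimate for bigrams, P(w | prev) = C(prev w) / C(prev), on a toy corpus (a real setup would also add sentence-boundary markers, as the chapter does):

```python
from collections import Counter

corpus = "i am sam sam i am i do not like green eggs and ham".split()

bigram_counts = Counter(zip(corpus, corpus[1:]))
unigram_counts = Counter(corpus[:-1])  # counts of each history word

def bigram_prob(prev, word):
    """Maximum-likelihood (unsmoothed) estimate:
    P(word | prev) = C(prev word) / C(prev).
    Any unseen bigram gets probability zero -- the problem that
    smoothing exists to fix."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# "i" occurs 3 times as a history and is followed by "am" twice.
print(bigram_prob("i", "am"))  # -> 0.666...
```
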
Jurafsky & Martin – Section 4.1 – Word Counting in Corpora
The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.
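
A minimal sketch of the counting step itself, with tuples of tokens standing in for N-grams and a toy corpus:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every n-gram in a token sequence. The relative frequencies
    of these counts are the raw material for the probability estimates
    in Section 4.2."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

tokens = "to be or not to be".split()
print(ngram_counts(tokens, 2)[("to", "be")])  # -> 2
```
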

