Jurafsky & Martin – Section 9.6 – Search and Decoding

Important material on efficiently computing the combined likelihood of the acoustic model multiplied by the probability of the language model.

Jurafsky & Martin – Section 9.5 – The lexicon and language model

Simply mentions the lexicon and language model and refers the reader to other chapters.

Jurafsky & Martin – Section 9.4 – Acoustic Likelihood Computation

To perform speech recognition with HMMs involves calculating the likelihood that each model emitted the observed speech. You can skip 9.4.1 Vector Quantization.

Jurafsky & Martin – Section 9.3 – Feature Extraction: MFCCs

Mel-frequency Cepstral Co-efficients are a widely-used feature with HMM acoustic models. They are a classic example of feature engineering: manipulating the extracted features to suit the properties and limitations of the statistical model. Please note: the description of MFCC extraction steps differs somewhat from the standard definition of MFCCs and what is actually implemented in HTK. For the assignment, you should follow the description of MFCC extraction steps from the videos here on speech zone and in the lectures.

Jurafsky & Martin – Section 9.2 – The HMM Applied to Speech

Introduces some notation and the basic concepts of HMMs.

Jurafsky & Martin – Section 9.1 – Speech Recognition Architecture

Most modern methods of ASR can be described as a combination of two models: the acoustic model, and the language model. They are combined simply by multiplying probabilities.

Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis

A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).

Jurafsky & Martin – Section 8.4 – Diphone Waveform Synthesis

A simple way to generate a waveform is by concatenating speech units from a pre-recorded database. The database contains one recording of each required speech unit.

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition. We simply measure how well it predicts some unseen test data.

Jurafsky & Martin – Section 4.3 – Training and Test Sets

As we should already know: in machine learning it is essential to evaluate a model on data that it was not learned from.

Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams

We can just use raw counts to estimate probabilities directly.

Jurafsky & Martin – Section 4.1 – Word Counting in Corpora

The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.