essential – Page 2

Jurafsky & Martin – Section 9.8 – Evaluation

In connected speech, three types of error are possible: substitutions, insertions, or deletions of words. It is usual to combine them into a single measure: Word Error Rate.
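
As a rough illustration of how the three error types combine into one number, the sketch below aligns a reference transcript with a recogniser output using word-level edit distance, then divides the total number of substitutions, deletions and insertions by the number of reference words. The function name `word_error_rate` and the example sentences are made up for illustration; this is not the scoring tool used in the course.

```python
def word_error_rate(reference, hypothesis):
    """Minimal WER sketch: (S + D + I) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = fewest edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)

    return dist[-1][-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors but are not bounded by the reference length, WER computed this way can exceed 100%.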

Jurafsky & Martin – Section 9.7 – Embedded training

Embedded training means that the data are transcribed, but that we don’t know the time alignment at the model or state levels.

Jurafsky & Martin – Section 9.5 – The lexicon and language model

Simply mentions the lexicon and language model and refers the reader to other chapters.

Jurafsky & Martin – Section 9.4 – Acoustic Likelihood Computation

Performing speech recognition with HMMs involves calculating the likelihood that each model emitted the observed speech. You can skip 9.4.1 Vector Quantization.
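
To make "the likelihood that a model emitted the observed speech" concrete, here is a minimal sketch of the forward algorithm for an HMM whose states each emit from a one-dimensional Gaussian. Several things are simplifying assumptions for illustration only: the observations are single numbers rather than feature vectors, there are no non-emitting start or end states, and the names `gaussian_pdf` and `forward_likelihood` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)


def forward_likelihood(observations, initial, transitions, means, variances):
    """Forward algorithm: p(observations | model), summed over all state sequences.

    initial[s]        : probability of starting in state s
    transitions[r][s] : probability of moving from state r to state s
    means, variances  : Gaussian emission parameters, one pair per state
    """
    n_states = len(initial)

    # Initialisation: start in state s and emit the first observation
    alpha = [
        initial[s] * gaussian_pdf(observations[0], means[s], variances[s])
        for s in range(n_states)
    ]

    # Recursion: sum over predecessor states, then emit the next observation
    for x in observations[1:]:
        alpha = [
            sum(alpha[r] * transitions[r][s] for r in range(n_states))
            * gaussian_pdf(x, means[s], variances[s])
            for s in range(n_states)
        ]

    # Termination: sum over all states we could end in
    return sum(alpha)


# Two toy left-to-right models scored against the same observations
obs = [0.1, 0.0, 1.9, 2.1]
initial = [1.0, 0.0]
transitions = [[0.5, 0.5], [0.0, 1.0]]
model_a = forward_likelihood(obs, initial, transitions, [0.0, 2.0], [1.0, 1.0])
model_b = forward_likelihood(obs, initial, transitions, [5.0, 7.0], [1.0, 1.0])
print(model_a > model_b)  # the model whose means fit the data scores higher
```

A practical implementation would work with log probabilities, since multiplying many small numbers quickly underflows.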

Jurafsky & Martin – Section 9.3 – Feature Extraction: MFCCs

Mel-frequency cepstral coefficients (MFCCs) are a widely used feature for HMM acoustic models. They are a classic example of feature engineering: manipulating the extracted features to suit the properties and limitations of the statistical model. Please note: the description of the MFCC extraction steps in this section differs somewhat from the standard definition of MFCCs and from what is actually implemented in HTK. For the assignment, you should follow the description of the MFCC extraction steps given in the videos here on speech zone and in the lectures.

Jurafsky & Martin – Section 9.2 – The HMM Applied to Speech

Introduces some notation and the basic concepts of HMMs.

Jurafsky & Martin – Section 9.1 – Speech Recognition Architecture

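Most modern methods of ASR can be described as a combination of two models: the acoustic model, which gives the likelihood of the observed speech given a word sequence, and the language model, which gives the prior probability of that word sequence. They are combined simply by multiplying probabilities.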
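
The sketch below illustrates that combination for a handful of candidate word sequences: the recogniser picks the sequence W that maximises P(O|W) × P(W), which in the log domain is a sum. The candidate sentences and all the scores are invented purely for illustration, and `best_hypothesis` is a hypothetical helper, not part of any toolkit.

```python
def best_hypothesis(candidates):
    """Return the word sequence with the highest combined score.

    Each candidate is (words, acoustic_log_likelihood, language_model_log_prob).
    Multiplying the two probabilities is the same as adding their logs.
    """
    return max(candidates, key=lambda c: c[1] + c[2])[0]


# Made-up scores: the second candidate fits the audio slightly better,
# but the language model considers it much less probable.
candidates = [
    ("recognise speech", -12.0, -3.0),
    ("wreck a nice beach", -11.5, -7.0),
]
print(best_hypothesis(candidates))
```

A real recogniser does not enumerate whole sentences like this; it searches over a very large space of hypotheses, but the quantity being maximised has this form.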

Jurafsky & Martin – Chapter 9 introduction

The difficulty of ASR depends on factors including vocabulary size, within- and across-speaker variability (including speaking style), and channel and environmental noise.

Holmes & Holmes – Chapter 8 – Template matching and dynamic time warping

Read up to the end of 8.5 carefully. Try to read 8.6 as part of Module 7, but rest assured that we will go over the concept of dynamic programming again in Module 9. We recommend that you skim 8.7 and 8.8, because the same general concepts carry forward into Hidden Markov Models (again, we’ll come back to this in Module 9). You don’t need to read 8.9 onwards. Methods like DTW are rarely used now in state-of-the-art systems, but they are a good way to start understanding some core ideas.
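
For readers who like to see the idea in code, here is a minimal sketch of dynamic time warping using dynamic programming. It assumes each frame is a single number and uses absolute difference as the local distance; real systems compare feature vectors, and the chapter discusses constraints on the allowed path that are left out here. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(template, utterance):
    """Cost of the best alignment between two sequences of scalar frames."""
    n, m = len(template), len(utterance)
    inf = float("inf")

    # cost[i][j] = cheapest way to align template[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - utterance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in the template only
                cost[i][j - 1],      # advance in the utterance only
            )

    return cost[n][m]


# The same pattern spoken more slowly aligns perfectly with the template
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Recognition by template matching then amounts to computing this distance against one stored template per word and choosing the word with the smallest distance.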

Jurafsky & Martin – Section 8.4 – Diphone Waveform Synthesis

Jurafsky & Martin (2nd ed) – Section 8.3 – Prosodic Analysis

Jurafsky & Martin (2nd ed) – Section 8.2 – Phonetic Analysis
