Embedded training means that the data are transcribed, but the time alignment at the model and state levels is unknown.
Jurafsky & Martin – Section 9.6 – Search and Decoding
Important material on how to efficiently search for the most likely word sequence, scoring hypotheses with the acoustic model likelihood multiplied by the language model probability.
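The core of that search is the Viterbi algorithm. Below is a minimal sketch of Viterbi decoding over HMM states in log space; this is my own toy code, not an excerpt from the book, and in a full decoder the language model log-probability (usually scaled) would be added whenever a word boundary is crossed.

```python
# Minimal Viterbi decoding sketch over a single HMM (toy example).
import numpy as np

def viterbi(log_trans, log_obs, log_init):
    """log_trans: (N, N) log transition probabilities a[i, j]
       log_obs:   (T, N) log observation likelihoods log b_j(o_t)
       log_init:  (N,)   log initial state probabilities
       Returns the best state sequence and its log score."""
    T, N = log_obs.shape
    delta = np.full((T, N), -np.inf)   # best log score ending in state j at time t
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_init + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans      # (N, N): from state i to state j
        psi[t] = np.argmax(scores, axis=0)              # best predecessor for each j
        delta[t] = scores[psi[t], np.arange(N)] + log_obs[t]
    # Backtrace the best path from the final frame
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))
```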
Jurafsky & Martin – Section 9.5 – The lexicon and language model
Simply mentions the lexicon and language model and refers the reader to other chapters.
Jurafsky & Martin – Section 9.4 – Acoustic Likelihood Computation
Performing speech recognition with HMMs involves calculating the likelihood that each model emitted the observed speech. You can skip 9.4.1 Vector Quantization.
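This section is mainly about how the observation likelihood b_j(o_t) of a state is computed, typically with a Gaussian mixture over the feature vector. Here is a minimal sketch of a diagonal-covariance Gaussian mixture log-likelihood (my own toy code, not from the book):

```python
# Log-likelihood of one feature vector under a diagonal-covariance GMM:
# log b(o) = logsumexp_m [ log w_m + log N(o; mu_m, sigma2_m) ]
import numpy as np

def gmm_log_likelihood(o, weights, means, variances):
    """o: (D,) feature vector (e.g. one MFCC frame)
       weights: (M,) mixture weights; means, variances: (M, D)"""
    D = o.shape[0]
    # Per-component log Gaussian density with diagonal covariance
    log_gauss = -0.5 * (D * np.log(2 * np.pi)
                        + np.sum(np.log(variances), axis=1)
                        + np.sum((o - means) ** 2 / variances, axis=1))
    log_weighted = np.log(weights) + log_gauss
    # Numerically stable log-sum-exp over the mixture components
    m = np.max(log_weighted)
    return m + np.log(np.sum(np.exp(log_weighted - m)))
```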
Jurafsky & Martin – Section 9.3 – Feature Extraction: MFCCs
Mel-frequency cepstral coefficients (MFCCs) are a widely used feature with HMM acoustic models. They are a classic example of feature engineering: manipulating the extracted features to suit the properties and limitations of the statistical model.
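If you want to inspect MFCCs yourself, a library such as librosa will compute them in a couple of lines. A minimal sketch, where the file name is a placeholder and 13 coefficients is a typical (not mandatory) choice:

```python
# Extract 13 MFCCs per frame from a wav file using librosa (illustration only).
import librosa

y, sr = librosa.load("example.wav", sr=None)          # keep the native sample rate
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, num_frames)
print(mfccs.shape)
```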
Jurafsky & Martin – Section 9.2 – The HMM Applied to Speech
Introduces some notation and the basic concepts of HMMs.
Jurafsky & Martin – Section 9.1 – Speech Recognition Architecture
Most modern methods of ASR can be described as a combination of two models: the acoustic model and the language model. They are combined simply by multiplying their probabilities.
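In symbols, this is the noisy-channel formulation the section presents: the recogniser searches for the word sequence that maximises the product of the two models' scores,

```latex
\hat{W} = \operatorname*{argmax}_{W} \; P(O \mid W)\, P(W)
```

where P(O|W) is the acoustic model likelihood of the observations O given word sequence W, and P(W) is the language model probability.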
Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis
A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).
Jurafsky & Martin – Section 8.4 – Diphone Waveform Synthesis
A simple way to generate a waveform is to concatenate speech units from a pre-recorded database. The database contains one recording of each required speech unit.
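A minimal sketch of the idea, assuming a hypothetical directory of one wav file per diphone; real diphone synthesis also smooths the joins and imposes the target prosody, which is omitted here:

```python
# Toy diphone concatenation: join one pre-recorded waveform per required unit.
# File names and directory layout are hypothetical.
import numpy as np
import soundfile as sf

diphones = ["sil-h", "h-e", "e-l", "l-ou", "ou-sil"]   # units needed for "hello"
units = []
samplerate = None
for d in diphones:
    audio, sr = sf.read(f"diphone_db/{d}.wav")
    units.append(audio)
    samplerate = sr

waveform = np.concatenate(units)
sf.write("hello.wav", waveform, samplerate)
```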
Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams
N-gram probabilities can be estimated directly from raw counts; this is the maximum likelihood estimate.
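For example, the maximum likelihood estimate of a bigram probability is simply a ratio of counts:

```latex
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}
```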
Jurafsky & Martin – Section 4.1 – Word Counting in Corpora
The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.
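As a toy illustration (the corpus below is made up), counting bigrams and converting counts to relative frequencies:

```python
# Count bigrams in a tiny corpus and estimate P(w | h) as C(h, w) / C(h).
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))
history_counts = Counter(corpus[:-1])   # counts of each bigram history

p_cat_given_the = bigram_counts[("the", "cat")] / history_counts["the"]
print(p_cat_given_the)   # 2 of the 3 occurrences of "the" are followed by "cat"
```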
Jurafsky & Martin – Chapter 9 introduction
The difficulty of ASR depends on factors including vocabulary size, within- and across-speaker variability (including speaking style), and channel and environmental noise.