Furui et al: Fundamental Technologies in Modern Speech Recognition

A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.

Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance

We mitigate the over-simplifications of the model using ever-more-complex algorithms.

Jurafsky & Martin – Section 4.1 – Word Counting in Corpora

The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.

Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams

We can just use raw counts to estimate probabilities directly.

Jurafsky & Martin – Section 4.3 – Training and Test Sets

As we should already know: in machine learning it is essential to evaluate a model on data that it was not learned from.

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition. We simply measure how well it predicts some unseen test data.

Jurafsky & Martin – Section 9.7 – Embedded training

Embedded training means that the data are transcribed, but that we don’t know the time alignment at the model or state levels.

Jurafsky & Martin – Section 9.8 – Evaluation

In connected speech, three types of error are possible: substitutions, insertions, or deletions of words. It is usual to combine them into a single measure: Word Error Rate.