Module 9 – speech recognition – connected speech & HMM training

HMMs extend easily to connected speech, so we can finally put everything together to make a complete speech recognition system. We'll also learn how to train an HMM from data.

Reading

Jurafsky & Martin – Section 4.1 – Word Counting in Corpora

The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.

Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams

Probabilities are estimated directly from relative frequencies: the count of an N-gram divided by the count of its history, with no smoothing.
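A minimal sketch of both ideas from Sections 4.1 and 4.2: count N-grams in a toy corpus, then turn the counts into unsmoothed (maximum likelihood) bigram probabilities. The corpus and sentence-boundary markers below are invented purely for illustration.

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries (illustrative only).
corpus = [
    "<s> the cat sat </s>",
    "<s> the cat ran </s>",
    "<s> the dog sat </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))

def bigram_prob(w_prev, w):
    """Unsmoothed MLE: P(w | w_prev) = C(w_prev, w) / C(w_prev)."""
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]

print(bigram_prob("the", "cat"))  # 2/3
print(bigram_prob("cat", "sat"))  # 1/2
```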

Jurafsky & Martin – Section 9.5 – The lexicon and language model

Simply mentions the lexicon and language model and refers the reader to other chapters.

Jurafsky & Martin – Section 9.6 – Search and Decoding

Important material on efficiently searching for the most likely word sequence by combining the acoustic model likelihood with the language model probability.
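A sketch of the core idea, working in the log domain: for each hypothesised word sequence, the decoder adds the acoustic log-likelihood to the scaled language model log-probability (plus a word insertion penalty) and keeps the best total. The scores, scale factor, and penalty below are invented for illustration; a real decoder applies this combination inside a Viterbi or beam search rather than over an explicit list of hypotheses.

```python
# Hypothetical scores for three competing word sequences (natural logs).
# In practice these come from the HMM acoustic models and the N-gram LM.
hypotheses = {
    ("recognise", "speech"):         {"log_acoustic": -120.0, "log_lm": -4.2},
    ("wreck", "a", "nice", "beach"): {"log_acoustic": -118.5, "log_lm": -9.1},
    ("recognise", "peach"):          {"log_acoustic": -121.3, "log_lm": -7.6},
}

LM_SCALE = 10.0                # language model scale factor (tuned empirically)
WORD_INSERTION_PENALTY = -2.0  # per-word penalty, discourages too many short words

def total_score(words, scores):
    # Combined score = log P(O | W) + scale * log P(W) + penalty * number of words
    return (scores["log_acoustic"]
            + LM_SCALE * scores["log_lm"]
            + WORD_INSERTION_PENALTY * len(words))

best = max(hypotheses, key=lambda w: total_score(w, hypotheses[w]))
print(best, total_score(best, hypotheses[best]))
```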

Jurafsky & Martin – Section 9.7 – Embedded training

Embedded training means that the data are transcribed, but that we don't know the time alignment at the model or state levels.
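One way to picture embedded training: for each training utterance we know only the word transcription, so the per-word (or per-phone) models are concatenated into one long left-to-right composite HMM, and Baum-Welch (forward-backward) is left to work out the state-level alignment itself. The sketch below, using a made-up lexicon, only builds the composite state sequence and a flat-start transition matrix; the re-estimation step itself is not shown.

```python
import numpy as np

# Hypothetical lexicon: each word maps to a sequence of HMM states
# (in a real system these would be the emitting states of phone models).
lexicon = {
    "the": ["dh_1", "dh_2", "ax_1"],
    "cat": ["k_1", "ae_1", "t_1"],
}

def composite_states(transcription):
    """Concatenate the states of each word's model, in transcription order."""
    states = []
    for word in transcription:
        states.extend(lexicon[word])
    return states

def flat_start_transitions(n_states, self_loop=0.5):
    """Left-to-right transition matrix: each state either loops or moves on."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop
        A[i, i + 1] = 1.0 - self_loop
    A[-1, -1] = 1.0  # final state
    return A

states = composite_states(["the", "cat"])
A = flat_start_transitions(len(states))
# 'states' and 'A' define the composite model that Baum-Welch would
# re-estimate against the acoustic features of this utterance.
print(states)
print(A)
```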

Jurafsky & Martin – Section 9.8 – Evaluation

In connected speech, three types of error are possible: substitutions, insertions, or deletions of words. It is usual to combine them into a single measure: Word Error Rate.
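Word Error Rate is computed by aligning the recogniser output against the reference transcription using minimum edit distance at the word level, then dividing the total number of errors by the number of reference words. A minimal sketch (it returns only the overall error rate, without separating the three error types):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / number of reference words,
    computed with standard dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i][j - 1] + 1,             # insertion
                          d[i - 1][j - 1] + sub_cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1/6
```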

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system. We simply measure how well it predicts some unseen test data.
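Perplexity is the inverse probability of the test data, normalised by the number of words: a model that assigns higher probability to the test data has lower perplexity. Working in the log domain avoids underflow. The bigram probabilities in this sketch are invented for illustration.

```python
import math

# Hypothetical bigram probabilities for a tiny test sentence.
test_bigrams = [("<s>", "the"), ("the", "cat"), ("cat", "sat"), ("sat", "</s>")]
log_probs = {("<s>", "the"): math.log(0.8),
             ("the", "cat"): math.log(0.4),
             ("cat", "sat"): math.log(0.5),
             ("sat", "</s>"): math.log(0.9)}

# log PP = -(1/N) * sum of log P(w_i | w_{i-1}); PP = exp(log PP)
N = len(test_bigrams)
log_pp = -sum(log_probs[bg] for bg in test_bigrams) / N
perplexity = math.exp(log_pp)
print(perplexity)  # lower is better: the model predicts the test data well
```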

This is a SKILLS tutorial about writing up the second assignment. Details to be added – will be announced on Learn.

Find the recording of the class in the General channel on Teams, or via this link.