Computing P(W) with a language model

This is the "prior" probability of W. It doesn't involve the observation sequence O, so it can be computed without looking at the speech to be recognised.

Continuous speech (a first look)
Once we understand token passing within a single HMM, the extension to continuous speech is surprisingly easy.

Reading

Jurafsky & Martin – Section 4.1 – Word Counting in Corpora

The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.
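
As a rough illustration of the counting step, here is a minimal sketch in Python. The toy corpus and the helper name count_ngrams are invented for this example and do not come from the reading.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat the cat ran".split()

unigram_counts = count_ngrams(corpus, 1)
bigram_counts = count_ngrams(corpus, 2)

# Show the most frequent bigrams and how often each occurred.
for ngram, count in bigram_counts.most_common(3):
    print(" ".join(ngram), count)
```

A real corpus would also need decisions about tokenisation, case and punctuation, which this sketch ignores.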

Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams

We can just use raw counts to estimate probabilities directly.
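
One way to picture this is the sketch below: each bigram probability is estimated as the bigram count divided by the count of its first word as a context. The toy corpus and the function name bigram_mle are assumptions made for this example, not something taken from the reading.

```python
from collections import Counter

def bigram_mle(tokens):
    """Estimate P(word | prev) as count(prev, word) / count(prev as a context)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # The final token never acts as a context, so it is excluded here.
    contexts = Counter(tokens[:-1])
    return {
        (prev, word): count / contexts[prev]
        for (prev, word), count in bigrams.items()
    }

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat the cat ran".split()
probs = bigram_mle(corpus)

print(probs[("the", "cat")])  # "the" is followed by "cat" in 2 of its 3 occurrences
```

With unsmoothed estimates like these, any bigram that never occurred in the training corpus is simply absent from the table, which amounts to giving it a probability of zero.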

Jurafsky & Martin – Section 4.3 – Training and Test Sets

As we should already know, in machine learning it is essential to evaluate a model on data that it was not trained on.

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recogniser. We simply measure how well it predicts some unseen test data.
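
A small sketch of that measurement, assuming a bigram model: perplexity is the inverse probability of the test data, normalised by the number of predictions made. The probability table below is hypothetical, and the first word is left unscored because this sketch uses no sentence-start symbol.

```python
import math

def perplexity(tokens, bigram_prob):
    """Perplexity of a token sequence under a table of bigram probabilities."""
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Raises KeyError for an unseen bigram: an unsmoothed model gives it zero probability.
        log_prob += math.log(bigram_prob[(prev, word)])
    n_predictions = len(tokens) - 1
    return math.exp(-log_prob / n_predictions)

# Hypothetical model probabilities, invented for illustration only.
bigram_prob = {
    ("the", "cat"): 0.5,
    ("cat", "sat"): 0.25,
    ("sat", "down"): 0.5,
}

test_tokens = "the cat sat down".split()
ppl = perplexity(test_tokens, bigram_prob)
print(ppl)
```

Lower perplexity means the model found the test data less surprising. Working in log probabilities avoids numerical underflow on long test sets.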


