Dynamic Time Warping

This rather old-fashioned method is a great way to understand Dynamic Programming, a very widely-applicable technique.

Start with this very simple post, which reduces the problem to one of simply measuring distance in a feature space

then work through the videos and reading.

  • Whole word templates

    Our first automatic speech recogniser stores an example ("template") of each word. Speech to be recognised is compared against each template.

  • Linear time warping

    The simplest way to deal with variable duration is to stretch the unknown word to have the same duration as the template.

Reading

Holmes & Holmes – Chapter 8 – Template matching and dynamic time warping

Read up to the end of 8.5 carefully. Try to read 8.6 as part of Module 7, but rest assured we will go over the concept of dynamic programming again in Module 9. We recommend you should skim 8.7 and 8.8 because the same general concepts carry forward into Hidden Markov Models (again, we'll come back to this in Module 9). You don't need to read 8.9 onwards. Methods like DTW are rarely used now in state of the art systems, but are a good way to start understanding some core ideas.

Jurafsky & Martin – Chapter 9 introduction

The difficulty of ASR depends on factors including vocabulary size, within- and across-speaker variability (including speaking style), and channel and environmental noise.

Jurafsky & Martin – Section 9.1 – Speech Recognition Architecture

Most modern methods of ASR can be described as a combination of two models: the acoustic model, and the language model. They are combined simply by multiplying probabilities.

Jurafsky & Martin – Section 9.3 – Feature Extraction: MFCCs

Mel-frequency Cepstral Co-efficients are a widely-used feature with HMM acoustic models. They are a classic example of feature engineering: manipulating the extracted features to suit the properties and limitations of the statistical model.

The following exercise involves programming in Python, and so is beyond the scope of this course. If you can program, then you may find this quite helpful in developing your understanding of dynamic programming.