Train the acoustic models

We will use supervised machine learning (including the Baum-Welch algorithm) to train models on labelled data.

This year (2023-24), the preceding data collection steps are optional and should be skipped initially. Start here, at this step, to build your first speaker-dependent digit recognizer.

The training algorithms for HMMs require a model to start with. In HTK this is called the prototype model. Its actual parameter values will be learned during training, but the topology must be specified by you, the engineer. This is a classic situation in machine learning, where the design or choice of model type is made using expert knowledge (or perhaps intuition).

Select your prototype HMMs

In this video:

  1. We need a model to start with – in HTK this is called a “prototype model”
  2. A prototype model defines the:
    • dimensionality of the observation vectors
    • type of observation vector (e.g., MFCCs)
    • form of the covariance matrix (e.g., diagonal)
    • number of states
    • parameters (mean and variance) of the Gaussian pdf in each state
    • topology of the model, using a transition matrix in which zero probabilities indicate transitions that are not allowed and never will be, even after training

You can experiment with different topologies – although for isolated digits, the only sensible thing to vary is the number of states. Models with varying numbers of states are provided in models/proto (remember, a 5-state model in HTK actually has only 3 emitting states). In your later experiments, modify the initialise_models script to try different prototype models. You might even want to create additional prototype models.
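
To make this concrete, here is a sketch of what an HTK prototype definition file looks like. This is not one of the provided prototypes: the vector size of 3 and the parameter kind MFCC_0 are chosen only to keep the example short, and the real files in models/proto will use the actual dimensionality and type of your feature vectors. Note how the zeros in the transition matrix define a strict left-to-right topology.

~o <VecSize> 3 <MFCC_0>
~h "proto"
<BeginHMM>
  <NumStates> 5
  <State> 2
    <Mean> 3
      0.0 0.0 0.0
    <Variance> 3
      1.0 1.0 1.0
  <State> 3
    <Mean> 3
      0.0 0.0 0.0
    <Variance> 3
      1.0 1.0 1.0
  <State> 4
    <Mean> 3
      0.0 0.0 0.0
    <Variance> 3
      1.0 1.0 1.0
  <TransP> 5
    0.0 1.0 0.0 0.0 0.0
    0.0 0.6 0.4 0.0 0.0
    0.0 0.0 0.6 0.4 0.0
    0.0 0.0 0.0 0.7 0.3
    0.0 0.0 0.0 0.0 0.0
<EndHMM>

States 1 and 5 are the non-emitting entry and exit states (which is why a 5-state model has only 3 emitting states), the <Variance> vector in each state specifies a diagonal covariance matrix, and the zero means and unit variances are placeholders whose values will be replaced during training.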

Now train the models

In this video:

  1. Training consists of two stages:
    • the model parameters are initialised using HInit, which performs a simple alignment of observations and states (uniform segmentation), followed by Viterbi training
    • then HRest performs Baum-Welch re-estimation
  2. A close look at the initialise_models script

In the simple scripts that you have been given as a starting point, each of the two stages of training is performed using a separate script. You can run them now:

./scripts/initialise_models
./scripts/train_models
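
If you want to see exactly what these scripts do, open them in a text editor: at the core of each is a call to an HTK tool, roughly along the lines below. The file names, label name and directories here are illustrative only; check the scripts themselves for the real paths and options, and note that in practice these commands are run once per digit in the vocabulary.

# Stage 1 (illustrative): HInit initialises the model using uniform segmentation
# followed by Viterbi training.
# -S lists the training files, -l picks out segments labelled "one",
# -L is the label directory, -M is where the new model is written.
HInit -T 1 -S train_files.scp -l one -L labels -M models/hmm0 proto

# Stage 2 (illustrative): HRest re-estimates the same model using Baum-Welch.
HRest -T 1 -S train_files.scp -l one -L labels -M models/hmm1 models/hmm0/proto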

Note: If you haven’t recorded your own data, running these scripts will throw an error. Can you see why from the error message? You can fix this by editing the scripts to use data from the user simonk rather than calling the command `whoami`. We’ll go over this in the lab, but there’s an example of this in Atli Sigurgeirsson’s extremely helpful tutor notes for this part of the assignment.
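
For example (a sketch only; the variable name and the exact line will differ in your scripts):

# if a script selects whose data to use with something like
user=$(whoami)
# then hard-coding the user name instead points it at simonk's data
user=simonk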

In your later experiments, you will want to automate things as much as possible. You could combine these two steps into a single script, or call them in sequence from a master script.
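
For example, a minimal master script might look like this (the script name and comments are suggestions, not part of the provided code):

#!/bin/sh
# Run both training stages in sequence, stopping immediately if either fails.
set -e

./scripts/initialise_models   # HInit: uniform segmentation + Viterbi training
./scripts/train_models        # HRest: Baum-Welch re-estimation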