This year (2024-25), the preceding data collection steps are optional and should be skipped initially. Start here, at this step, to build your first speaker-dependent digit recognizer.
The training algorithms for HMMs require a model to start with. In HTK this is called the prototype model. Its actual parameter values will be learned during training, but the topology must be specified by you, the engineer. This is a classic situation in machine learning, where the design or choice of model type is made using expert knowledge (or perhaps intuition).
Select your prototype HMMs
In this video:
- We need a model to start with – in HTK this is called a “prototype model”
- A prototype model defines the:
  - dimensionality of the observation vectors
  - type of observation vector (e.g., MFCCs)
  - form of the covariance matrix (e.g., diagonal)
  - number of states
  - parameters (mean and variance) of the Gaussian probability density function (pdf) in each state
  - topology of the model, using a transition matrix in which zero probabilities indicate transitions that are not allowed and never will be, even after training (see the example definition below)
You can experiment with different topologies, although for isolated digits the only sensible thing to vary is the number of states. Models with varying numbers of states are provided in `models/proto` (remember, a 5-state model in HTK actually has only 3 emitting states). In your later experiments, modify the `initialise_models` script to try different prototype models. You might even want to create additional prototype models.
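As a sketch of how such an experiment might be automated, you could loop over the prototypes. This assumes you have modified `initialise_models` and `train_models` to accept the prototype file and an output directory as arguments, which the provided scripts do not do out of the box.

```bash
#!/bin/sh
# Hypothetical experiment loop: train a separate system from each prototype.
# Assumes the scripts have been modified to take the prototype file and an
# output directory as arguments (the provided scripts do not, by default).
for proto in models/proto/*; do
    outdir=experiments/$(basename "$proto")
    mkdir -p "$outdir"
    ./scripts/initialise_models "$proto" "$outdir"
    ./scripts/train_models "$outdir"
done
```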
Now train the models
In this video:
- Training consists of two stages that we’ll look at in more detail in modules 9 and 10:
  - first, the model parameters are initialised using `HInit`, which performs a simple alignment of observations and states (uniform segmentation), followed by Viterbi training
  - then `HRest` performs Baum-Welch re-estimation
- A close look at the `initialise_models` script
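Inside `initialise_models` you will find a call to `HInit`, and `train_models` calls `HRest`. The exact arguments depend on how the provided scripts are written; the sketch below shows a typical pair of invocations, assuming one prototype file per digit whose internal name (the `~h` macro) matches the word. All file names, directories, and the word "one" are placeholders, not necessarily what the provided scripts use.

```bash
# Sketch only: paths, file names, and the label "one" are illustrative.

# Stage 1: HInit - uniform segmentation of the training tokens labelled "one",
# followed by Viterbi training, starting from the prototype definition.
HInit -T 1 -S train_list.txt -L lab -l one \
      -M models/hmm0 -H models/proto/hmm_one one

# Stage 2: HRest - Baum-Welch re-estimation of the model that HInit produced.
HRest -T 1 -S train_list.txt -L lab -l one \
      -M models/hmm1 -H models/hmm0/hmm_one one
```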
In the simple scripts that you have been given as a starting point, each of the two stages of training is performed using a separate script. You can run them now:
./scripts/initialise_models
./scripts/train_models
Note: If you haven’t recorded your own data, running these scripts will fail with an error. Can you see why from the error message? You can fix this by editing the scripts to use data from the user `simonk` rather than calling the command `whoami`. We’ll go over this in the lab, but there’s an example of this in Atli Sigurgeirsson’s extremely helpful tutor notes for this part of the assignment.
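As a rough sketch of the kind of edit involved (the variable name here is made up; look for the line in the provided scripts that actually calls `whoami`):

```bash
# Hypothetical variable name - check the actual scripts.
# Before: the scripts locate your own recordings via your username
user=$(whoami)

# After: use the data recorded by the user simonk instead
user=simonk
```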
In your later experiments, you will want to automate things as much as possible. You could combine these two steps into a single script, or call them in sequence from a master script.
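For instance, a minimal master script, assuming it is run from the top of the assignment directory like the individual scripts above, could look like this:

```bash
#!/bin/sh
# Master training script: run both stages in sequence.
set -e   # stop immediately if either stage fails

./scripts/initialise_models
./scripts/train_models

echo "Model initialisation and training complete."
```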