Recognition and Evaluation

By comparing the recogniser's output with the hand-labelled test data, we can compute the Word Error Rate (WER).
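For reference, this is the usual definition of WER. Each recogniser output is aligned with its reference transcription, and the substituted (S), deleted (D) and inserted (I) words are counted against the N words in the reference:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

As a worked example, take the hypothetical counts N = 200 reference words, S = 6, D = 2 and I = 2:

$$\mathrm{WER} = \frac{6 + 2 + 2}{200} = \frac{10}{200} = 0.05 = 5\%$$

Because insertions are counted, WER can exceed 100%. With a grammar that forces exactly one digit per recording, D and I should both be zero, so the WER reduces to the fraction of recordings whose digit was misrecognised.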

Now we are ready to run the recogniser on some test data. You should run it on some existing isolated test digits (one digit per wav file), using the MFCC files extracted from them. The recogniser is a Viterbi decoder, HVite, which the script recognise_test_data runs for you. HVite uses your trained HMMs (one HMM per digit, from the training step) and your language model. In the first instance, your language model is a very simple grammar that ensures the recogniser outputs exactly one digit (i.e., one word) per recording. Later, you can extend this grammar to recognise digit sequences of arbitrary length.
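To make concrete what Viterbi decoding with a single-digit grammar amounts to, here is a minimal Python sketch of isolated-word recognition. Each digit model scores the whole recording along its single best state path, and the word whose model scores highest is output. This is not HTK's implementation, and every detail is a simplifying assumption: `ToyHMM`, `viterbi_logprob`, `recognise_isolated` and `left_to_right` are hypothetical names, observations are single floats standing in for MFCC vectors, each state has one 1-D Gaussian instead of a Gaussian mixture, and HTK's non-emitting entry and exit states are replaced by a forced start in the first state and a forced end in the last.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


@dataclass
class ToyHMM:
    """Hypothetical left-to-right HMM with one 1-D Gaussian per emitting state."""
    means: List[float]
    variances: List[float]
    trans: List[List[float]]  # trans[i][j] = P(state j at t+1 | state i at t)

    def emit_logprob(self, state: int, x: float) -> float:
        var = self.variances[state]
        return -0.5 * math.log(2.0 * math.pi * var) - (x - self.means[state]) ** 2 / (2.0 * var)


def viterbi_logprob(hmm: ToyHMM, obs: Sequence[float]) -> float:
    """Log-probability of the best state path that starts in state 0 and ends in the last state."""
    n = len(hmm.means)
    # delta[j] = best log-prob of any path that is in state j at the current frame
    delta = [NEG_INF] * n
    delta[0] = hmm.emit_logprob(0, obs[0])  # assumption: paths must start in the first state
    for x in obs[1:]:
        new = [NEG_INF] * n
        for j in range(n):
            best = max(delta[i] + safe_log(hmm.trans[i][j]) for i in range(n))
            if best > NEG_INF:
                new[j] = best + hmm.emit_logprob(j, x)
        delta = new
    return delta[-1]  # assumption: paths must finish in the last state


def recognise_isolated(models: Dict[str, ToyHMM], obs: Sequence[float]) -> str:
    """One-word 'grammar': score every word model and return the best-scoring word."""
    return max(models, key=lambda word: viterbi_logprob(models[word], obs))


def left_to_right(n: int, stay: float = 0.6) -> List[List[float]]:
    """Transition matrix in which each state either stays put or moves to the next state."""
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            trans[i][i] = stay
            trans[i][i + 1] = 1.0 - stay
        else:
            trans[i][i] = 1.0
    return trans


# Hypothetical toy models; real digit HMMs come from your training step.
models = {
    "one": ToyHMM(means=[0.0, 1.0, 2.0], variances=[0.5] * 3, trans=left_to_right(3)),
    "two": ToyHMM(means=[2.0, 1.0, 0.0], variances=[0.5] * 3, trans=left_to_right(3)),
}
print(recognise_isolated(models, [0.1, 0.0, 0.9, 1.1, 2.0, 1.9]))  # prints: one
```

In this sketch every word has the same prior, so picking the best-scoring model is equivalent to finding the best path through a network that allows exactly one word. HVite searches such a network directly, which is what lets the same decoder handle digit sequences once the grammar is extended.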

The output from recognise_test_data is stored in the rec directory. Look at the recogniser output and compare it with the correct answers. To calculate the WER, we use the results script.
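The scoring step can be sketched in Python as follows. This is a minimal illustration of the standard approach, not the results script itself: each recognised word sequence is aligned with its reference by minimum edit distance, the substitutions (S), deletions (D) and insertions (I) are summed over the test set, and WER = (S + D + I) / N, where N is the total number of reference words. The function names and the digit labels in the example are hypothetical. Ties between equally short alignments are broken in favour of fewer substitutions, so the S/D/I breakdown may differ from a scorer that weights the error types differently.

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Align hyp to ref by minimum edit distance; return (subs, dels, ins)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (errors, subs, dels, ins) for the best alignment of ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = cost[i - 1][j - 1]
            diag = (e, s, d, k) if ref[i - 1] == hyp[j - 1] else (e + 1, s + 1, d, k)
            e, s, d, k = cost[i - 1][j]
            up = (e + 1, s, d + 1, k)  # ref[i-1] deleted
            e, s, d, k = cost[i][j - 1]
            left = (e + 1, s, d, k + 1)  # hyp[j-1] inserted
            # Tuples compare by total errors first; ties go to fewer subs, then fewer dels
            cost[i][j] = min(diag, up, left)
    _, s, d, k = cost[n][m]
    return s, d, k


def word_error_rate(pairs: List[Tuple[List[str], List[str]]]) -> float:
    """WER = (S + D + I) / N, summed over (reference, hypothesis) word-list pairs."""
    subs = dels = ins = n_ref = 0
    for ref, hyp in pairs:
        s, d, k = align_counts(ref, hyp)
        subs, dels, ins = subs + s, dels + d, ins + k
        n_ref += len(ref)
    if n_ref == 0:
        raise ValueError("reference transcriptions contain no words")
    return (subs + dels + ins) / n_ref


# Isolated digits: one word per recording, so the single error here is a substitution.
test_set = [(["one"], ["one"]), (["two"], ["three"]), (["five"], ["five"]), (["nine"], ["nine"])]
print(word_error_rate(test_set))  # prints 0.25
```

The same functions also handle digit sequences. For example, aligning the reference `one two three` with the output `one three three four` gives one substitution and one insertion.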

Again, you’ll need to edit the scripts to use a specific user (e.g. simonk) and the full data directory, rather than `whoami` and the data_upload directory.