Parameterise the data (optional)

Our HMMs do not work directly with waveforms, but rather with features extracted from those waveforms. Performing this step yourself is OPTIONAL, but you still need to understand the process.

This year (2023-24), the data collection and feature extraction steps are OPTIONAL. Do not execute any of the commands in this section unless you have recorded your own data. However, you still need to understand the feature extraction process, so that you can describe it in your report.

Each waveform file must be parameterised as MFCCs. This is done using the HTK command HCopy. The provided file CONFIG_for_coding specifies the settings used for the MFCC analysis. You should keep these settings for this assignment.

Do not run HCopy by hand – instead, you should use the make_mfccs script. The script runs HCopy like this:

HCopy -T 1 -C resources/CONFIG_for_coding wav/file.wav mfcc/file.mfcc
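
To understand what happens when there are many recordings, it may help to picture the same command applied to each waveform in turn. The following is only a sketch of that idea, not the contents of make_mfccs, which may work differently. The function names mfcc_path_for and parameterise_all and the HCOPY variable are hypothetical; the directory names and HCopy options are copied from the command above.

```shell
#!/bin/sh
# Sketch only: the real make_mfccs script may differ.

# Map an input waveform path to its output feature path,
# e.g. wav/file.wav -> mfcc/file.mfcc (directory names as in the command above).
mfcc_path_for() {
    printf 'mfcc/%s.mfcc\n' "$(basename "$1" .wav)"
}

# Run HCopy once per waveform, with the same options as the command above.
# HCOPY is a hypothetical override so the loop can be dry-run, e.g. HCOPY=echo.
parameterise_all() {
    for wav in wav/*.wav; do
        [ -e "$wav" ] || continue   # skip the unexpanded pattern when wav/ is empty
        "${HCOPY:-HCopy}" -T 1 -C resources/CONFIG_for_coding "$wav" "$(mfcc_path_for "$wav")"
    done
}
```

Calling HCOPY=echo parameterise_all prints the arguments that would be passed to HCopy for each file without running it, which is a safe way to see the mapping from input files to output files.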

Run the make_mfccs script as soon as you have finished preparing and checking the data. As well as extracting the features, it will copy your data and labels into a shared directory, making them available to the rest of the class.

If you make any changes to your data (e.g., correcting a label) then you must run the script again.

Make sure you are in the digit_recogniser subdirectory, then run:

./scripts/make_mfccs

Scroll back up through the output from this command to see if there were any errors. If there were, correct the problems (e.g., move files to the correct places) and run the command again.
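
As a complement to reading the output, one possible sanity check is to confirm that every waveform has a corresponding feature file. The sketch below assumes the wav/ and mfcc/ layout seen in the HCopy command above and that you are in the digit_recogniser directory; the function name missing_mfccs is hypothetical and is not part of the provided scripts.

```shell
#!/bin/sh
# Sketch only: assumes wav/NAME.wav should produce mfcc/NAME.mfcc.

# Print every waveform in wav/ that has no matching non-empty
# feature file in mfcc/. Prints nothing if none are missing.
missing_mfccs() {
    for wav in wav/*.wav; do
        [ -e "$wav" ] || continue   # skip the unexpanded pattern when wav/ is empty
        mfcc="mfcc/$(basename "$wav" .wav).mfcc"
        [ -s "$mfcc" ] || printf '%s\n' "$wav"
    done
}
```

If missing_mfccs prints any paths, those recordings were not parameterised, and the output of make_mfccs for those files is the place to look for the cause.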

Sharing your data

The make_mfccs script copies your data to a shared folder for use by other students. If you discover and fix any errors in your data, you should re-run this script.
