Overview video
The handout mentioned in the video has been replaced by these web pages. The videos were recorded on a Mac, but you will be working on Linux.
In this video:
- Read all the instructions (on these web pages) before you start
- The folder structure that is provided as a starting point
- The ‘recipe’ for creating an ASR system:
  - collect training and testing data (optional)
  - label the data (optional)
  - parameterise the waveform files as MFCCs (optional)
  - initialise and train one HMM for each digit class
  - make a simple grammar
  - test your system
In this assignment you will work with the HTK hidden Markov model toolkit. This is a powerful, general-purpose toolkit from Cambridge University for speech recognition research, but we won’t worry about many of its more advanced features in this exercise.
How to access HTK and the assignment materials
In the PPLS AT lab or via remote desktop
If you are working in the PPLS AT lab (in person or via remote desktop), you can use the version of HTK that is already installed. All you need to do is copy the provided scripts to your own working directory so that you can edit and extend them yourself (see ‘Getting the scripts’ below).
Connecting via ssh
If you find the remote desktop slow and you’re comfortable using the terminal, you can do everything via ssh, since you don’t need to listen to anything through HTK or use the GUI. From a terminal (or PowerShell, PuTTY, or an Ubuntu terminal on Windows), you can connect using:
ssh your_uun@[AT lab machine name].ppls.ed.ac.uk
where you replace your_uun with your UUN and [AT lab machine name] with one of the remotely accessible machines listed here.
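If you connect often from a macOS or Linux terminal (or an Ubuntu terminal on Windows), a small shell function can save retyping the full host name. This is a minimal, hypothetical sketch: the function name `at_ssh` and the `AT_SSH_DRY_RUN` variable are illustrative and not part of the assignment materials; the only assumption about the host is the `.ppls.ed.ac.uk` suffix shown above.

```shell
# Hypothetical helper (not part of the assignment materials):
#   at_ssh UUN MACHINE
# connects to MACHINE.ppls.ed.ac.uk as UUN.
# Set AT_SSH_DRY_RUN=1 to print the ssh command instead of running it.
at_ssh() {
  # Require exactly two arguments: your UUN and the machine name
  if [ "$#" -ne 2 ]; then
    echo "usage: at_ssh UUN MACHINE" >&2
    return 2
  fi
  target="$1@$2.ppls.ed.ac.uk"
  if [ -n "${AT_SSH_DRY_RUN:-}" ]; then
    echo "ssh $target"
  else
    ssh "$target"
  fi
}
```

For example, `at_ssh your_uun MACHINE`, with MACHINE replaced by a name from the list of remotely accessible machines. Asking for the UUN explicitly avoids relying on your local username, which usually differs from your UUN.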
You can open multiple terminals and ssh sessions at once. This is helpful if you are editing code in one terminal and running it in another.
Installing HTK on your own computer
If you want to try installing HTK on your own computer, there are some tips on this page, along with instructions for downloading the data. You can also look on the assignment forums, but our ability to support self-installs is limited; success has been variable, especially with newer laptops. If you have a good internet connection, there isn’t much to gain from installing HTK yourself compared with working on the PPLS AT lab servers remotely (e.g. through an ssh connection).
NOTE: From now on, the instructions generally assume you are using the PPLS AT lab computers in person or via the remote desktop.
Getting the scripts
Everyone will need a copy of the assignment scripts. If you are using the PPLS AT lab computers (remotely or in person), you can get the scripts by running the following commands in the terminal (assuming you’ve already made the folder ~/Documents/sp):
cd ~/Documents/sp
## Don't forget the dot "." at the end of the next line!
cp -R /Volumes/Network/courses/sp/digit_recogniser .
cd digit_recogniser
You’ve now copied the scripts you need for the assignment. The data that these scripts use is on the servers in /Volumes/Network/courses/sp/data.
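Running `cp -R` a second time into the same folder would silently overwrite any scripts you have since edited, because `cp` replaces existing files by default. A minimal sketch of a safer copy is below; the function name `get_scripts` and the `SP_SRC`/`SP_DEST` override variables are hypothetical, and the defaults are the paths used on this page.

```shell
# Hypothetical helper (not part of the assignment materials): copy the
# scripts without overwriting an existing copy that may hold your edits.
# SP_SRC and SP_DEST are assumed override variables; the defaults are the
# paths used on this page.
get_scripts() {
  src="${SP_SRC:-/Volumes/Network/courses/sp/digit_recogniser}"
  dest="${SP_DEST:-$HOME/Documents/sp}"
  name=$(basename "$src")

  # Create the working folder if it does not exist yet
  mkdir -p "$dest" || return 1

  # Refuse to copy again: a second cp -R would overwrite edited files
  if [ -e "$dest/$name" ]; then
    echo "$dest/$name already exists; not copying" >&2
    return 1
  fi

  cp -R "$src" "$dest/" || return 1
  echo "Copied $name to $dest"
}
```

After running `get_scripts`, continue with `cd ~/Documents/sp/digit_recogniser` as before. The plain `cp -R` command above is all you strictly need; this version only adds a guard against accidentally overwriting your own work.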
If you are working on the PPLS AT lab computers, you don’t have to copy the data (e.g. feature and label files), so there’s nothing else you need to do. If you install HTK yourself, you will need to download the data (see the tips linked under ‘Installing HTK on your own computer’ above).
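If you are working over ssh, it can be worth confirming that the shared data folder is reachable from the machine you are logged in to before running any scripts. This is a minimal, hypothetical sketch: `check_data` and the `SP_DATA` override variable are illustrative names, and the default is the data path given on this page.

```shell
# Hypothetical helper (not part of the assignment materials): confirm the
# shared data is visible from the machine you are logged in to.
# SP_DATA is an assumed override; the default is the path on this page.
check_data() {
  data="${SP_DATA:-/Volumes/Network/courses/sp/data}"
  if [ ! -d "$data" ]; then
    echo "Cannot find $data; are you on a PPLS AT lab machine?" >&2
    return 1
  fi
  # Show the top-level contents of the shared data folder
  ls "$data"
}
```

If `check_data` reports that the folder cannot be found, you are probably not logged in to a PPLS AT lab machine.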
Now you can go on to build your first speaker-dependent ASR system!