HTK essentials

HTK (the Hidden Markov Model Toolkit) is a widely used toolkit in automatic speech recognition research.

We will assume you have version 3 of HTK, although everything should work with any recent version.

All HTK commands start with H. For example, HList is a tool that lists the contents of data files (waveforms, MFCCs, etc.). The tools make extensive use of command line flags and arguments, as well as configuration files. Command line flags in capital letters are common across tools, such as:

  • -T 1 specifies basic tracing
  • -C resources/CONFIG tells the tool to load the config file called CONFIG in the directory resources
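To show how these shared flags fit together on one command line, here is a minimal Python sketch that only assembles and prints an HList command; it does not run HTK. The helper name htk_command and the file name wav/example.wav are hypothetical, chosen purely for illustration.

```python
import shlex


def htk_command(tool, config, trace_level, *args):
    """Assemble an HTK command line as a list of arguments (nothing is executed)."""
    # -T and -C are the flags described above; everything after them is tool-specific.
    return [tool, "-T", str(trace_level), "-C", config, *args]


# wav/example.wav is a made-up file name, used only to complete the example.
cmd = htk_command("HList", "resources/CONFIG", 1, "wav/example.wav")
print(shlex.join(cmd))
```

The printed line is what you would type in the terminal. Keeping the command as a list makes it easy to pass to subprocess.run later, should you want to call the tools from a script of your own.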

Simple scripts are provided for building a basic speaker-dependent digit recogniser from your own data. You will need to modify them slightly to build the more advanced system later, and for that you will need to refer to the HTK manual.

HTK is not open source, so the only way to obtain the manual for HTK is to register on the HTK website. Do that now. You don’t need to read the manual yet, but it will be useful later.

To modify the scripts you’ll need to use a text editor, such as Atom or emacs. Never use a word processor!

Everywhere in this handout that you see <username>, you should replace it with your own username. For example, lab/<username>_test.mlf would become lab/s1234567_test.mlf.
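If you prefer to generate file names in a script rather than typing them, the substitution can be sketched in Python as below. The function name fill_username is hypothetical, and s1234567 is just the example username from the text above.

```python
def fill_username(template, username):
    """Replace every <username> placeholder in a path template."""
    return template.replace("<username>", username)


# Reproduces the example from the handout text.
print(fill_username("lab/<username>_test.mlf", "s1234567"))
```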

File formats can quickly get confusing. There are dozens of waveform formats, and various formats for model parameters, MFCCs and so on. We will record the speech data as Microsoft WAV (RIFF) format waveforms, which have the extension .wav.

Please stick to the recommended names of files and directories – it will make it easier for me to help you (Edinburgh students only).
