We assume you have version 3 of HTK, although everything should work with any recent version.
All HTK commands start with H; for example, HList is a tool to list the contents of data files (waveforms, MFCCs, etc.). The tools use command line flags and arguments extensively, as well as configuration files. Command line flags in capital letters are common across tools, such as:
-T 1 specifies basic tracing
-C resources/CONFIG tells the tool to load the config file called CONFIG in the directory resources
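As a minimal sketch of how these shared flags combine, the snippet below builds an HList command line that turns on basic tracing and loads resources/CONFIG. The data file name is a hypothetical placeholder, not one of the handout's files. The command is always printed, and is only run if HList is found on your PATH and the file exists.

```shell
#!/bin/sh
# Hypothetical data file: substitute one of your own files.
data_file="data/example.mfcc"

# -T 1                : basic tracing (common across HTK tools)
# -C resources/CONFIG : load the config file CONFIG from the directory resources
cmd="HList -T 1 -C resources/CONFIG $data_file"

# Show the command line that would be run.
echo "$cmd"

# Only run the tool if HTK is installed and the data file exists.
if command -v HList >/dev/null 2>&1 && [ -f "$data_file" ]; then
    $cmd
fi
```

Printing the command first makes it easy to check the flags before any tool is actually run.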
Simple scripts are provided for building a basic speaker-dependent digit recogniser from your own data. You will need to modify them slightly to make the more advanced system later. You will need to refer to the HTK manual.
HTK is not open source, so the only way to obtain the manual for HTK is to register on the HTK website. Do that now. You don’t need to read the manual yet, but it will be useful later.
To modify the scripts you’ll need to use a text editor, such as Atom or emacs. Never use a word processor!
Everywhere in this handout that you see <username>, you should replace it with your username. For example, lab/<username>_test.mlf would become lab/s1234567_test.mlf.
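If you prefer to do this substitution in a script rather than by hand, one possible approach is sketched below. The variable names are illustrative assumptions, and s1234567 is just the example username used above.

```shell
#!/bin/sh
# Example username from the handout: replace it with your own.
username="s1234567"

# A path written the way it appears in this handout.
template='lab/<username>_test.mlf'

# Substitute every occurrence of the literal placeholder <username>.
mlf=$(printf '%s\n' "$template" | sed "s/<username>/$username/g")

echo "$mlf"   # prints lab/s1234567_test.mlf
```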
File formats can quickly get confusing. There are dozens of waveform formats, and various formats for model parameters, MFCCs and so on. We will record the speech data into Microsoft Wav (RIFF) format waveforms, which will have the extension .wav, for example.
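When you are unsure what format a waveform file is really in, a quick check of its header can help. A RIFF WAV file begins with the ASCII characters RIFF, followed by a 4-byte size field and then the characters WAVE. The sketch below tests for exactly that and nothing more. The function name and the file name recording.wav are hypothetical, and the check assumes your system's head supports the -c option.

```shell
#!/bin/sh
# Succeed if the file starts with a RIFF/WAVE header.
is_riff_wav() {
    # Bytes 1-4 should be "RIFF"; bytes 9-12 should be "WAVE".
    [ "$(head -c 4 "$1")" = "RIFF" ] &&
    [ "$(tail -c +9 "$1" | head -c 4)" = "WAVE" ]
}

# Hypothetical file name: substitute one of your own recordings.
wav_file="recording.wav"

if [ -f "$wav_file" ]; then
    if is_riff_wav "$wav_file"; then
        echo "$wav_file looks like a RIFF WAV file"
    else
        echo "$wav_file does not look like a RIFF WAV file"
    fi
fi
```

This only inspects the first twelve bytes, so it says nothing about the sampling rate or encoding inside the file.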
Please stick to the recommended names of files and directories – it will make it easier for me to help you (Edinburgh students only).