We assume you have version 3 of HTK, although everything should work with any recent version.
All HTK commands start with H; for example, HList is a tool to list the contents of data files (waveforms, MFCCs, etc.). The tools make extensive use of command-line flags and arguments, as well as configuration files. Command-line flags in capital letters are common across tools, for example:
-T 1    specifies basic tracing
-C resources/CONFIG    tells the tool to load the config file called CONFIG in the directory resources
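Because the capital-letter flags mean the same thing for every tool, it can help to think of them as a common prefix you put in front of each tool's own arguments. The sketch below assumes you are driving HTK from Python rather than typing commands by hand; the helper name `htk_command` and the waveform path are hypothetical, and only `-T 1` and `-C resources/CONFIG` come from this handout.

```python
import shlex


def htk_command(tool, *args, config="resources/CONFIG", trace=1):
    """Build an argument list for an HTK tool (hypothetical helper).

    The common flags (-T for tracing, -C for a config file) go first,
    followed by the tool-specific arguments such as data file names.
    Pass trace=None or config=None to leave a flag out.
    """
    if not tool.startswith("H"):
        raise ValueError(f"{tool!r} does not look like an HTK tool name")
    argv = [tool]
    if trace is not None:
        argv += ["-T", str(trace)]
    if config is not None:
        argv += ["-C", config]
    argv += list(args)
    return argv


if __name__ == "__main__":
    # Hypothetical waveform path, for illustration only.
    cmd = htk_command("HList", "wav/s1234567/example.wav")
    print(shlex.join(cmd))
    # To actually run it (requires HTK on your PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Printed with `shlex.join`, the command reads `HList -T 1 -C resources/CONFIG wav/s1234567/example.wav`, which is what you would type at the shell.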
Simple scripts are provided for building a basic speaker-dependent digit recogniser from your own data. You will need to modify them slightly to build the more advanced system later, and you will need to refer to the HTK manual.
HTK is not open source, so to obtain the HTK manual you should, if possible, register on the HTK website. However, you can also browse the manual on the web: HTK book as webpage. If you are logged into speech.zone, you can also find a copy in the assignment forums. You don’t need to read the manual yet, but it will be useful later.
To modify the scripts you’ll need to use a text editor, such as Atom or emacs. Never use a word processor!
Everywhere in this handout that you see <username>, you should replace it with your username. For example lab/<username>_test.mlf would become lab/s1234567_test.mlf.
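If you script your experiments, the substitution is a plain string replacement. This is a minimal sketch; the function name `personalise` is hypothetical and not part of the provided scripts.

```python
def personalise(template: str, username: str) -> str:
    """Replace every <username> placeholder in a path from the handout."""
    if not username:
        raise ValueError("username must not be empty")
    return template.replace("<username>", username)


if __name__ == "__main__":
    print(personalise("lab/<username>_test.mlf", "s1234567"))
```

Run as a script, this prints `lab/s1234567_test.mlf`.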
File formats can quickly get confusing. There are dozens of waveform formats, and various formats for model parameters, MFCCs and so on. We will record the speech data into Microsoft Wav (RIFF) format waveforms, which will have the extension .wav.
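Since the extension alone does not guarantee the format, you may want to check that a recording really is a RIFF WAV file before passing it to HTK. The sketch below uses only Python's standard `wave` module and the fact that a RIFF WAV file begins with the bytes `RIFF` and has `WAVE` at byte offset 8. The function names are hypothetical, and `wave` only reads uncompressed PCM data, so it will reject some valid but compressed WAV variants.

```python
import wave


def is_riff_wav(path):
    """Return True if the file starts with a RIFF header of type WAVE."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def describe_wav(path):
    """Return basic parameters of a PCM WAV file.

    wave.open raises wave.Error if the file is not a readable RIFF/WAVE file.
    """
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            "num_frames": frames,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py file1.wav file2.wav ...
    for p in sys.argv[1:]:
        if is_riff_wav(p):
            print(p, describe_wav(p))
        else:
            print(p, "is not a RIFF WAV file")
```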
Please stick to the recommended names of files and directories – it will make it easier for me to help you (Edinburgh students only).