Before you continue, make sure you have completed the following:
- you have finished recording at least the ARCTIC ‘A’ set;
- you have a single utts.data file with a single line entry for each utterance;
- you have checked your recorded data, and have a wav folder containing an individual .wav file for every utterance in utts.data;
- you have checked to ensure the file naming and numbering is correct.
If you haven’t recorded the full ARCTIC script, then edit utts.data (obviously you should make a backup copy first) so that it only includes prompts for which you have a corresponding wav file.
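If you would rather not edit utts.data by hand, the filtering can be scripted. The following is a minimal sketch, not part of the course scripts. It assumes each line of utts.data follows the Festival-style layout `( utterance_id "prompt text" )` and that each recording is named `utterance_id.wav` inside the wav folder. The function names `filter_utts` and `rewrite_utts_data` are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumed line format (Festival style): ( arctic_a0001 "Prompt text here." )
ENTRY = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def filter_utts(utts_lines, wav_dir):
    """Split entries into those with a matching .wav file and those without.

    Returns (kept_lines, missing_ids).
    """
    kept, missing = [], []
    for line in utts_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = ENTRY.match(line)
        if match is None:
            # Stop rather than silently dropping a line we cannot parse.
            raise ValueError(f"Unrecognised utts.data line: {line!r}")
        utt_id = match.group(1)
        if (Path(wav_dir) / f"{utt_id}.wav").is_file():
            kept.append(line)
        else:
            missing.append(utt_id)
    return kept, missing


def rewrite_utts_data(utts_path="utts.data", wav_dir="wav"):
    """Back up utts.data, then rewrite it with only the recorded prompts."""
    utts_path = Path(utts_path)
    backup = utts_path.with_name(utts_path.name + ".bak")
    shutil.copy(utts_path, backup)  # make the backup copy first
    lines = utts_path.read_text(encoding="utf-8").splitlines()
    kept, missing = filter_utts(lines, wav_dir)
    utts_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept, missing
```

Calling `rewrite_utts_data()` from the directory that holds utts.data and the wav folder leaves the original in utts.data.bak. The returned list of missing ids is also a quick way to spot file naming or numbering mistakes.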
The next stage is to create time-aligned phonetic labels for the speech, using forced alignment and the HTK speech recognition toolkit. First you must set up a directory structure for HTK:
bash$ setup_alignment
This creates a directory called alignment containing various HTK-related files. The script will also tell you that you need to make a couple of files: you will do that in the next step.
Choose the dictionary and phone set
Various dictionaries are available, depending on your accent. The choice of dictionary also determines which phone set you will use. You might need to add some words to the dictionary, to cover all the words in your additional material.
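To get a rough idea of which words might need adding, you could compare the words in your prompts against a list of words you already know to be in the dictionary. The sketch below is only an approximation under stated assumptions: it assumes the prompt text sits between double quotes on each utts.data line, it uses a crude tokeniser (lowercase letters and apostrophes) that will not match the front-end's own text processing exactly, and it expects you to supply the dictionary's word list yourself, since the dictionary file format is not covered here. The function names are hypothetical.

```python
import re


def prompt_words(utts_lines):
    """Collect lowercase word tokens from the quoted prompt text of each line."""
    words = set()
    for line in utts_lines:
        quoted = re.search(r'"(.*)"', line)
        if quoted:
            # Crude tokenisation: runs of letters and apostrophes only.
            words.update(re.findall(r"[a-z']+", quoted.group(1).lower()))
    return words


def missing_words(utts_lines, dictionary_words):
    """Return the sorted prompt words that are absent from dictionary_words."""
    known = {word.lower() for word in dictionary_words}
    return sorted(prompt_words(utts_lines) - known)
```

Treat the result as a list of candidates to check by hand, not as a definitive list of out-of-dictionary words.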
Time-align the labels
The database needs time-aligned labels. Consistency between these labels and the predictions that the front-end will make at runtime is important, so we will use the same front-end to create the initial label sequence, then use forced alignment to put timestamps on those labels.