Time-align the labels

The database needs time-aligned labels. These labels must be consistent with the predictions that the front end will make at runtime, so we will use the same front end to create the initial label sequence, then use forced alignment to put timestamps on those labels.

The initial phone sequence for forced alignment comes from Festival, by running the script through the front end. If you are using a different dictionary, remember to replace unilex-rpx with its name everywhere it appears.

Creating the initial labels

bash$ festival $MBDIR/scm/build_unitsel.scm ./my_lexicon.scm
festival>(make_initial_phone_labs "utts.data" "utts.mlf" 'unilex-rpx)

This creates the output file utts.mlf, an HTK master label file (MLF) containing the phonetic transcription of all the utterances. At this stage, the labels are not yet time-aligned with the waveforms.

Tip: if you want to design your own recording script later, the command above is the easiest way to convert text into a phone sequence, so that you can measure its coverage.
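
To make the coverage measurement in the tip concrete, here is a minimal Python sketch that counts how often each phone occurs in an MLF. It assumes the usual HTK MLF layout: a #!MLF!# header, a quoted pattern such as "*/utt_001.lab" opening each utterance, one label per line, and a line containing only a period closing the utterance. The exact contents of your utts.mlf may differ, and the function names and sample data are illustrative only; they are not part of the voice-building tools.

```python
from collections import Counter


def read_mlf(lines):
    """Return {utterance_name: [phone, ...]} from the lines of an HTK MLF."""
    utterances = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):
            # e.g. "*/utt_001.lab" -> utt_001
            name = line.strip('"').rsplit("/", 1)[-1]
            name = name.rsplit(".", 1)[0]
            current = utterances.setdefault(name, [])
        elif line == ".":
            # A lone period ends the current utterance.
            current = None
        elif current is not None:
            fields = line.split()
            # Time-aligned entries look like "start end label";
            # unaligned entries contain just the label.
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                current.append(fields[2])
            else:
                current.append(fields[0])
    return utterances


def phone_counts(utterances):
    """Count occurrences of each phone across all utterances."""
    counts = Counter()
    for phones in utterances.values():
        counts.update(phones)
    return counts


# Made-up sample data, for illustration only.
sample = [
    "#!MLF!#",
    '"*/utt_001.lab"',
    "sil", "h", "@", "sil",
    ".",
    '"*/utt_002.lab"',
    "sil", "h", "i", "sil",
    ".",
]
for phone, count in sorted(phone_counts(read_mlf(sample)).items()):
    print(phone, count)
```

To run this on real data, pass the lines of the file, for example read_mlf(open("utts.mlf")). Phones that are missing from the counts, or that appear only a handful of times, indicate gaps in coverage.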

Forced alignment involves training HMMs, just as in automatic speech recognition. Therefore, the speech has to be parameterised. The features we will use are MFCCs.

Extracting MFCCs

bash$ make_mfccs alignment wav/*.wav

Doing the alignment

bash$ cd alignment
bash$ make_mfcc_list ../mfcc ../utts.data train.scp
bash$ do_alignment .

(Notice the space and the period after the last command!)

The do_alignment command will take a while to run (20 minutes or more) depending on the speed of the machine you are using and the amount of speech you recorded. Monitor it for the first 5 minutes or so to make sure there are no early problems.

Once the alignment has completed, you need to split the resulting MLF – which will now contain the correct time alignments for the labels – into individual label files that Festival can use.
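
HTK label files give start and end times as integers in units of 100 ns, so it can help to convert them to seconds when inspecting an aligned MLF. The sketch below is a hedged illustration: it assumes each aligned entry has the form "start end label" (any further fields, such as scores, are ignored), and the function names, sample entries and 10 ms threshold are hypothetical choices, not part of do_alignment.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK times are counted in 100 ns units


def parse_aligned_entry(line):
    """Convert one 'start end label' entry to (start_s, end_s, label)."""
    fields = line.split()
    start = int(fields[0]) / HTK_UNITS_PER_SECOND
    end = int(fields[1]) / HTK_UNITS_PER_SECOND
    return start, end, fields[2]


def short_segments(entries, min_dur=0.01):
    """Return (label, duration) for segments shorter than min_dur seconds."""
    result = []
    for line in entries:
        start, end, label = parse_aligned_entry(line)
        duration = end - start
        if duration < min_dur:
            result.append((label, duration))
    return result


# Made-up entries, for illustration only.
entries = ["0 500000 sil", "500000 530000 h", "530000 1400000 @"]
for label, duration in short_segments(entries):
    print(f"{label}: {duration * 1000:.1f} ms")
```

Very short segments are not necessarily errors, but a large number of them can be a sign that the alignment has gone wrong for an utterance.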

Splitting the MLF file

bash$ cd ..
bash$ mkdir lab
bash$ break_mlf alignment/aligned.3.mlf lab

You can examine the label files at this point, but be careful not to change anything.
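
One read-only check is to confirm that every utterance listed in utts.data has a label file. The Python sketch below assumes that utts.data uses the usual Festival layout, with one utterance per line in the form ( utt_id "text" ), and that the label files are named lab/utt_id.lab. The file naming, the function names and the sample data are assumptions for illustration, so adjust them to match what you see in your own lab directory.

```python
import re
import tempfile
from pathlib import Path


def utterance_ids(lines):
    """Extract utterance IDs from lines of the form ( utt_id "text" )."""
    ids = []
    for line in lines:
        match = re.match(r"\s*\(\s*(\S+)", line)
        if match:
            ids.append(match.group(1))
    return ids


def missing_labels(ids, lab_dir, suffix=".lab"):
    """Return the IDs that have no corresponding file in lab_dir."""
    lab_dir = Path(lab_dir)
    return [u for u in ids if not (lab_dir / (u + suffix)).is_file()]


# Self-contained demonstration using a temporary directory and made-up data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "utt_001.lab").touch()
    sample = ['( utt_001 "Hello world." )', '( utt_002 "Goodbye." )']
    print(missing_labels(utterance_ids(sample), tmp))
```

On real data, you would call utterance_ids(open("utts.data")) and pass "lab" as the directory. The check only reads file names, so it does not modify anything.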

Optional variations

Skip this part during your first voice build, and come back later, when you are ready to create variations on the basic voice.

Modify the do_alignment script
You can modify the do_alignment script; any changes you make will affect the quality of the forced alignment.