Because DNN synthesis evolved from HMM synthesis, it is common to use HTS format label files to represent context-dependent phonemes.
After running dumpfeats, its output has to be converted into HTS format label files (which are an extended version of HTK labels). The scripts/utts_to_mlfs.sh script runs dumpfeats and then performs all the subsequent steps. Take a few moments to understand what it does, then run it.
The script makes both monophone and full-context label files. We’ll use the monophone labels for forced alignment, and afterwards transfer the resulting timestamps over to the full-context labels.
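To make the timestamp transfer concrete, here is a minimal Python sketch. It is not the code used in the provided scripts, and it rests on some assumptions: each aligned monophone line has the HTK form "start end phone", each full-context line ends with a label in which the current phone sits between "-" and "+" (as in the usual HTS quinphone prefix), and the two files have exactly one line per phone in the same order. The function names are hypothetical.

```python
import re

# Assumption: in an HTS full-context label such as "x^sil-h+ax=l@...",
# the current phone is the text between the first "-" and the next "+".
_CENTRE = re.compile(r"-([^+]+)\+")


def centre_phone(full_label):
    """Return the current phone of a full-context label."""
    match = _CENTRE.search(full_label)
    if match is None:
        raise ValueError(f"no centre phone found in {full_label!r}")
    return match.group(1)


def transfer_timestamps(aligned_mono_lines, full_context_lines):
    """Copy start/end times from aligned monophone labels onto full-context labels.

    Assumes a one-to-one, same-order correspondence between the two lists.
    """
    if len(aligned_mono_lines) != len(full_context_lines):
        raise ValueError("label files have different numbers of phones")

    timed = []
    for mono, full in zip(aligned_mono_lines, full_context_lines):
        start, end, phone = mono.split()[:3]
        # The label is the last field, whether or not the line already has times.
        full_label = full.split()[-1]
        if centre_phone(full_label) != phone:
            raise ValueError(f"phone mismatch: {phone!r} vs {full_label!r}")
        timed.append(f"{start} {end} {full_label}")
    return timed


# Made-up example data; times are in HTK units of 100 ns.
mono = ["0 2500000 sil", "2500000 3100000 h", "3100000 3900000 ax"]
full = ["x^x-sil+h=ax@x_x/A:0", "x^sil-h+ax=l@1_2/A:0", "sil^h-ax+l=ow@2_1/A:0"]

for line in transfer_timestamps(mono, full):
    print(line)
```

The phone-by-phone check is there because the transfer is only valid if both label sequences describe the same phones. If the aligner is allowed to insert or delete short pauses, the one-to-one assumption no longer holds and the full-context labels would need adjusting first.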
Can you explain why we cannot simply use the full-context labels for forced alignment?