Convert label files to numerical values

State-level full context labels are converted to frame-level numerical features.

Create a question set

The question set depends on the phone set, because some of the questions are about phonetic context.

Edit scripts/make_question_set.sh to choose a name for your question set. Run the script. Inspect the file it creates to check that you see both quinphone questions (using the correct phone set) and suprasegmental questions. There should be 300-400 questions in total.
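One quick way to check the total is to count the question definitions in the file. The sketch below assumes the usual HTS-style layout, in which each question is a line beginning with the keyword QS; if your file also contains other kinds of entries, they are not counted here. The function name count_questions and the sample lines are made up for illustration.

```python
def count_questions(lines):
    """Count lines that define a question (assumed to start with 'QS')."""
    total = 0
    for line in lines:
        parts = line.split()
        if parts and parts[0] == "QS":
            total += 1
    return total


# Hypothetical sample lines in the assumed format, not taken from a real file.
sample = [
    'QS "C-Vowel" {*-a+*,*-e+*}',
    '',
    'QS "C-Nasal" {*-m+*,*-n+*}',
]
n_questions = count_questions(sample)
print(n_questions)
```

To apply this to your own file, open it and pass the file object in place of sample, since iterating over an open text file yields its lines.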

Edit your config file to use the question set you have just made, for example:

[Labels]
question_file_name: /Users/simonk/Documents/ss_dnn/data/resources/questions_dnn_unilex-rpx_quinphone.hed

but change the full path to point to your own questions file, in your data/resources directory.

Look at some of the questions, and see how they match patterns in the full-context label files that you made earlier. Each question contributes one element to the numerical representation: if the question matches a label, that element is ‘1’, otherwise it is ‘0’.
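The matching can be sketched in a few lines of Python. This is only an illustration of the idea, not the toolkit's own code: it assumes that question patterns use * for any run of characters and ? for a single character, and the question names, patterns and label string are invented and much shorter than real ones.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a wildcard pattern (* and ?) into an anchored regular expression."""
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile("^" + escaped + "$")


def answer_questions(label, questions):
    """Return one binary value per question: 1 if any of its patterns matches."""
    features = []
    for _name, patterns in questions:
        matched = any(wildcard_to_regex(p).match(label) for p in patterns)
        features.append(1 if matched else 0)
    return features


# Hypothetical questions about the current phone and the next phone.
questions = [
    ("C-Vowel", ["*-a+*", "*-e+*"]),
    ("C-Nasal", ["*-m+*", "*-n+*"]),
    ("R-Nasal", ["*+m=*", "*+n=*"]),
]
# Hypothetical, heavily shortened label: current phone is "n", next phone is "e".
label = "k^a-n+e=t@1_2"
features = answer_questions(label, questions)
print(features)  # [0, 1, 0]
```

Only the second question matches, because the current phone is a nasal, while the current phone is not a vowel and the next phone is not a nasal.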

Create the numerical labels

Now turn the NORMLAB step on:

NORMLAB  : True
MAKECMP  : False
NORMCMP  : False
TRAINDNN : False
DNNGEN   : False
GENWAV   : False
CALMCD   : False

and run it:

$ python /Volumes/Network/courses/ss/dnn/dnn_tts/run_dnn.py feed_forward_dnn_WORLD.conf 

This will load each label file in turn and convert it to frame-level numerical features, using the question file. After processing all files, it will perform global normalisation to the range [0,1]. The final features are saved in data/nn_no_silence_lab_norm_340. The number at the end is the dimensionality of the features, which is determined by the number of questions in the question file, so it may vary depending on your phone set, etc.
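Global normalisation means that the minimum and maximum of each feature dimension are found over all files, and every file is then scaled with those same values. A minimal sketch of min-max scaling to [0,1] follows. It is not the toolkit's implementation: the function names and toy data are invented, and mapping constant dimensions to 0 is an assumption made here to avoid dividing by zero.

```python
def global_min_max(files):
    """files: list of utterances, each a list of frames (lists of floats)."""
    dim = len(files[0][0])
    lo = [float("inf")] * dim
    hi = [float("-inf")] * dim
    for frames in files:
        for frame in frames:
            for d, value in enumerate(frame):
                lo[d] = min(lo[d], value)
                hi[d] = max(hi[d], value)
    return lo, hi


def normalise(frames, lo, hi):
    """Scale each dimension to [0,1] using the global minimum and maximum."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(frame, lo, hi)]
        for frame in frames
    ]


# Two toy "files" with three feature dimensions each.
utt_a = [[0.0, 1.0, 2.0], [1.0, 1.0, 4.0]]
utt_b = [[1.0, 1.0, 6.0]]
lo, hi = global_min_max([utt_a, utt_b])
normalised_a = normalise(utt_a, lo, hi)
print(normalised_a)  # [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]]
```

Note that the third dimension of utt_a only reaches 0.5, because its maximum occurs in utt_b: the statistics are shared across all files rather than computed per file.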

Note that several (currently 9) positional features, such as state index, are appended to the categorical features derived from the questions.
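To make the idea concrete, the sketch below expands one state-level binary vector into several frames and appends two positional values to each frame. Only the state index is mentioned above; the second value (the fractional position of the frame within the state) and the function name are assumptions chosen for illustration, and the actual set of 9 features is not reproduced here.

```python
def expand_state_to_frames(binary_features, state_index, n_frames):
    """Repeat a state's binary features for each frame, appending positions."""
    rows = []
    for i in range(n_frames):
        # Assumed positional feature: how far through the state this frame is.
        fraction_through_state = (i + 1) / n_frames
        rows.append(
            list(binary_features) + [float(state_index), fraction_through_state]
        )
    return rows


# A state with three binary answers, state index 2, lasting four frames.
rows = expand_state_to_frames([0, 1, 0], state_index=2, n_frames=4)
for row in rows:
    print(row)
```

The binary part is identical in every frame of the state, so the appended positional values are what distinguish one frame from the next.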

Most silence frames are automatically removed at this stage, so that the distribution of frames is more balanced. This has been found to improve the training of the DNN.
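As a rough illustration, removing silence amounts to filtering frames with a per-frame silence flag. The sketch below drops every flagged frame; it does not model how the toolkit decides which silence frames to keep, and the function name and toy data are invented.

```python
def remove_silence(frames, is_silence):
    """Keep only the frames whose silence flag is False."""
    return [frame for frame, silent in zip(frames, is_silence) if not silent]


# Five toy frames: silence at the start and end, speech in the middle.
frames = [[0.1], [0.2], [0.3], [0.4], [0.5]]
is_silence = [True, True, False, False, True]
kept = remove_silence(frames, is_silence)
print(kept)  # [[0.3], [0.4]]
```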