Instructions for building your own DNN-based voice that can be completed within a 2-hour lab session.
You should perform this exercise in the working directory ~/Documents/ss_dnn, which has already been set up for you.
Follow the "Build your own DNN voice" instructions, but with these small changes:
- Tools required – skip
- Prepare your workspace – skip, except that you need to:
  - set the correct value for work in feed_forward_dnn_WORLD.conf
  - set the correct value for
- Prepare the input labels – do this
- Prepare the output features – do this
  - copy_synthesis.sh has been run for you, to save time
  - you need to perform the feature composition and normalisation
- Design the DNN – skip
- Train the DNN – do this
  - start with only 50 training sentences, and perhaps 5 epochs; the resulting network can be used for synthesis (what do you expect the synthetic speech will sound like?)
  - once that works, try 100 training sentences and 10 epochs; listen to the results
  - finally you can increase the amount of training data (maximum of 500 sentences available in the data set we are using), and the number of epochs (recommended maximum number is 30)
  - approximate training times:
    - 50 training sentences, 5 epochs = 6 minutes
    - 100 training sentences, 10 epochs = 20 minutes
    - 100 training sentences, trained to convergence (27 epochs) = 1 hour
    - 500 training sentences, trained to convergence (26 epochs) = 4 hours
- Synthesise – do this
If you run out of time, there is a pre-built version in ~/Documents/ss_dnn_prebuilt, in which the network has been trained to convergence on 500 sentences and then used to synthesise the dev and test sets.
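
To judge whether a particular combination of training sentences and epochs will fit into the time left in the lab, the approximate training times listed above can be turned into a rough estimate. The sketch below is only an illustration and is not part of the voice-building tools. It assumes that training time grows roughly in proportion to (number of sentences × number of epochs), which the four quoted timings support only approximately, and the function names are made up for this example. Actual times will depend on the machine you are using.

```python
# Approximate timings quoted in the instructions: (sentences, epochs, minutes).
TIMINGS = [
    (50, 5, 6),
    (100, 10, 20),
    (100, 27, 60),
    (500, 26, 240),
]


def seconds_per_sentence_epoch(timings):
    """Average cost, in seconds, of processing one sentence for one epoch.

    ASSUMPTION: time scales linearly with sentences * epochs. This is a
    simplification for illustration, not a property of the training code.
    """
    total_seconds = sum(minutes * 60 for _, _, minutes in timings)
    total_work = sum(sentences * epochs for sentences, epochs, _ in timings)
    return total_seconds / total_work


def estimate_minutes(sentences, epochs, timings=TIMINGS):
    """Rough training time in minutes for a given configuration."""
    return sentences * epochs * seconds_per_sentence_epoch(timings) / 60


if __name__ == "__main__":
    # Hypothetical configuration: 200 training sentences, 15 epochs.
    print(f"Estimated time: {estimate_minutes(200, 15):.0f} minutes")
```

Treat the result as a ballpark figure only: the quoted timings are themselves approximate, and training to convergence may stop after a different number of epochs than you expect.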