Build your own DNN voice (SPCC version)

Instructions for building your own DNN-based voice that can be completed within a 2-hour lab session.

You should perform this exercise in the working directory ~/Documents/ss_dnn, which has already been set up for you.

Follow the Build your own DNN voice instructions, but with these small changes:

Tools required – skip
Prepare your workspace – skip, except that you need to:
1. set the correct value for work in feed_forward_dnn_WORLD.conf
Prepare the input labels – do this
Prepare the output features – do this

copy_synthesis.sh has already been run for you, to save time
you still need to perform the feature composition and normalisation yourself
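
The one workspace change listed above, setting work in feed_forward_dnn_WORLD.conf, is easiest to make in a text editor. If you would rather script it, the following is a minimal sketch. It assumes the file contains exactly one line of the form "work: ..." (or "work = ..."); the sample text, the section name and the paths are hypothetical placeholders, and set_work is an illustrative helper that is not part of the toolkit.

```python
import re

# Matches a line that starts with the key "work", followed by ':' or '='.
_WORK_LINE = re.compile(r"^(work\s*[:=]\s*).*$", flags=re.MULTILINE)


def set_work(conf_text, new_work):
    """Return conf_text with the value on the single 'work' line replaced."""
    # A function is used as the replacement so that backslashes in
    # new_work are never interpreted as regex escapes.
    updated, count = _WORK_LINE.subn(lambda m: m.group(1) + new_work, conf_text)
    if count != 1:
        raise ValueError(f"expected exactly one 'work' line, found {count}")
    return updated


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of the real config file.
    sample = "[Paths]\nwork: /some/old/path\ndata: %(work)s/data\n"
    # Placeholder value: use the path given in the main instructions.
    print(set_work(sample, "/path/to/your/work/directory"))
```

To apply this to the real file you would read it into a string, call set_work, and write the result back, after making a backup copy.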

Design the DNN – skip
Train the DNN – do this

start with only 50 training sentences and perhaps 5 epochs; the resulting network can already be used for synthesis (what do you expect the synthetic speech to sound like?)
once that works, try 100 training sentences and 10 epochs, then listen to the results
finally, you can increase the amount of training data (a maximum of 500 sentences is available in the data set we are using) and the number of epochs (the recommended maximum is 30)
approximate training times:
- 50 training sentences, 5 epochs = 6 minutes
- 100 training sentences, 10 epochs = 20 minutes
- 100 training sentences, trained to convergence (27 epochs) = 1 hour
- 500 training sentences, trained to convergence (26 epochs) = 4 hours
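
As a rough planning aid, the timings above can be turned into an estimate for other settings. The sketch below assumes that training time grows linearly with the number of sentences multiplied by the number of epochs. That is only approximately true of the reported figures (the smaller runs cost slightly more per sentence per epoch), and it also assumes your machine is comparable to the one the timings came from. The function names are illustrative and not part of the toolkit.

```python
# Timings reported in the list above: (training sentences, epochs, minutes).
REPORTED = [
    (50, 5, 6),
    (100, 10, 20),
    (100, 27, 60),
    (500, 26, 240),
]


def minutes_per_sentence_epoch(rows=REPORTED):
    """Pooled cost, in minutes, of processing one sentence for one epoch."""
    total_minutes = sum(minutes for _, _, minutes in rows)
    total_work = sum(sentences * epochs for sentences, epochs, _ in rows)
    return total_minutes / total_work


def estimate_minutes(sentences, epochs):
    """Rough training-time estimate under the linear-scaling assumption."""
    return sentences * epochs * minutes_per_sentence_epoch()


if __name__ == "__main__":
    # Example: would 200 sentences and 15 epochs fit in the lab session?
    print(f"{estimate_minutes(200, 15):.0f} minutes (rough estimate)")
```

Treat the result as an order-of-magnitude guide for choosing settings that fit the lab session, not as a prediction.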

Synthesise – do this

If you run out of time, there is a pre-built version in ~/Documents/ss_dnn_prebuilt, in which the network has been trained to convergence on 500 sentences and then used to synthesise the dev and test sets.
