Build your own DNN voice (SPCC version)

Instructions for building your own DNN-based voice, which can be completed within a 2-hour lab session.

Perform this exercise in the working directory ~/Documents/ss_dnn, which has already been set up for you.

Follow the "Build your own DNN voice" instructions, but with these small changes:

  • Tools required – skip
  • Prepare your workspace – skip, except that you need to set the correct value for work in feed_forward_dnn_WORLD.conf
  • Prepare the input labels – do this
  • Prepare the output features – do this
    • copy_synthesis.sh has been run for you, to save time
    • you need to perform the feature composition and normalisation
  • Design the DNN – skip
  • Train the DNN – do this
    • start with only 50 training sentences, and perhaps 5 epochs; the resulting network can be used for synthesis (what do you expect the synthetic speech to sound like?)
    • once that works, try 100 training sentences and 10 epochs; listen to the results
    • finally, you can increase the amount of training data (a maximum of 500 sentences is available in the data set we are using) and the number of epochs (the recommended maximum is 30)
    • approximate training times:
      • 50 training sentences, 5 epochs = 6 minutes
      • 100 training sentences, 10 epochs = 20 minutes
      • 100 training sentences, trained to convergence (27 epochs) = 1 hour
      • 500 training sentences, trained to convergence (26 epochs) = 4 hours
  • Synthesise – do this
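
The one workspace change that is not skipped, setting work in feed_forward_dnn_WORLD.conf, can be done in any text editor. As a minimal sketch only, the Python below updates such a setting in a piece of configuration text. It assumes the file uses "key: value" or "key = value" lines. The [Paths] section name, the sample contents and the replacement path are all hypothetical placeholders; use the value given in the main instructions.

```python
import re


def set_conf_value(conf_text, key, value):
    """Return conf_text with the first 'key: value' or 'key = value' line updated."""
    # Capture the indentation, the key and the separator so they are preserved.
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[:=][ \t]*).*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in the new value.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, conf_text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text


# Hypothetical excerpt: the real file contains many more settings,
# and its layout may differ from this assumption.
sample = "[Paths]\nwork: /path/to/be/replaced\n"

# Placeholder path: substitute the correct value for your own machine.
updated = set_conf_value(sample, "work", "/home/student/Documents/ss_dnn")
print(updated)
```

To apply this to the real file, read its contents into a string, pass the string through set_conf_value, and write the result back, after checking that the line format matches the assumption above.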

If you run out of time, there is a pre-built version in ~/Documents/ss_dnn_prebuilt in which the network has been trained to convergence on 500 sentences, then used to synthesise the dev and test sets.
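
To judge in advance whether a configuration will fit in the remaining lab time, the approximate training times listed above can be turned into a rough cost per sentence per epoch. The sketch below uses only those four timings. It assumes that training time scales linearly with the number of sentences multiplied by the number of epochs, which is a simplification and will vary from machine to machine. The function names are illustrative.

```python
# (training sentences, epochs, minutes) from the approximate timings in this exercise
timings = [(50, 5, 6), (100, 10, 20), (100, 27, 60), (500, 26, 240)]


def seconds_per_sentence_epoch(sentences, epochs, minutes):
    """Average seconds spent on one sentence during one epoch."""
    return minutes * 60 / (sentences * epochs)


rates = [seconds_per_sentence_epoch(*t) for t in timings]

# Use the slowest observed rate so that the estimate errs on the long side.
worst_rate = max(rates)


def estimate_minutes(sentences, epochs, rate=worst_rate):
    """Rough training time in minutes, assuming linear scaling (an assumption)."""
    return sentences * epochs * rate / 60


for (sentences, epochs, minutes), rate in zip(timings, rates):
    print(f"{sentences} sentences, {epochs} epochs: {rate:.2f} s per sentence per epoch")

print(f"Estimate for 200 sentences, 15 epochs: {estimate_minutes(200, 15):.0f} minutes")
```

The four timings work out to roughly 1.1 to 1.4 seconds per sentence per epoch, so an estimate based on the slowest rate should be treated as a cautious upper guide rather than a prediction.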