Prepare your workspace

Create the directory structure you need and set up some configuration files.

Working directory

Don’t work in the same directory as a unit selection voice – that would be confusing.

Download this zip file and unzip it in your Documents directory. It will create a directory called ss_dnn containing the required working directory structure, scripts and other resources.

If you installed all the tools needed for this exercise yourself, you now need to edit the two paths in setup.sh. The variable SSROOTDIR should point at the root directory of your installed tools, and FROOTDIR should point at the Festival installation within there. If you installed all the required Python packages system-wide, you can remove the line that sets the PYTHONPATH.
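If you are unsure whether the two paths are right, one quick check is to test that each one really is a directory. This is only a sketch: check_dir is a made-up helper name, not part of the provided scripts.

```shell
# Hypothetical helper: report whether a path variable points at a directory.
# $1 = variable name (used only in the message), $2 = its value
check_dir() {
    if [ -d "$2" ]; then
        echo "OK: $1=$2"
    else
        echo "Problem: $1=$2 is not a directory" >&2
        return 1
    fi
}

# Possible usage, after sourcing your edited setup.sh:
#   check_dir SSROOTDIR "$SSROOTDIR"
#   check_dir FROOTDIR "$FROOTDIR"
```

This only confirms that the directories exist. It cannot tell you whether the tools inside them are installed correctly.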

Data

Waveforms

Inside the data directory, create a symbolic link called wav that points to the directory containing your complete set of waveforms at 48kHz sample rate. Use the endpointed versions if you previously found that necessary to get good alignments. Do not use the 16kHz versions: all the signal processing in this exercise is set up for 48kHz sample rate.

If you don’t have any waveforms, then use the ones in /Volumes/Network/courses/ss/corpora/nick_dnn_benchmark_WORLD/wav – make your symbolic link to there instead.
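As a sketch of the linking step, the function below creates the wav link and refuses to overwrite anything that is already there. The name link_wavs is made up for illustration, and the usage line assumes your working directory is ~/Documents/ss_dnn.

```shell
# Hypothetical helper: make <data_dir>/wav a symbolic link to <wav_dir>.
# $1 = your data directory, $2 = directory holding the 48kHz waveforms
link_wavs() {
    data_dir=$1
    wav_dir=$2
    if [ ! -d "$wav_dir" ]; then
        echo "No such directory: $wav_dir" >&2
        return 1
    fi
    # Refuse to continue if something called wav is already there
    if [ -e "$data_dir/wav" ] || [ -L "$data_dir/wav" ]; then
        echo "$data_dir/wav already exists" >&2
        return 1
    fi
    ln -s "$wav_dir" "$data_dir/wav"
    # Show the link and where it points
    ls -l "$data_dir/wav"
}

# Possible usage with the shared corpus:
#   link_wavs ~/Documents/ss_dnn/data /Volumes/Network/courses/ss/corpora/nick_dnn_benchmark_WORLD/wav
```

Doing the same thing by hand is just cd into data followed by ln -s with the target directory and the name wav.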

File list

Now make a file in your data directory called file_id_list.scp that lists all the basenames (no path, no extension) of your waveform files. You should also make a small version of this list for testing (perhaps the first 20 files, in a file called file_id_list_20.scp).

There are lots of ways to do this, but I suggest you practise your skills on the command line; the shell scripting forum might help. The file should look like this:

arctic_a0001
arctic_a0002
arctic_a0003
arctic_a0004
arctic_a0005
arctic_a0006
arctic_a0007
arctic_a0008
...etc
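If you get stuck, here is one possible way to build both lists. It is a sketch, not the only answer: make_file_lists is a made-up name, and it assumes your waveform files end in .wav and are reachable through the wav link in your data directory.

```shell
# Hypothetical helper: write file_id_list.scp and a short test list.
# $1 = your data directory, $2 = size of the small list (default 20)
make_file_lists() {
    data_dir=$1
    n=${2:-20}
    # Strip the path and the .wav extension from every waveform file name
    for f in "$data_dir"/wav/*.wav; do
        [ -e "$f" ] || continue    # skip the unexpanded pattern if there are no files
        basename "$f" .wav
    done > "$data_dir/file_id_list.scp"
    # The small list is just the first n lines of the full list
    head -n "$n" "$data_dir/file_id_list.scp" > "$data_dir/file_id_list_${n}.scp"
    # Report how many entries each list has
    wc -l "$data_dir/file_id_list.scp" "$data_dir/file_id_list_${n}.scp"
}

# Possible usage:
#   make_file_lists ~/Documents/ss_dnn/data
```

Check the line counts that are printed at the end: the full list should have one line per waveform file.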

Utterance structures

In your top-level working directory (not in data) make a symbolic link to the utt directory of your unit selection voice. However, if that unit selection voice contains utterance files that are not needed here, then you should actually create a directory called utt and copy the required utterance files from the unit selection voice into it.
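For the copying case, a loop over your file list is one way to pick out only the files you need. This sketch makes two assumptions that you should check against your own voice: that the utterance files are named with the basename plus a .utt extension, and that the ones you need are exactly those in your file list. The name copy_listed_utts is made up.

```shell
# Hypothetical helper: copy only the listed utterance files into <work_dir>/utt.
# $1 = working directory, $2 = utt directory of the unit selection voice,
# $3 = file list (one basename per line)
copy_listed_utts() {
    work_dir=$1
    src_utt=$2
    list=$3
    mkdir -p "$work_dir/utt"
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        if [ -f "$src_utt/$id.utt" ]; then
            cp "$src_utt/$id.utt" "$work_dir/utt/"
        else
            echo "Missing: $src_utt/$id.utt" >&2
        fi
    done < "$list"
}

# Possible usage (the voice path is a placeholder):
#   copy_listed_utts ~/Documents/ss_dnn /path/to/your/voice/utt ~/Documents/ss_dnn/data/file_id_list.scp
```

Any basename that has no utterance file is reported, which is worth fixing now rather than discovering later in the pipeline.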

Config file

This is the key file because it defines all the paths, variables and processing stages. In your local copy of feed_forward_dnn_WORLD.conf, make the following changes:

1. in the [Paths] section at the start of the file, set the value of work to the absolute path of your working directory, and change file_id_list to the small file list you made above (e.g., file_id_list_20.scp);

If you installed all the tools needed for this exercise yourself, you will also need to set the paths to the SPTK tools and the WORLD vocoder executables.

2. in the [Processes] section at the end, set the following values:

NORMLAB  : False
MAKECMP  : False
NORMCMP  : False
TRAINDNN : False
DNNGEN   : False
GENWAV   : False
CALMCD   : False
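After editing, it can help to print just the lines you changed and read them back. The sketch below does that with grep. The name show_conf_settings is made up, and the pattern assumes each setting sits on its own line as a name followed by a colon or equals sign.

```shell
# Hypothetical helper: print the settings edited above, with line numbers.
# $1 = path to your config file
show_conf_settings() {
    grep -n -E '^[[:space:]]*(work|file_id_list|NORMLAB|MAKECMP|NORMCMP|TRAINDNN|DNNGEN|GENWAV|CALMCD)[[:space:]]*[:=]' "$1"
}

# Possible usage:
#   show_conf_settings feed_forward_dnn_WORLD.conf
```

You should see your working directory, the small file list and the seven process flags. If a line is missing, the name in the config file is probably spelled differently from the pattern.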

Environment variables

The following environment variables need to be set and exported every time you open a new shell:

THEANO_FLAGS="floatX=float32"
export THEANO_FLAGS
PYTHONPATH=:/Volumes/Network/courses/ss/dnn/lib/python2.7/site-packages/
export PYTHONPATH

setup.sh is provided to do that, so you need to source it in every new shell (just like for the unit selection build).

If you get import errors later from run_dnn.py, such as “ImportError: No module named theano”, then you have probably forgotten to source setup.sh in that shell.
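To see whether the current shell has been set up, you can check that both variables are present in the environment. This is a sketch with a made-up function name, check_env. It only tests that the variables are exported, not that their values are correct.

```shell
# Hypothetical helper: confirm the two variables are exported in this shell.
check_env() {
    status=0
    for name in THEANO_FLAGS PYTHONPATH; do
        # printenv fails if the variable is not in the environment
        if value=$(printenv "$name"); then
            echo "$name=$value"
        else
            echo "$name is not exported; source setup.sh first" >&2
            status=1
        fi
    done
    return "$status"
}

# Possible usage, from your working directory:
#   source setup.sh
#   check_env
```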