So far, we have built only a single voice. It would be good to have something to compare it against.
The methodology to use in this part is to create multiple versions of your voice, and then to compare them informally or in a listening test.
The way to have multiple voices is to make a complete copy of your ss folder for each variant (you should move the original recordings elsewhere first, to save space). You should also use symbolic links so that all the variants share the same wav, mfcc, and lpc folders. Not only does this save even more space, it is also good engineering practice to avoid unnecessary copies of data.
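The copy-and-link step described above can be scripted. What follows is only a minimal sketch, not part of the official recipe: the function name make_variant and the variant folder name are hypothetical, and it assumes that wav, mfcc, and lpc sit directly inside the ss folder on a system that supports symbolic links. The links it creates are absolute, so they will break if the original folder is later moved.

```python
import os
import shutil
from pathlib import Path

# Folders that every variant should share (named in the text above).
SHARED = ("wav", "mfcc", "lpc")


def make_variant(original: Path, variant: Path, shared=SHARED) -> None:
    """Copy `original` to `variant`, symlinking the shared folders instead of copying them."""
    original = original.resolve()

    def skip_shared_at_top_level(dirpath, names):
        # Only skip the shared folders directly inside `original`;
        # anything with the same name deeper in the tree is copied as usual.
        if Path(dirpath) == original:
            return [n for n in names if n in shared]
        return []

    # Raises FileExistsError if `variant` already exists, which protects
    # an existing variant from being overwritten by accident.
    shutil.copytree(original, variant, ignore=skip_shared_at_top_level, symlinks=True)

    for name in shared:
        target = original / name
        if target.is_dir():
            # Absolute link back to the single shared copy of the data.
            os.symlink(target, variant / name, target_is_directory=True)
```

For example, make_variant(Path("ss"), Path("ss_variant1")) would create a second voice folder whose wav, mfcc, and lpc entries point back at the originals, while everything else is an independent copy that can be modified freely.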
The first thing to try is to revisit each of the stages in building the voice, and see whether there is anything you can improve. For example, you could adjust the pitchmarking parameters to more closely fit your voice. Then, try some or all of the following variations:
Find and fix a labelling error
To see, in principle, how we could improve the labels for the whole voice, we will just identify and then fix a single label alignment error.
Vary the contents of the database
Make some simple variations on your voice, by excluding parts of the database.
Introduce deliberate errors
By deliberately varying some aspects of the system, you can discover how much effect they have on the overall quality of the voice.
Join sub-cost weighting
Vary the relative weightings of the join sub-cost components (F0, power, spectrum).
Pruning
Festival's Multisyn unit selection engine prunes the candidate lists, and performs more pruning during the search.
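To get a feel for what changing the join sub-cost weightings does, it can help to picture the join cost as a weighted combination of the F0, power, and spectrum sub-costs. The sketch below is purely illustrative and is not Festival's actual implementation: the function name join_cost, the normalisation by the total weight, and all the numbers are assumptions made for this example.

```python
def join_cost(sub_costs: dict, weights: dict) -> float:
    """Weighted sum of join sub-costs, divided by the total weight.

    Dividing by the total weight is a choice made for this sketch, so that
    only the relative sizes of the weights matter.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * sub_costs[name] for name in weights) / total


# Two hypothetical candidate joins with made-up sub-cost values.
join_a = {"f0": 0.1, "power": 0.1, "spectrum": 0.7}
join_b = {"f0": 0.6, "power": 0.2, "spectrum": 0.2}

equal_weights = {"f0": 1.0, "power": 1.0, "spectrum": 1.0}
spectrum_heavy = {"f0": 1.0, "power": 1.0, "spectrum": 3.0}

# Which join is cheaper depends on the weights.
a_wins_when_equal = join_cost(join_a, equal_weights) < join_cost(join_b, equal_weights)
b_wins_when_spectrum_heavy = join_cost(join_b, spectrum_heavy) < join_cost(join_a, spectrum_heavy)
```

With equal weights, join_a has the lower cost; once the spectrum sub-cost is weighted more heavily, join_b becomes the cheaper of the two. This is the kind of change in unit selection to listen for when you vary the weights in your own voice.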