Starting Festival
Every time you start Festival during this exercise, first change to the directory where you placed the config.scm file, then start it like this:
$ festival config.scm
Saving waveforms from Festival
Once you have a fully synthesised utterance object in Festival, it is possible to extract the waveform to a file as follows:
festival> (utt.save.wave myutt "myutt.wav" 'riff)
myutt should be the name of the utterance object and myutt.wav is the filename, which you can choose; if you save more than one waveform, give each one a different name. You can now view and analyse the waveform in Praat or Wavesurfer.
Explaining mistakes made by Festival
Using what you have learned about Festival, you can now find some examples of it making errors for English text-to-speech. Find examples in each of the following categories:
- text normalisation
- POS tagging/homographs
- phrase break prediction
- pronunciation (dictionary or letter-to-sound)
- waveform generation
Aim for a variety of different types of errors, with different underlying causes: 1 error for each of the front-end categories, 2 errors for waveform generation (see the marking scheme). Don’t report lots of errors of the same type. Be sure that you understand the differences between these various types of error. For example, when Festival says a word incorrectly, it might not be a problem with the pronunciation components (dictionary + letter-to-sound) – it could be a problem earlier or later in the pipeline. You need to play detective and be precise about the underlying cause of every error you report.
You might discover errors in other categories too. That’s fine: you can report and explain those as well.
Use the SayText command to synthesise lots of text (e.g., from a news website, or constructed by you). Store the results in a variable. You will need to examine the contents of this utterance structure to decide what type each error is.
In some cases (but not all – it will be too slow), you may also need to run Festival step-by-step (as in the previous part of the exercise).
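As a rough sketch of that workflow, the session below stores the result of SayText in a variable and then prints a few relations from the utterance. The sentence is made up for illustration, and the relation names (Token, Word, Segment) are assumptions that hold for typical Festival English voices; check what utt.relationnames returns for the voice loaded by your config.scm.

```scheme
; Synthesise a sentence and keep the utterance in a variable.
; The text here is only an illustrative example.
(set! myutt (SayText "Dr. Smith read 25 books in 1999."))

; List the relations that this voice created in the utterance.
(utt.relationnames myutt)

; Print selected relations to see what each stage of the pipeline produced
; (relation names assumed; use the ones listed by the call above).
(utt.relation.print myutt 'Token)    ; text after tokenisation
(utt.relation.print myutt 'Word)     ; words after normalisation, with POS tags
(utt.relation.print myutt 'Segment)  ; the phone sequence

; Check how a suspect word is pronounced by the lexicon / letter-to-sound rules.
(lex.lookup "read" nil)
```

Comparing the Token, Word and Segment relations in that order is one way to see whether a mispronounced word was already wrong before the pronunciation stage.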
Skills to develop in this assignment
- use SayText to synthesise lots of different sentences
- precisely pinpoint audible errors in the output (e.g., which word, syllable or phone)
- understand that errors made by later steps in the pipeline might be caused by erroneous input; in other words, the mistake actually happened earlier in the pipeline
- understand that mistakes can happen in both the front end and the waveform generator
- make a hypothesis about the cause of the mistake
- trace back through the pipeline to find the earliest step at which something went wrong
- obtain evidence to test your hypothesis, by inspecting the Utterance and/or the waveform and/or the spectrogram