Finding mistakes

Festival makes mistakes, of course. Your task is to find interesting ones, and explain why each occurs.

Starting Festival

Every time you start Festival during this exercise, make sure you load the right voice config file, so that you are finding errors in the right voice (awb). First change to the directory where you placed the config.scm file, then run:

$ festival config.scm

Saving waveforms from Festival

You’ll want to save at least some of your synthesised utterances for further analysis, e.g. in Praat.  Since you have a fully synthesised utterance object in Festival, it is possible to extract the waveform to a file as follows:

festival> (utt.save.wave myutt "myutt.wav" 'riff)

Here myutt is the name of the utterance object and myutt.wav is a filename of your choosing. If you save more than one waveform, give each a different name. You can then view and analyse the waveform in Praat or Wavesurfer.
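As a minimal sketch of the whole sequence, typed at the festival> prompt: this assumes you keep the utterance returned by SayText in a variable. The sentences, variable names and filenames are placeholders, not part of the assignment.

```scheme
; Synthesise a sentence and keep the resulting utterance object.
(set! myutt (SayText "This is a test sentence."))
; Save its waveform in RIFF (.wav) format for analysis in Praat.
(utt.save.wave myutt "myutt.wav" 'riff)

; A second utterance gets its own variable and its own filename,
; so that the first file is not overwritten.
(set! myutt2 (SayText "This is another test sentence."))
(utt.save.wave myutt2 "myutt2.wav" 'riff)
```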

Explaining mistakes made by Festival

Using what you have learned about Festival, you can now find some examples of it making errors for English text-to-speech. Find examples in each of the following categories:

  • Text normalisation
  • POS tagging/homographs
  • Phrase break prediction
  • Pronunciation (dictionary or letter-to-sound)
  • Waveform generation

At this point, you might be thinking that the mistakes are simply because Festival is rather old, and that more recent TTS systems will not make such mistakes. So here are two examples from October 2024:

  1. Google voice search reading the text “To permanently disable Live Photos on an iPhone, you can do the following: Go to Settings; Toggle the switch next to Live Photos.” (this is a WaveNet model running on Google’s servers)
  2. Apple iOS reading the text “Your next appt at Christopher Sale Dentistry Ltd is on 28/04/25…” (this is presumed to be running on-device, which is an old iPhone 8 in this case)

In your explorations with Festival, aim for a variety of error types with different underlying causes: one error for each of the front-end categories (see the assignment overview). Don’t report lots of errors of the same type.

Be sure that you understand the differences between these various types of error. For example, when Festival says a word incorrectly, it might not be a problem with the pronunciation components (dictionary + letter-to-sound) – it could be a problem earlier or later in the pipeline. You need to play detective and be precise about the underlying cause of every error you report.
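One way to separate a pronunciation error from an earlier error is to compare what the lexicon returns for a word in isolation with what actually ended up in the utterance. The following is a sketch only: the sentence is an arbitrary placeholder, and it assumes the awb voice stores the part-of-speech tag in the pos feature of each Word item.

```scheme
; Synthesise a sentence containing a possible homograph.
(set! myutt (SayText "They will lead the way."))

; What part of speech did the front end assign to each word?
(utt.features myutt 'Word '(name pos))

; What does the lexicon give for the word on its own?
; The second argument is the part-of-speech tag (nil = unspecified).
; If the word is not in the dictionary, the entry returned comes
; from the letter-to-sound fallback instead.
(lex.lookup "lead" nil)

; Which phones were actually used in the utterance?
(utt.relation.print myutt 'Segment)
```

If the phones in the Segment relation match the lexicon entry for the tag that was assigned, but that tag is wrong, then the error happened at POS tagging rather than in the dictionary.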

You might discover errors in other categories too. That’s fine: you may wish to incorporate some discussion of this in the last section of your report (discussing the TTS pipeline as a whole).

Use the SayText command to synthesise text that you think will produce errors, given what you’ve already found out about the awb voice and the TTS pipeline it uses. If you’re stuck, you might try taking sentences from an external text for inspiration (e.g., a news website or a novel).

For each utterance you analyse, store the results in a variable. You will need to examine the contents of this utterance structure to decide what type each error is.
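A sketch of what this might look like at the festival> prompt. The sentence is a placeholder, and the relation names used here (Token, Word, Phrase, Segment) are assumed to be present for the awb voice; check the output of utt.relationnames to see what is actually there.

```scheme
; Keep the utterance so that it can be inspected afterwards.
(set! myutt (SayText "He lives near the live music venue."))

; List the relations that the pipeline created.
(utt.relationnames myutt)

; Text normalisation: the tokens found in the input text.
(utt.relation.print myutt 'Token)
; POS tagging: each word together with its predicted tag.
(utt.features myutt 'Word '(name pos))
; Phrase break prediction: how the words were grouped into phrases.
(utt.relation.print myutt 'Phrase)
; Pronunciation: the phone sequence passed on to waveform generation.
(utt.relation.print myutt 'Segment)
```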

You will often be able to identify the source of the error directly from the relations generated by SayText (i.e. without running the pipeline step-by-step). However, in some cases you may also need to run Festival step-by-step (as in the previous part of the exercise). Don’t do this for every utterance: it will be too slow. The crucial thing for the write-up is that you provide evidence of where and why the error occurred.
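As a reminder of what step-by-step running looks like, here is a sketch that stops to inspect the utterance after selected modules. The module sequence below is Festival’s usual front-end order and is an assumption here: use the exact sequence given in the previous part of the exercise for the awb voice. The input text is a placeholder.

```scheme
; Create an utterance from text without synthesising it yet.
(set! myutt (Utterance Text "Dr. Smith paid $25 on 3/4/2021."))

(Initialize myutt)
(Text myutt)        ; split the text into tokens
(Token_POS myutt)
(Token myutt)       ; expand the tokens into words
; Compare the tokens with the words they were expanded to.
(utt.relation.print myutt 'Token)
(utt.relation.print myutt 'Word)

(POS myutt)         ; part-of-speech tagging
(utt.features myutt 'Word '(name pos))

(Phrasify myutt)    ; phrase break prediction
(utt.relation.print myutt 'Phrase)

(Word myutt)        ; dictionary lookup and letter-to-sound
(utt.relation.print myutt 'Segment)
```

The first module whose output looks wrong is the earliest point at which the error could have been introduced, and the printed relation is the evidence to include in the write-up.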

Skills to develop in this assignment

  • use SayText to synthesise lots of different sentences
  • precisely pinpoint audible errors in the output (e.g., which word, syllable or phone)
  • understand that errors made by later steps in the pipeline might be caused by erroneous input; in other words, the mistake actually happened earlier in the pipeline
  • understand that mistakes can happen in both the front end and the waveform generator
  • make a hypothesis about the cause of the mistake
  • trace back through the pipeline to find the earliest step at which something went wrong
  • obtain evidence to test your hypothesis, by inspecting the Utterance and/or the waveform and/or the spectrogram