Getting started

A first look at Festival and how we use it in interactive mode on the command line.

Accessing Festival

The instructions assume you are using the installation of Festival on the computers in the PPLS Appleton Tower (AT) labs. You can work on the computers in the actual physical lab or you can use the remote desktop service (see Module 0 for instructions). Note: The PPLS AT Lab remote desktop we’re using for this course is a separate thing from the Informatics remote desktop service!

The other option is to install Festival directly onto your computer. This will require a Unix-like environment (Linux or MacOS, or a Linux-style terminal running on Windows) and require you to compile some code (some install instructions are linked here).

If you’ve never compiled code before and don’t have much experience with the Unix command line, your best bet is to use the PPLS AT Lab computers.

Accessing Festival Remotely

You can use the installation of Festival on the Appleton Tower lab servers using the remote desktop service.

To connect using the remote desktop, follow the instructions here: Module 0 – computing requirements

Once you’ve started the remote desktop and logged in (with your UUN and EASE password), you can open the Terminal app by going to the menu bar at the top and clicking Applications > System Tools > Terminal. You may want to drag the Terminal icon to the desktop to make it easier for you to find it.

If you accidentally close VNC before you log out, you can reconnect by double clicking on the machine you previously logged onto in VNC viewer.

When you are finished, remember to log out: go to the menu at the top of the screen, then System > Log out.

Transferring data from the AT lab servers

To get your data (e.g. generated wav files) from the AT lab servers (e.g. via the remote desktop) you can use either a terminal-based command like rsync or an SFTP client like FileZilla (graphical interface). For example, the following copies the file myutt.wav from ~/Documents/sp/assignment1 to the directory I’m running the rsync command from on my own computer (don’t run this yet!):

rsync -avz s1234567@scp1.ppls.ed.ac.uk:Documents/sp/assignment1/myutt.wav ./

Note: The previous command will only work if you’ve already made the directories Documents/sp/assignment1 in your home directory on the AT lab servers. If you haven’t done this, you can skip this for now and try it after you’ve created some files.
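If you prefer, scp does the same job as rsync for a single file. A minimal sketch, using the same server and example path as above (replace s1234567 with your own UUN):

scp s1234567@scp1.ppls.ed.ac.uk:Documents/sp/assignment1/myutt.wav ./

As with the rsync command, this only works once the file actually exists on the server.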

You can share your files with yourself by copying them to OneDrive (or Google Drive) via a browser.

You can also use a file transfer app like FileZilla. In this case, you need to set the remote host to scp1.ppls.ed.ac.uk. For FileZilla, go to File > Site Manager, then set the protocol to SFTP and the host to scp1.ppls.ed.ac.uk, and use your UUN as the username and your EASE password as the password. After connecting you should see your home directory on the AT lab servers as the remote site. You can then drag files from the remote-site side to the appropriate place in the local-site side.

Installing Festival Yourself

It should be fine (and probably easier) to just use the PPLS AT lab machines in-person or using the remote desktop for this assignment.

You can also try installing Festival yourself on your own computer (Linux or MacOS only). This is only recommended if you already have some experience compiling programs from source code (or you’re willing to spend some time learning about this!). Otherwise, we suggest you use the remote desktop option (or come to the actual lab).

You can get the source code from the Festvox website or the festvox github page.

Macbook: Here’s a gist outlining the steps Catherine used to install Festival on her Macbook (run the commands in the terminal): Link to gist. Also see the related forum posts.

This gist also gives direct download links for the relevant zipped code archives. Please note that you are required to use a specific voice for this assignment (cstr_edi_awb_arctic_multisyn). The data required for this voice is not publicly available, so you will need to download it from the PPLS AT lab servers (see Assignment Data below and the gist link).

To install Festival from source you will generally need to be able to compile C++ code. If you’ve never done this before, this may be rather daunting and you may need to install extra tools (e.g. the Xcode command line tools for MacOS). If you use Ubuntu, you can also install it using the apt package manager (e.g. apt-get install festival or possibly just apt install festival).

Ubuntu Linux: See instructions in this gist for installing Festival on Ubuntu.  Some issues have been reported installing Festival from source on Ubuntu (as in the gist above). You can find some notes and an alternative way to do this using the Ubuntu package manager here.

Windows: If you have a Windows machine but want to install Festival on your computer, you might like to try Ubuntu on Windows. See also Ubuntu on WSL. Then follow the Ubuntu instructions above.

You can also have a look in the forums for other tips on how to do this!

Just another reminder though that you don’t have to install Festival yourself! It’s fine just to use the lab machines!

Assignment Data

If you are using the remote desktop to access the AT lab computers (or are physically in the lab!), all the relevant data is already there for you on the linux machines.

If you aren’t using the remote desktop to access the AT lab computers, you will need to get the voice database and dictionaries used to run the voice (voice: cstr_edi_awb_arctic_multisyn, dictionary: unilex). You can find instructions by following this link.

Start Festival

Festival has a command-line interface that runs through the terminal (i.e. the Unix bash shell). To do this in the PPLS AT lab, you’ll need to (i) make sure the computer is booted into Linux (if it is in Windows, restart the machine and select the penguin (the Linux mascot!) when presented with the choice); (ii) open a terminal via Applications > System Tools > Terminal from the menu bar at the top left of the screen. You can drag the Terminal icon from the menu to the desktop if you want to make a shortcut.

So, now, open a Terminal and run Festival by typing in festival at the prompt:

$ festival

Festival Speech Synthesis System 2.5.0:release December 2017
Copyright (C) University of Edinburgh, 1996-2010. All rights reserved.

..etc

and the prompt will change

festival>

This new prompt means that Festival is running; any commands that you type will now be interpreted by Festival rather than by the shell.

You will be pleased to know that Festival’s command-line interface supports the same line-editing shortcuts as the bash shell (e.g., TAB completion, ctrl-a, ctrl-e, ctrl-p, ctrl-n, up/down/left/right cursor keys, etc.). Here’s a nice cheat sheet for common bash commands. For a comprehensive list of these shortcuts, see the Wikipedia entry for GNU Readline.

If you get into trouble at any point and need to exit the current command, use ctrl-c. This applies to both Festival and the bash shell.

It’s really worth learning these keyboard shortcuts because they also apply to the bash shell and will save you a lot of time.

Make Festival speak

Synthesise some sentences to become familiar with the Festival command line.

Festival contains a number of different synthesis engines and for each of these, several voices are available: the quality of synthesis is highly dependent on the particular engine and voice that is being used.

By default, Festival will start with a rather old diphone voice, which does not sound great, but is fast and good enough for now:

festival> (set! myutt (SayText "Welcome to Festival"))

Note: If you installed Festival onto your own computer and followed the instructions to download the assignment voice (i.e., you downloaded the directory cstr_edi_awb_arctic_multisyn but not the other voices), you’ll be missing the default diphone voice (kal). You can set the voice to the one we will use in the assignment by typing the following after starting Festival:

(voice_cstr_edi_awb_arctic_multisyn)

See the instructions in “Assignment Data” above for a command to download the other voices.

You won’t be able to hear anything if you access Festival using ssh! You’ll need to save your generated utterance as a wav file (see below) and copy your files from the AT lab servers to your own computer (e.g. using scp or rsync on the terminal or an SFTP app like FileZilla).

To generate an utterance without playing it (e.g. if you are using ssh to connect to the AT lab servers), use the following steps instead of SayText:

festival> (set! myutt (Utterance Text "Hello"))
festival> (utt.synth myutt)

Then you can save the utterance myutt to a wave file called “myutt.wav” with the following command:

festival> (utt.save.wave myutt "myutt.wav" 'riff)

When you issue a command to Festival you must put it in round brackets (...) – if you do not, it will generate an error. You are using a language called Scheme.

Scheme, and lots of brackets

Scheme is a LISP-like language used as the interface to Festival. When you run Festival in interactive mode, you talk to Festival in Scheme. Fortunately, we’re not going to have to learn too much Scheme. All you need to know for now is that the basic syntax is (function_name argument1 argument2 ...).
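For example, even simple arithmetic follows this pattern (this is plain Scheme, nothing Festival-specific; the interpreter prints the return value):

festival> (+ 1 2)
3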

In Scheme, all functions return a value, which by default is printed after the function completes. The SayText function returns an Utterance structure, which is why something like #<Utterance 0x...> is printed after the completion of the function. A variable can be set to capture this return value, which will allow us to examine the utterance after processing. This is done using the set! command (note the two sets of brackets):

festival> (set! myutt (SayText "Welcome to Festival"))
#<Utterance 0x...>

Save the utterance to a wav file

You can save the generated waveform to a file as follows:

festival> (utt.save.wave myutt "myutt.wav" 'riff)

Here myutt should be the name of the utterance object (set above with the set! and SayText commands), and myutt.wav is the filename, which you can choose; if you save more than one waveform, give them different names. You can now view and analyse the waveform in Praat or Wavesurfer.

The TTS process

We can now examine the contents of the myutt variable. The SayText function is a high level function which calls a number of other functions in a chain. Each of these functions performs a specific job, such as deciding the pronunciation of a word, or how long a phone should be. We’ll be running these step-by-step later on.

The TTS process in Festival is a pipeline of sub-processes, which build up an Utterance structure in stages. This building process takes the original text as input and adds more and more information, which is stored in the utterance structure. In Festival, a unified mechanism for representing all types of data needed by the system has been developed: this is called the Heterogeneous Relation Graph system, or HRG for short.

Each Relation in an HRG is a structure that links items of a particular linguistic type. For example, we have a Word relation which is a list linking all the words, and a Segment relation which links all the phones etc. Relations can take different forms: the most common types are linear lists and trees.

Each module in Festival takes a number of relations as input and either creates new relations as output, or modifies the input ones. The vast majority of modules only write new information, leaving all information in the input untouched (there are a few exceptions, such as post-lexical processing). Because of this, examining the contents of the relations in an utterance after processing gives an insight into the history of the TTS process.

Different configurations of Festival can vary with respect to their use of HRGs and which modules they call.

Examining a saved object

Once you have synthesised an utterance you can do lots of things with it. Here are a few examples.

festival> (utt.play myutt)
festival> (utt.relationnames myutt)
festival> (utt.relation.print myutt 'Word)
festival> (utt.relation.print myutt 'Segment)

You can get a list of the relations that are present in a synthesised utterance by using the utt.relationnames command.
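The exact set of relations depends on the voice and modules used, but the output may look something like this (a list of relation names):

festival> (utt.relationnames myutt)
(Token Word Phrase Syllable Segment ...)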

Relations that are lists can easily be printed to the screen with the utt.relation.print command. Try this with all of the relations in an utterance. Some of them won’t reveal useful information, others will.

The output from (utt.relation.print myutt 'Word) may look like this:

()
id _3 ; name hello ; pos_index 16 ; pos_index_score 0 ; pos uh ;
        phr_pos uh ; phrase_score -13.43 ; pbreak_index 1 ;
        pbreak_index_score 0 ; pbreak NB ;
id _4 ; name world ; pos_index 8 ; pos_index_score 0 ; pos nn ;
        phr_pos n ; pbreak_index 0 ; pbreak_index_score 0 ;
        pbreak B ; blevel 3 ;
nil

Each data line starts with an id number like id _3 then a series of features follow separated by semicolons. Each feature has a name and a value, e.g., feature name: pos, feature value: uh.

Examining the processing steps

Tokens – First the text is split into Tokens. Look at the Token relation, where an item is created for each component of the text you input. The Token relation will still have digits and abbreviations in it.

Words – The Tokens are then converted to Words, abbreviations and digits are processed and expanded. Look for this in the Word relation.
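To see this expansion in action, try synthesising some text containing digits and comparing the two relations (the exact output will depend on the voice, but the Token relation should still contain the raw digits while the Word relation should contain the expanded words):

festival> (set! myutt (SayText "The year 1984"))
festival> (utt.relation.print myutt 'Token)
festival> (utt.relation.print myutt 'Word)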

Part of Speech Tagging – Each word is tagged with its part of speech, which is added as a feature to the Word relation.

Pronunciation – The pronunciation of each word is determined and the Syllable and Segment relations created. Examine these: the syllable relation is not very interesting as there is very little information here, just a count of the syllables.

You can look up the pronunciation of a word yourself with the function lex.lookup:

festival> (lex.lookup "caterpillar")
("caterpillar" nil (((k ae t) 1) ((ax p) 0) ((ih l) 1) ((er) 0)))

The actual pronunciation returned depends on which lexicon a particular voice uses, and whether the word is in the lexicon or if Festival has to predict the pronunciation using letter-to-sound rules.

Try looking up the pronunciation of some real words, and some made up ones.
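For instance, a made-up word such as "flibbertigig" (a hypothetical example, chosen so it won’t be in any lexicon) forces Festival to fall back on its letter-to-sound rules; the pronunciation you get back will depend on the lexicon your voice uses:

festival> (lex.lookup "flibbertigig")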

Accent Prediction – An intonation module assigns pitch accents (and other intonational events) to syllables. A number of different modules exist within Festival, operating with a number of intonation models including Tobi and Tilt. The assignment voice (‘awb’) doesn’t actually do accent prediction, but you can see what this would look like by trying the older diphone synthesis voice, kal, which does:

To load the kal voice, enter the following in festival:

festival> (voice_kal_diphone)

Now, look at the IntEvent relation to see which pitch events have been assigned. From the pitch events and the predicted durations, a pitch contour is generated. This contour is a list of numbers which specify the pitch at each point for the resulting waveform. There is no easy way to view the pitch contour.
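With the kal voice loaded, synthesise an utterance and print the relation (which pitch events you see will depend on the input text):

festival> (set! myutt (SayText "Welcome to Festival"))
festival> (utt.relation.print myutt 'IntEvent)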

To change back to the assignment voice:

festival> (voice_cstr_edi_awb_arctic_multisyn)

Waveform generation – The Unit relation is created by making a list of diphones from the segments, and the information about the speech needed for synthesis is copied in. The Unit relation contains features with values in square brackets [...]. These are references to the actual speech used to synthesise these units.
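You can inspect this relation with the same printing command as before (this assumes you are using the multisyn assignment voice, which creates a Unit relation):

festival> (utt.relation.print myutt 'Unit)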

Quit Festival

festival> (quit)

or use ctrl-D, just like in the bash shell. Festival remembers your command history in between sessions (again, just like bash). Next time you start Festival you can use the up cursor key to find previous commands, and then hit ‘Enter’ to execute them again. Of course, Festival does not remember the values of variables (e.g., myutt in the above example) between sessions.

What you should now be able to do

  • start Festival and make it speak using SayText
  • capture the Utterance structure returned by SayText
  • look inside the Utterance structure at the Relations
  • have an initial understanding of what Relations are, but not yet the full picture
  • use some of the keyboard shortcuts that are common to Festival and the bash shell