The Festival text-to-speech system (pre-2023/24)

Festival is a widely used research toolkit for Text-To-Speech. It is not perfect, and your goal is to discover various types of errors it makes, then understand why they occur.

This assignment has been updated for Speech Processing 2023-24. If you are taking the course in 2023-24 (or later), don’t follow these instructions!

The milestones will help you stay on track with this assignment. Try to stay ahead of the milestones.

  1. Getting started
    A first look at Festival and how we use it in interactive mode on the command line.

    Accessing Festival

    The instructions assume you are using the installation of Festival on the computers in the PPLS Appleton Tower (AT) labs. You can work on the computers in the actual physical lab, or you can use the remote desktop service (see Module 0 for instructions). Note: the PPLS AT Lab remote desktop we’re using for this course is separate from the Informatics remote desktop service!

    The other option is to install Festival directly onto your computer. This will require a unix-like environment (Linux or MacOS, or a Linux-style terminal running on Windows) and require you to compile some code (some install instructions are linked here).

    If you’ve never compiled code before, and don’t have much experience with the unix command line, your best bet is to use the PPLS AT Lab computers.

    Accessing Festival Remotely

    You can use the installation of Festival on the Appleton Tower lab servers using the remote desktop service.

    To connect using the remote desktop, follow the instructions here: Module 0 – computing requirements

    Once you’ve started the remote desktop and logged in (with your UUN and EASE password), you can open the Terminal app by going to the menu bar at the top and clicking Applications > System Tools > Terminal. You may want to drag the Terminal icon to the desktop to make it easier for you to find it.

    If you accidentally close VNC before you log out, you can reconnect by double clicking on the machine you previously logged onto in VNC viewer.

    When you are finished, remember to log out: in the menu at the top of the screen > System > Log out

    Transferring data from the AT lab servers

    To get your data (e.g. generated wav files) from the AT lab servers (e.g. over the remote desktop), you can either use a terminal-based command like rsync or an SFTP client like FileZilla (graphical interface). For example, the following copies the file myutt.wav that’s in ~/Documents/sp/assignment1 to the directory I’m running the rsync command from on my own computer (don’t run this yet!):

    rsync -avz s1234567@scp1.ppls.ed.ac.uk:Documents/sp/assignment1/myutt.wav ./
    

    Note: The previous command will only work if you’ve already made the directories Documents/sp/assignment1 in your home directory on the AT lab servers. If you haven’t done this, you can skip this for now and try it after you’ve created some files.

    You can share your files with yourself by copying them to OneDrive (or Google Drive) via a browser.

    You can also use a file transfer app like FileZilla. In this case, you need to set the remote host to scp1.ppls.ed.ac.uk. For FileZilla, go to File > Site Manager, then set the protocol to SFTP, the host to scp1.ppls.ed.ac.uk, and use your UUN as the username and your EASE password as the password. After connecting, you should see your home directory on the AT lab servers as the remote site. You can then drag files from the remote site side to the appropriate place on the local site side.

    Installing Festival Yourself

    It should be fine (and probably easier) to just use the PPLS AT lab machines in-person or using the remote desktop for this assignment.

    You can also try installing festival yourself on your own computer (Linux or MacOS only). This is only recommended if you already have some experience compiling programs from source code (or you’re willing to spend some time learning about this!). Otherwise, we suggest you use the remote desktop option (or come to the actual lab).

    You can get the source code from the Festvox website or the festvox github page.

    Macbook: Here’s a gist outlining the steps Catherine used to install Festival on her Macbook (run the commands in the terminal): Link to gist. Also see the related forum posts.

    This gist also gives direct download links for the relevant zipped code archives. Please note that you are required to use a specific voice for this assignment (cstr_edi_awb_arctic_multisyn). The data required for this voice is not publicly available, so you will need to download it from the PPLS AT lab servers (see Assignment Data below and the gist link).

    To install Festival from source you will generally need to be able to compile C++ code. If you’ve never done this before, it may be rather daunting and you may need to install extra tools (e.g. Xcode command line tools for MacOS). If you use Ubuntu, you can also install it using the apt package manager (e.g. apt-get install festival or possibly just apt install festival).

    Ubuntu Linux: See instructions in this gist for installing Festival on Ubuntu.  Some issues have been reported installing Festival from source on Ubuntu (as in the gist above). You can find some notes and an alternative way to do this using the Ubuntu package manager here.

    Windows: If you have a Windows machine but want to install Festival on your computer, you might like to try Ubuntu on Windows. See also Ubuntu on WSL. See the Ubuntu instructions above.

    You can also have a look in the forums for other tips on how to do this!

    Just another reminder though that you don’t have to install Festival yourself! It’s fine just to use the lab machines!

    Assignment Data

    If you are using the remote desktop to access the AT lab computers (or are physically in the lab!), all the relevant data is already there for you on the linux machines.

    If you aren’t using the remote desktop to access the AT lab computers, you will need to get the voice database and dictionaries used to run the voice (voice:cstr_edi_awb_arctic_multisyn, dictionary:unilex). You can find instructions by following this link.

    Start Festival

    Festival has a command-line interface that runs through the terminal (i.e. the unix bash shell). To use it in the PPLS AT lab, you’ll need to (i) make sure the computer is booted into Linux (if it is in Windows, restart the machine and select the penguin (the Linux mascot!) when presented with the choice); (ii) open a terminal via Applications > System Tools > Terminal from the menu bar at the top left of the screen. You can drag the Terminal icon from the menu to the desktop if you want to make a shortcut.

    So, now, open a Terminal and run Festival by typing in festival at the prompt:

    $ festival
    
    Festival Speech Synthesis System 2.5.0:release December 2017
    Copyright (C) University of Edinburgh, 1996-2010. All rights reserved.
    
    ..etc
    

    and the prompt will change

    festival>
    

    This new prompt means that Festival is running; any commands that you type will now be interpreted by Festival rather than by the shell.

    You will be pleased to know that Festival’s command-line interface supports the same line-editing shortcuts as the bash shell (e.g., TAB completion, ctrl-a, ctrl-e, ctrl-p, ctrl-n, up/down/left/right cursor keys, etc.). Here’s a nice cheat sheet for common bash commands. For a comprehensive list of these shortcuts, see the Wikipedia entry for GNU Readline.

    If you get into trouble at any point and need to exit the current command, use ctrl-c. This applies to both Festival and the bash shell.

    It’s really worth learning these keyboard shortcuts because they also apply to the bash shell and will save you a lot of time.

    Make Festival speak

    Synthesise some sentences to become familiar with the Festival command line.

    Festival contains a number of different synthesis engines and for each of these, several voices are available: the quality of synthesis is highly dependent on the particular engine and voice that is being used.

    By default, Festival will start with a rather old diphone voice, which does not sound great, but is fast and good enough for now:

    festival> (set! myutt (SayText "Welcome to Festival"))
    

    Note: If you installed Festival onto your own computer and followed the instructions to download the assignment voice (i.e., you downloaded the directory cstr_edi_awb_arctic_multisyn but not the other voices), you’ll be missing the default voice (kal, a diphone synthesizer). You can set the voice to the one we will use in the assignment by typing the following after starting Festival:

    (voice_cstr_edi_awb_arctic_multisyn)

    See the instructions in “Assignment Data” above for a command to download the other voices.

    You won’t be able to hear anything if you access Festival using ssh! You’ll need to save your generated utterance as a wav file (see below) and copy your files from the AT lab servers to your own computer (e.g. using scp or rsync on the terminal or an SFTP app like FileZilla).

    To generate an utterance without playing it (e.g. if you are using ssh to connect to the AT lab servers), use the following steps instead of SayText:

    festival> (set! myutt (Utterance Text "Hello"))
    festival> (utt.synth myutt)

    Then you can save the utterance myutt to a wave file called myutt.wav with the following command:

    festival> (utt.save.wave myutt "myutt.wav" 'riff)

    When you issue a command to Festival you must put it in round brackets (...) – if you do not, it will generate an error. You are using a language called Scheme.

    Scheme, and lots of brackets

    Scheme is a LISP-like language used as the interface to Festival. When you run Festival in interactive mode, you talk to Festival in Scheme. Fortunately, we’re not going to have to learn too much Scheme. All you need to know for now is that the basic syntax is (function_name argument1 argument2 ...).

    In Scheme, all functions return a value, which by default is printed after the function completes. The SayText function returns an Utterance structure, which is why #<Utterance ...> (the printed form of that structure) appears after the function completes. A variable can be set to capture this return value, which will allow us to examine the utterance after processing. This is done using the set! command (note the two sets of brackets):

    festival> (set! myutt (SayText "Welcome to Festival"))
    #<Utterance ...>
    

    Save the utterance to a wav file

    You can save the generated waveform to a file as follows:

    festival>(utt.save.wave myutt "myutt.wav" 'riff)
    

    Here myutt should be the name of the utterance object (set above with the SayText command); myutt.wav is the filename, which you can choose. If you save more than one waveform, give them different names. You can now view and analyse the waveform in Praat or Wavesurfer.
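    Once you have copied a saved wav file to your own machine, you can also check its basic properties programmatically. The following is a small Python sketch using only the standard-library wave module; the describe_wav name is our own, and the filename is just an example:

```python
import wave

# Sketch: report basic properties of a wav file saved with utt.save.wave.
# Uses only Python's standard-library wave module; pass whatever filename
# you chose when saving (e.g. "myutt.wav").
def describe_wav(path):
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "sample_rate": rate,
            "channels": w.getnchannels(),
            "duration_s": frames / rate,
        }

# Example: describe_wav("myutt.wav")
```

    This is just a convenience for checking sample rates and durations; for detailed inspection of the waveform, Praat or Wavesurfer are the right tools.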

    The TTS process

    We can now examine the contents of the myutt variable. The SayText function is a high level function which calls a number of other functions in a chain. Each of these functions performs a specific job, such as deciding the pronunciation of a word, or how long a phone should be. We’ll be running these step-by-step later on.

    The TTS process in Festival is a pipeline of sub-processes, which build up an Utterance structure in stages. This building process takes the original text as input and adds more and more information, which is stored in the utterance structure. In Festival, a unified mechanism for representing all types of data needed by the system has been developed: this is called the Heterogeneous Relation Graph system, or HRG for short.

    Each Relation in an HRG is a structure that links items of a particular linguistic type. For example, we have a Word relation which is a list linking all the words, and a Segment relation which links all the phones etc. Relations can take different forms: the most common types are linear lists and trees.

    Each module in Festival takes a number of relations as input and either creates new relations as output, or modifies the input ones. The vast majority of modules only write new information, leaving all information in the input untouched (there are a few exceptions, such as post-lexical processing). Because of this, examining the contents of the relations in an utterance after processing gives an insight into the history of the TTS process.

    Different configurations of Festival can vary with respect to their use of HRGs, and which modules they call.
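    The HRG idea can be made concrete with a toy data structure. The Python sketch below is purely illustrative – it is not how Festival is actually implemented – but it captures the pattern of modules adding named relations, each a list of items holding key-value features:

```python
# Illustrative sketch of an HRG-style utterance (NOT Festival's real
# implementation): an utterance holds named relations, each relation is a
# list of items, and each item is a set of key-value features.

def make_utterance(text):
    return {"text": text, "relations": {}}

def add_relation(utt, name, items):
    # A module typically writes a new relation, leaving its input untouched.
    utt["relations"][name] = items

utt = make_utterance("hello world")
add_relation(utt, "Word", [{"name": "hello", "pos": "uh"},
                           {"name": "world", "pos": "nn"}])
add_relation(utt, "Segment", [{"name": "hh"}, {"name": "ax"}, {"name": "l"}])

# Listing the relations afterwards shows the history of processing,
# analogous to utt.relationnames in Festival.
print(sorted(utt["relations"]))  # ['Segment', 'Word']
```

    Because modules mostly add rather than overwrite, the finished utterance records which stages have run, just as described above.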

    Examining a saved object

    Once you have synthesised an utterance you can do lots of things with it. Here are a few examples.

    festival> (utt.play myutt)
    festival> (utt.relationnames myutt)
    festival> (utt.relation.print myutt 'Word)
    festival> (utt.relation.print myutt 'Segment)
    

    You can get a list of the relations that are present in a synthesised utterance by using the utt.relationnames command.

    Relations that are lists can easily be printed to the screen with the utt.relation.print command. Try this with all of the relations in an utterance. Some of them won’t reveal useful information, others will.

    The output from (utt.relation.print myutt 'Word) may look like this:

    ()
    id _3 ; name hello ; pos_index 16 ; pos_index_score 0 ; pos uh ;
            phr_pos uh ; phrase_score -13.43 ; pbreak_index 1 ;
            pbreak_index_score 0 ; pbreak NB ;
    id _4 ; name world ; pos_index 8 ; pos_index_score 0 ; pos nn ;
            phr_pos n ; pbreak_index 0 ; pbreak_index_score 0 ;
            pbreak B ; blevel 3 ;
    nil
    

    Each data line starts with an id number like id _3 then a series of features follow separated by semicolons. Each feature has a name and a value, e.g., feature name: pos, feature value: uh.
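    If you capture such output as text (e.g. by copying it from the terminal into a file), the feature lines are easy to process for your lab notes. The parse_item helper below is hypothetical, not part of Festival; it just splits one printed item line into a Python dict:

```python
# Hypothetical helper (not a Festival function): turn one printed item line
# from utt.relation.print output into a dict. Features are separated by
# semicolons; within a feature, the name comes first and the value after.
def parse_item(line):
    features = {}
    for field in line.split(";"):
        parts = field.split()
        if len(parts) >= 2:
            features[parts[0]] = " ".join(parts[1:])
    return features

item = parse_item("id _3 ; name hello ; pos_index 16 ; pos uh ; pbreak NB ;")
print(item["name"], item["pos"], item["pbreak"])  # hello uh NB
```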

    Examining the processing steps

    Tokens – First the text is split into Tokens. Look at the Token relation, where an item is created for each component of the text you input. The Token relation will still have digits and abbreviations in it.

    Words – The Tokens are then converted to Words, abbreviations and digits are processed and expanded. Look for this in the Word relation.

    Part of Speech Tagging – Each word is tagged with its part of speech, which is added as a feature to the Word relation.

    Pronunciation – The pronunciation of each word is determined and the Syllable and Segment relations created. Examine these: the syllable relation is not very interesting as there is very little information here, just a count of the syllables.

    You can look up the pronunciation of a word yourself with the function lex.lookup

    festival> (lex.lookup "caterpillar")
    ("caterpillar" nil (((k ae t) 1) ((ax p) 0) ((ih l) 1) ((er) 0)))
    

    The actual pronunciation returned depends on which lexicon a particular voice uses, and whether the word is in the lexicon or if Festival has to predict the pronunciation using letter-to-sound rules.

    Try looking up the pronunciation of some real words, and some made up ones.
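    For note-taking, you can also unpack the syllable structure from a lex.lookup entry captured as a string. The parse_syllables helper below is hypothetical (our own name, not a Festival function) and assumes the entry format shown above, where each syllable is a phone list plus a stress mark:

```python
import re

# Hypothetical helper: pull (phones, stress) pairs out of the syllable part
# of a lex.lookup entry captured as a string. Each syllable is printed as
# a phone list plus a stress mark, e.g. ((k ae t) 1).
def parse_syllables(entry):
    return [(phones.split(), int(stress))
            for phones, stress in re.findall(r"\(\(([^()]+)\)\s+(\d)\)", entry)]

entry = '("caterpillar" nil (((k ae t) 1) ((ax p) 0) ((ih l) 1) ((er) 0)))'
for phones, stress in parse_syllables(entry):
    print(phones, "stressed" if stress else "unstressed")
```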

    Accent Prediction – An intonation module assigns pitch accents (and other intonational events) to syllables. A number of different modules exist within Festival, operating with a number of intonation models including ToBI and Tilt. The assignment voice (‘awb’) doesn’t actually do accent prediction, but you can see what this would look like by trying the older diphone synthesis voice, kal, which does:

    To load the kal voice, enter the following in festival:

    festival>(voice_kal_diphone)
    

    Now, look at the IntEvent relation to see which pitch events have been assigned. From the pitch events and the predicted durations, a pitch contour is generated. This contour is a list of numbers which specify the pitch at each point for the resulting waveform. There is no easy way to view the pitch contour.

    To change back to the assignment voice:

    festival>(voice_cstr_edi_awb_arctic_multisyn)
    

    Waveform generation – The Unit relation is created by making a list of diphones from the segments, and the information about the speech needed for synthesis is copied in. The Unit relation contains features with values in square brackets [...]. These are references to the actual speech used to synthesise these units.

    Quit Festival

    festival> (quit)
    

    or use ctrl-D, just like in the bash shell. Festival remembers your command history between sessions (again, just like bash). Next time you start Festival you can use the up cursor key to find previous commands, then hit ‘Enter’ to execute them again. Of course, Festival does not remember the values of variables (e.g., myutt in the above example) between sessions.

    What you should now be able to do

    • start Festival and make it speak using SayText
    • capture the Utterance structure returned by SayText
    • look inside the Utterance structure at the Relations
    • have an initial understanding of what Relations are, but not yet the full picture
    • use some of the keyboard shortcuts that are common to Festival and the bash shell
  2. Step-by-step
    It's possible to run each step in the text-to-speech pipeline manually, and inspect what Festival does at each point.

    We are going to examine the speech synthesis process in detail. You will examine the sequence of processes Festival uses to perform text-to-speech (TTS) and relate these processes to what you have learnt in the Speech Processing course. Make notes in a lab book as you work! Remember that Festival is just one example of a TTS system – there are generally many other ways of implementing each step.

    Festival stores information like words, phonemes, F0 targets, syntax, etc. in things called Relations. These approximately correspond to the levels of linguistic representation mentioned in lectures. As each stage in the process is performed, more information is added to the utterance structure, and you can examine this inside Festival. So, each step in the pipeline will either add a new relation, or modify an existing one.

    Your task in this part of the exercise is to explore the synthesis process and discover what Festival does in each step of processing, when converting text to speech. If you notice errors in the prediction of phrases, pronunciation, the processing of numbers or anything else, make a note of it, as this will be useful for the next part of the exercise.

    Hints

    Three hints for this exercise:

    1. Read the instructions through completely before you start.
    2. Use Festival’s tab-completion and command-line history (which is kept even if you quit and restart Festival) to save typing and avoid mistakes.
    3. If things go wrong (either with Festival, or with you), quit Festival and restart it.

    Festival help

    Festival can make your life easier in a number of ways.

    Command history

    You can access commands you have previously typed using the arrow keys. Press the up arrow a number of times to see the previous commands you entered, then use the left and right arrow keys to edit them. Press ENTER to run the edited command.

    TAB completion

    If you start to type a command name into Festival, then press TAB, it will either complete the command for you or give you a list of possible completions. For example, to get a list of all of the commands that work on an utterance type

    festival> (utt.
    

    and then press TAB once or twice.

    Getting help

    Most commands in Festival have built-in help. Type the name of the command (including the initial open bracket) and then press ⟨ALT⟩-h (hold the ALT key down and press h), or alternatively, press (and release) ESC and then press h. Festival will print some help for that command, including a list of arguments that it expects.

    Starting Festival

    We are going to use a unit selection voice, called cstr_edi_awb_arctic_multisyn in this exercise.

    Before you start, make a folder to work in, e.g. on the remote desktop you can make a directory in ~/Documents/sp/assignment1. We will assume this is your working directory in the rest of these instructions.

    cd ~
    mkdir -p ~/Documents/sp/assignment1
    

    Change into this directory, e.g.:

    cd ~/Documents/sp/assignment1

    Remember you can use the pwd command to see which directory you are currently in and the ls command to see which files are in the current directory.

    We’re going to use a simple configuration file to tell Festival to load the correct voice, and to add a few extra commands (the file is located on the AT lab servers):

    cp /Volumes/Network/courses/sp/assignment1/config.scm .
    chmod ugo+r config.scm
    

    If you followed this gist to install Festival on your own computer, you probably already downloaded the config.scm file (by default into the “assignment1” directory next to the “tools” directory where you installed Festival; see lines 82-85 of the gist). In that case, you can start Festival from there or move the config.scm file to the directory you want to work in.

    Remember that if you come back later, you only need to cd to your working directory (e.g. ~/Documents/sp/assignment1 following the remote desktop instructions). You don’t need to copy the file again as long as you start Festival from a directory that contains config.scm. Now, every time you start Festival during the rest of this exercise, do it like this:

    festival config.scm
    

    Compared to the earlier exercises using a diphone voice, Festival will take longer to start when loading this unit selection voice. Why?

    In the following, festival> at the beginning of the line just represents the Festival command-line prompt – you only need to type the part in parentheses.

    Once Festival is running, check that it speaks:

    festival> (SayText "hello world")
    

    You should hear a reasonably good quality Scottish male voice. If not, you probably forgot to start festival using the config.scm file.

    Synthesising an utterance step-by-step

    Read this section carefully before trying any of the commands.

    So far we have only synthesised utterances from start to finish in one go, using SayText. Now we are going to do it step-by-step.
    First you need to create a new utterance object. The following command creates a new utterance object with the text supplied and sets the variable myutt to point to it. As before, festival> at the beginning of the line just represents the prompt – you only need to type the part in parentheses.

    festival> (set! myutt (Utterance Text "Put your own text here."))
    

    Now you can manually run each step of the text-to-speech pipeline – don’t skip any steps (what would happen if you did?). Use a single short utterance of your own when performing this part – make it interesting (e.g., containing some text that needs normalising, a word that is unlikely to be in the dictionary, and so on).

    festival> (Initialize myutt)
    festival> (Text myutt)
    festival> (Token_POS myutt)
    festival> (Token myutt)
    festival> (POS myutt)
    festival> (Phrasify myutt)
    festival> (Word myutt)
    festival> (Pauses myutt)
    festival> (PostLex myutt)
    festival> (Wave_Synth myutt)
    festival> (utt.play myutt)
    

    If you get an error, you will have to start again by creating a new utterance with the set! command. If you get confused, quit Festival and start from the beginning again.

    Note that running the synthesis pipeline step-by-step is just to help you understand what is happening. You might need it to diagnose some mistakes later on, but most of the time, you will just use SayText.
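    To see why skipping a step causes trouble, consider this toy Python sketch of a pipeline. It is invented for illustration (these are not Festival’s real modules), but, like Festival, each step reads what an earlier step wrote into the shared utterance structure, so running steps out of order fails:

```python
# Toy pipeline sketch (NOT Festival's real modules): each step reads the
# structure the previous step wrote into the utterance and adds a new one.
def tokenize(utt):
    utt["Token"] = utt["text"].split()

def make_words(utt):
    utt["Word"] = [t.lower() for t in utt["Token"]]  # requires Token

def phrasify(utt):
    utt["Phrase"] = [utt["Word"]]  # requires Word

utt = {"text": "Hello world"}
for step in (tokenize, make_words, phrasify):
    step(utt)
print(sorted(utt))  # ['Phrase', 'Token', 'Word', 'text']

# Calling make_words on a fresh utterance without tokenize first would
# raise KeyError: 'Token' -- the analogue of skipping a Festival step.
```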

    Commands for examining utterances

    You should pause to examine the contents of myutt between each step.

    To determine which relations are now present:

    festival> (utt.relationnames myutt)
    

    and to examine a particular Relation (if it exists):

    festival> (utt.relation.print myutt 'Phrase)
    festival> (utt.relation.print myutt 'Word)
    festival> (utt.relation.print myutt 'Segment)
    

    and so on for any other Relations that exist.

    You can also use the following command to see the overall structure of the utterance:

    festival> (print_sylstructure myutt)
    

    This will show you how the different relations tie together. It will show you Words, Syllables as lists of segments, and the presence of stress.

    Concentrate on discovering which commands create or modify Relations in the utterance structure, and what information is stored in those Relations. Note: the Initialize command will not reveal anything interesting, and it may be difficult to see what the Pauses and PostLex commands do.

    What you should now be able to do

    • start Festival and load a configuration file (which is just a sequence of Scheme commands to run after startup)
    • Make full use of keyboard shortcuts including: TAB completion, ctrl-A, ctrl-E, ctrl-P, ctrl-N, ctrl-R, up/down cursor keys to navigate the command history, left/right cursor keys to edit a command.
    • run the pipeline step-by-step
    • describe which Relations are added, or modified, by each step
    • understand that Relations are composed of Items
    • understand that Items are data structures containing an unordered set of key-value pairs
    • have an initial understanding of what some (but not all) of the keys and values mean (e.g., POS tags in the Word relation)
  3. Finding mistakes
    Festival makes mistakes, of course. Your task is to find interesting ones, and explain why each occurs.

    Starting Festival

    Every time you start Festival during this exercise, do it like this, remembering to first change to the directory where you placed the config.scm file:

    $ festival config.scm
    

    Saving waveforms from Festival

    Once you have a fully synthesised utterance object in Festival, it is possible to extract the waveform to a file as follows:

    festival>(utt.save.wave myutt "myutt.wav" 'riff)
    

    myutt should be the name of the utterance object, myutt.wav is the filename, which you can choose; if you save more than one waveform, then give them different names. You can now view and analyse the waveform in Praat or Wavesurfer.

    Explaining mistakes made by Festival

    Using what you have learned about Festival, you can now find some examples of it making errors for English text-to-speech. Find examples in each of the following categories:

    • text normalisation
    • POS tagging/homographs
    • phrase break prediction
    • pronunciation (dictionary or letter-to-sound)
    • waveform generation

    Aim for a variety of different types of errors, with different underlying causes: 1 error for each of the front-end categories, 2 errors for waveform generation (see the marking scheme). Don’t report lots of errors of the same type. Be sure that you understand the differences between these various types of error. For example, when Festival says a word incorrectly, it might not be a problem with the pronunciation components (dictionary + letter-to-sound) – it could be a problem earlier or later in the pipeline. You need to play detective and be precise about the underlying cause of every error you report.

    You might discover errors in other categories too. That’s fine: you can report and explain those as well.

    Use the SayText command to synthesise lots of text (e.g., from a news website, or constructed by you). Store the results in a variable. You will need to examine the contents of this utterance structure to decide what type each error is.

    In some cases (but not all – it will be too slow), you may also need to run Festival step-by-step (as in the previous part of the exercise).
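    Purely as a suggestion for organising your notes: a structured log of the errors you find (exact input text, hypothesised category, what you observed) will make the write-up much easier. A minimal Python sketch, where the example row is an invented placeholder:

```python
import csv

# Suggestion only: keep a structured log of errors you find in Festival's
# output. The row below is an invented placeholder, not a known error.
errors = [
    {"input": "Dr. Smith lives on St. John St.",
     "category": "text normalisation",
     "observation": "describe what Festival actually said here"},
]

with open("errors.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "category", "observation"])
    writer.writeheader()
    writer.writerows(errors)
```

    Recording the exact input (including punctuation) matters, because the marker needs to be able to reproduce each mistake.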

    Skills to develop in this assignment

    • use SayText to synthesise lots of different sentences
    • precisely pinpoint audible errors in the output (e.g., which word, syllable or phone)
    • understand that errors made by later steps in the pipeline might be caused by erroneous input; in other words, the mistake actually happened earlier in the pipeline
    • understand that mistakes can happen in both the front end and the waveform generator
    • make a hypothesis about the cause of the mistake
    • trace back through the pipeline to find the earliest step at which something went wrong
    • obtain evidence to test your hypothesis, by inspecting the Utterance and/or the waveform and/or the spectrogram
  4. Writing up
    You're going to write up a lab report, analysing errors made by a specific Festival TTS configuration.
    1. Lab report
      Write up your findings from the exercise in a lab report, to show that you can make connections between theory and practice.

      You should write a lab report about the two parts of the synthesis practical (“Step-by-step” and “Finding mistakes”). Keep it concise and to the point, but make sure you detail your findings using Festival. The lab report should have a clear structure and be divided into sections and subsections.

      What exactly is meant by “lab report”?

      It is not a discursive essay. It is not merely documentation of what you typed into Festival and what output you got. It is a factual report of what you did in the lab that demonstrates what you learnt and how you can relate that to the theory from lectures. You will get marks for:

      • completing all parts of the practical, and demonstrating this in the report
      • providing interesting errors made by Festival, with correct analysis of the type of error and of which module made the error
      • a clear demonstration that you understand the difference between human speech production and the methods used by Festival to generate speech. Why do the methods employed by Festival cause problems at times? What benefits could using a TTS system bring?
      • Describing the theory behind each module (what is it trying to achieve?), linking that to practice (what does it actually do?) and analysing errors in terms of the underlying techniques used in Festival. Feel free to be critical when necessary – Festival is not perfect and the voice we have given you to analyse is certainly not the best we could make with Festival! One way to show your critical thinking skills is to suggest how the mistake could be avoided: which part of Festival would need to be improved, and how?
      • clear and concise writing, and effective use of diagrams and examples. A good diagram will almost always allow you to write less text.

      How much background material should there be?

      Do not spend too long simply restating material from lectures or textbooks without telling the reader why you are doing this. Do provide enough background to demonstrate your understanding of the theory, and to support your explanations of how Festival works and your error analysis. If you make claims that are not drawn from the lecture material (e.g. specific phonetic/phonological phenomena or methods used by Festival), use specific and carefully chosen citations. You can cite textbooks and papers. You do not need to cite research papers in this class, but you may get more marks for relevant use of citations that directly help you explain your analysis. Avoid citing lecture slides or videos. Make sure everything you cite is correctly listed in the bibliography.

      You do not have to explain algorithms we have not gone over in this class (e.g. specific POS tagging methods) unless you specifically want to include an analysis/explanation of why they cause the specific errors you are analysing.

      In the background section you should:

      • Briefly outline human speech production: just enough to contrast with what Festival is doing
      • Describe what Festival should be doing in theory 
      • Describe what it does in practice

      Have a look at the structured marking scheme to get an idea of other parts of the TTS pipeline you should make sure you cover.

      Writing style

      The writing style of the report should be similar to that of a journal paper. Don’t list every command you typed! Say what you were testing and why, what your input to Festival was, what Festival did and what the output was. Use diagrams (e.g., to explain parts of the pipeline, or to illustrate linguistic structures) and annotated waveform and spectrogram plots to illustrate your report. It may not be appropriate to use a waveform or spectrogram to illustrate a front-end mistake. Avoid using verbatim output copied from the Terminal, unless this is essential to the point you are making.

      Additional tips

      Give the exact text you asked Festival to synthesise, so that the reader/marker can reproduce the mistakes you find in Festival (this includes punctuation!). Always explain why each of the mistakes you find belongs in a particular category. For example, differentiate carefully between

      • part of speech prediction errors that cause the wrong entry in the dictionary to be retrieved
      • errors in letter-to-sound rules
      • waveform generation problems (e.g. an audible join)

      Since the voice you are using in Festival is Scottish English, it is only fair to find errors for that variety of English, so take care with your spelling of specific input texts! You may find it helpful to listen to some actual speech from the source: Prof Alan Black. Quite conveniently for us, you can in fact listen to Alan Black talk about TTS to study his voice and the subject matter at the same time!
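      To make a mistake reproducible, it helps to note down exactly what you entered at the Festival prompt. As a minimal sketch (the input sentence, the variable name, and the output filename here are purely illustrative, not required inputs):

      ```
      festival> (set! myutt (SayText "The doctor lives at 10 Downing St."))
      festival> (utt.relation.print myutt 'Word)          ; inspect the words Festival predicted
      festival> (utt.save.wave myutt "mistake.wav" 'riff) ; save the audio for a spectrogram plot
      ```

      Saving the waveform like this lets you open it in Praat (or similar) to produce the annotated spectrogram figures mentioned above, while quoting the exact input text lets the marker replay the error.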

    2. Tips on writing
      These tips apply to the lab report and cover both content and writing style.

      1. Content of the lab report

      These tips are provided as a set of slides. They concentrate on content. There is no video for this topic.

      2. Scientific Writing – Style and presentation

      There are slides for this video.

      3. Scientific Writing – Saying more, in less space, and with fewer words

      There are slides for this video.

      4. Scientific Writing – figures, graphs, tables, diagrams, …

      There are slides for this video.

      5. Avoiding ambiguity

      These tips are provided as a set of notes. They concentrate on the word “this”. There is no video for this topic.


    3. Formatting instructions
      Specification of word limits and other rules that you must follow, plus the structured marking scheme.

      You must:

      • submit a single document in PDF format. When submitting to Learn, the electronic submission title must be in the format “exam number_lab report wordcount” and nothing else (e.g., “B012345_1857+438”)
      • state your exam number at the top of every page of the document
      • state the word count below the title on the first page in the document (e.g., “word count: 1857”)
      • use a line spacing of 1.5 and a minimum font size of 11pt (and that applies to all text, including within figures, tables, and captions)

      Marking is strictly anonymous. Do not include your name or your student number – only provide your exam number!

      Structure

      • Word limit: 2000 words, excluding bibliography but including all other text (such as headings, footnotes, text within figures & tables, captions, examples). Numerical data and phonetic transcriptions do not count as text.
        • The +/- 10% rule applies to this word limit, but if you go over 2000 words, think carefully about whether the extra words really enhance your report (otherwise your markers may not thank you!)
      • Page limit: no limit, large margins are fine but avoid blank pages
      • Figures & tables: no limit on number

      Structure of the lab report

      You must use the following structure and section numbering for the lab report:

      • 1 Introduction
      • 2 Background
      • 3 Finding and explaining mistakes
        • 3.1 Text normalisation
        • 3.2 POS tagging
        • 3.3 Phrase break prediction
        • 3.4 Pronunciation
        • 3.5 Waveform generation
        • 3.6 Other types of mistakes
      • 4 Discussion and conclusion

      You may add further subsections below any of the above, if you wish (e.g., to divide your Background section into subsections of your own choosing).

      Figures, graphs and tables

      You should ensure that figures and graphs are large enough to read easily and are of high quality (with a very strong preference for vector graphics, and, failing that, high-resolution images). You are strongly advised to draw your own figures, which will generally attract a higher mark than a figure quoted from another source. Tables must have column or row headers, as appropriate.

      There is no page limit, so there is no reason to have very small figures.

      Your work may be marked electronically or we may print hardcopies on A4 paper, so it must be legible in both formats. In particular, do not assume the markers can “zoom in” to make the figures larger.

      And finally,…

      Marking scheme

      You are strongly advised to read the marking scheme because it will help you focus your effort and decide how much to write in each section of your report.

      Here is the structured marking sheet for this assignment (this is the version for 2022-23)

