This assignment has been updated for Speech Processing 2023-24. If you are taking the course in 2023-24 (or later), don’t follow these instructions!
The milestones will help you stay on track with this assignment. Try to stay ahead of the milestones.
Getting started
A first look at Festival and how we use it in interactive mode on the command line.
Accessing Festival
The instructions assume you are using the installation of Festival on the computers in the PPLS Appleton Tower (AT) labs. You can work on the computers in the physical lab or you can use the remote desktop service (see Module 0 for instructions). Note: The PPLS AT Lab remote desktop we’re using for this course is separate from the Informatics remote desktop service!
The other option is to install Festival directly onto your computer. This requires a Unix-like environment (Linux or MacOS, or a Linux-style terminal running on Windows) and requires you to compile some code (some install instructions are linked here).
If you’ve never compiled code before, and don’t have much experience with the unix command line, your best bet is to use the PPLS AT Lab computers.
Accessing Festival Remotely
You can use the installation of Festival on the Appleton Tower lab servers using the remote desktop service.
To connect using the remote desktop, follow the instructions here: Module 0 – computing requirements
Once you’ve started the remote desktop and logged in (with your UUN and EASE password), you can open the Terminal app by going to the menu bar at the top and clicking Applications > System Tools > Terminal. You may want to drag the Terminal icon to the desktop to make it easier for you to find it.
If you accidentally close VNC before you log out, you can reconnect by double clicking on the machine you previously logged onto in VNC viewer.
When you are finished, remember to log out: in the menu at the top of the screen > System > Log out
Transferring data from the AT lab servers
To get your data (e.g. generated wav files) from the AT lab servers (e.g. remote desktop), you can use either a terminal-based command like rsync or an SFTP client like FileZilla (graphical interface). For example, the following command, run on your own computer, copies the file myutt.wav in ~/Documents/sp/assignment1 on the AT lab servers to the directory you run the rsync command from (don’t run this yet!):
rsync -avz s1234567@scp1.ppls.ed.ac.uk:Documents/sp/assignment1/myutt.wav ./
Note: The previous command will only work if you’ve already made the directories Documents/sp/assignment1 in your home directory on the AT lab servers. If you haven’t done this, you can skip this for now and try it after you’ve created some files.
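To fetch several files at once, you can wrap the rsync command above in a small shell function. This is a minimal sketch. It assumes the host scp1.ppls.ed.ac.uk and the remote path Documents/sp/assignment1 from the example above. The function name fetch_wavs and the DRY_RUN variable are hypothetical. Run it on your own computer, not on the lab machines.

```shell
# Copy every .wav file from your assignment1 directory on the AT lab servers
# to a local directory (run this on your own computer).
# fetch_wavs and DRY_RUN are illustrative names, not part of any course tool.
fetch_wavs() {
  local uun dest src
  uun="$1"                # your UUN, e.g. s1234567
  dest="${2:-./}"         # local destination (default: current directory)
  src="${uun}@scp1.ppls.ed.ac.uk:Documents/sp/assignment1/"
  # The include/exclude pair keeps top-level *.wav files and skips
  # everything else, including subdirectories.
  set -- rsync -avz --include='*.wav' --exclude='*' "$src" "$dest"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"             # only show the command that would be run
  else
    "$@"
  fi
}
```

For example, fetch_wavs s1234567 ./ (with your own UUN) copies the files. DRY_RUN=1 fetch_wavs s1234567 only prints the rsync command, so you can check it before anything is transferred.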
You can share your files with yourself by copying them to OneDrive (or Google Drive) via a browser.
You can also use a file transfer app like FileZilla. In this case, you need to set the remote host to scp1.ppls.ed.ac.uk. For FileZilla, go to File > Site Manager, then set the protocol to SFTP, the host to scp1.ppls.ed.ac.uk, and use your UUN as the username and your EASE password as the password. After connecting, you should see your home directory on the AT lab servers as the remote site. You can then drag files from the remote site side to the appropriate place on the local site side.
Installing Festival Yourself
It should be fine (and probably easier) to just use the PPLS AT lab machines in-person or using the remote desktop for this assignment.
You can also try installing Festival yourself on your own computer (Linux or MacOS only). This is only recommended if you already have some experience compiling programs from source code (or you’re willing to spend some time learning about this!). Otherwise, we suggest you use the remote desktop option (or come to the actual lab).
You can get the source code from the Festvox website or the festvox github page.
Macbook: Here’s a gist outlining the steps Catherine used to install Festival on her Macbook (run the commands in the terminal): Link to gist. Also see these forum posts:
- https://speech.zone/forums/topic/festival-error-setting-input-audio-stream-format
- i.e., you may need to install sox to play audio and add some lines to the config.scm file mentioned in Step-by-Step
- https://speech.zone/forums/topic/installing-festival-on-a-newer-mac-e-g-missing-makefile
This gist also gives direct download links for the relevant zipped code archives. Please note that you are required to use a specific voice for this assignment (cstr_edi_awb_arctic_multisyn). The data required for this voice is not publicly available, so you will need to download it from the PPLS AT lab servers (see Assignment Data below and the gist link).
To install Festival from source, you will generally need to be able to compile C++ code. If you’ve never done this before, it may be rather daunting, and you may need to install extra tools (e.g. the Xcode command line tools for MacOS). If you use Ubuntu, you can also install Festival using the apt package manager (e.g. apt-get install festival, or possibly just apt install festival).
Ubuntu Linux: See instructions in this gist for installing Festival on Ubuntu. Some issues have been reported when installing Festival from source on Ubuntu (as in the gist above). You can find some notes and an alternative way to do this using the Ubuntu package manager here.
Windows: If you have a Windows machine but want to install Festival on it, you might like to try Ubuntu on Windows. See also Ubuntu on WSL, and the Ubuntu instructions above.
You can also have a look in the forums for other tips on how to do this!
Just another reminder though that you don’t have to install Festival yourself! It’s fine just to use the lab machines!
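If you do decide to compile Festival from source, it can save time to check first that the basic build tools are on your PATH. This is a minimal sketch. The tool names in the usage line (make, gcc, g++) are an assumption about a typical C/C++ build, so check the README files in the Festival and Speech Tools sources for the actual requirements. The function name missing_tools is hypothetical.

```shell
# Print each named command that is not found on PATH.
# Returns 0 if all of them are present, 1 otherwise.
missing_tools() {
  local tool status
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example: missing_tools make gcc g++ && echo "build tools found". On MacOS, a missing compiler often means the Xcode command line tools are not installed yet.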
Assignment Data
If you are using the remote desktop to access the AT lab computers (or are physically in the lab!), all the relevant data is already there for you on the linux machines.
If you aren’t using the remote desktop to access the AT lab computers, you will need to get the voice database and dictionaries used to run the voice (voice:cstr_edi_awb_arctic_multisyn, dictionary:unilex). You can find instructions by following this link.
Start Festival
Festival has a command line interface that runs in the terminal (i.e. the Unix bash shell). To do this in the PPLS AT lab, you’ll need to (i) make sure the computer is booted into Linux (if it is in Windows, restart the machine and select the penguin (the Linux mascot!) when presented with the choice); (ii) open a terminal via Applications > System Tools > Terminal from the menu bar at the top left of the screen. You can drag the Terminal icon from the menu to the desktop if you want to make a shortcut.
So, now, open a Terminal and run Festival by typing festival at the prompt:
$ festival
Festival Speech Synthesis System 2.5.0:release December 2017
Copyright (C) University of Edinburgh, 1996-2010. All rights reserved.
..etc
and the prompt will change to
festival>
This new prompt means that Festival is running; any commands that you type will now be interpreted by Festival rather than by the shell.
You will be pleased to know that Festival’s command line uses the same editing keys as the bash shell (e.g., TAB completion, ctrl-a, ctrl-e, ctrl-p, ctrl-n, up/down/left/right cursor keys, etc.). Here’s a nice cheat sheet for common bash commands. For a comprehensive list of these shortcuts, see the Wikipedia entry for GNU Readline.
If you get into trouble at any point and need to exit the current command, use ctrl-c. This applies to both Festival and the bash shell.
It’s really worth learning these keyboard shortcuts because they also apply to the bash shell and will save you a lot of time.
Make Festival speak
Synthesise some sentences to become familiar with the Festival command line.
Festival contains a number of different synthesis engines and for each of these, several voices are available: the quality of synthesis is highly dependent on the particular engine and voice that is being used.
By default, Festival will start with a rather old diphone voice, which does not sound great, but is fast and good enough for now:
festival> (set! myutt (SayText "Welcome to Festival"))
Note: If you installed Festival onto your own computer and followed the instructions to download the assignment voice (i.e., you downloaded the directory cstr_edi_awb_arctic_multisyn but not the other voices), you’ll be missing the default voice (kal, a diphone voice). You can set the voice to the one we will use in the assignment by typing the following after starting Festival:
(voice_cstr_edi_awb_arctic_multisyn)
See the instructions in “Assignment Data” above for a command to download the other voices.
You won’t be able to hear anything if you access Festival using ssh! You’ll need to save your generated utterance as a wav file (see below) and copy your files from the AT lab servers to your own computer (e.g. using scp or rsync on the terminal or an SFTP app like FileZilla).
To generate an utterance without playing it (e.g. if you are using ssh to connect to the AT lab servers), use the following steps instead of SayText:
festival> (set! myutt (Utterance Text "Hello"))
festival> (utt.synth myutt)
Then you can save the utterance myutt as a wave file called “myutt.wav” with the following command:
festival> (utt.save.wave myutt "myutt.wav" 'riff)
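Over ssh, typing these commands for every test sentence gets tedious, and you can’t listen to the result before copying it. One option is to write the commands to a file, let Festival run it in batch mode, and then check the saved file from the shell. This is a sketch under assumptions. It assumes your Festival build accepts the -b (batch) option followed by Scheme files to load. The header check assumes the canonical RIFF layout (fmt chunk straight after the 12-byte RIFF header) and a little-endian machine, because od reads integers in host byte order. The function names make_synth_script and wav_info are hypothetical.

```shell
# Write the Festival commands shown above for TEXT ($1) and output file ($2).
# make_synth_script is an illustrative helper, not a Festival command.
make_synth_script() {
  local text
  # Escape backslashes and double quotes so TEXT is a valid Scheme string.
  text="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '(set! myutt (Utterance Text "%s"))\n' "$text"
  printf '(utt.synth myutt)\n'
  printf "(utt.save.wave myutt \"%s\" 'riff)\n" "$2"
}

# Print channels, sample rate and bits per sample of a RIFF/WAVE file,
# assuming the fmt chunk directly follows the 12-byte RIFF header.
wav_info() {
  local f ch rate bits
  f="$1"
  [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
  # Bytes 0-3 must be "RIFF" and bytes 8-11 must be "WAVE".
  if [ "$(dd if="$f" bs=1 count=4 2>/dev/null)" != "RIFF" ] ||
     [ "$(dd if="$f" bs=1 skip=8 count=4 2>/dev/null)" != "WAVE" ]; then
    echo "not a RIFF/WAVE file: $f" >&2
    return 1
  fi
  ch="$(od -An -t u2 -j 22 -N 2 "$f" | tr -d ' \n')"
  rate="$(od -An -t u4 -j 24 -N 4 "$f" | tr -d ' \n')"
  bits="$(od -An -t u2 -j 34 -N 2 "$f" | tr -d ' \n')"
  echo "channels=$ch rate=$rate bits=$bits"
}
```

For example, make_synth_script "Hello" myutt.wav > synth.scm, then festival -b synth.scm, then wav_info myutt.wav. To use the assignment voice instead of the default one, list config.scm before synth.scm on the festival command line.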
When you issue a command to Festival, you must put it in round brackets (...); if you do not, it will generate an error. You are using a language called Scheme.
Scheme, and lots of brackets
Scheme is a LISP-like language used as the interface to Festival. When you run Festival in interactive mode, you talk to Festival in Scheme. Fortunately, we’re not going to have to learn too much Scheme. All you need to know for now is that the basic syntax is
(function_name argument1 argument2 ...)
In Scheme, all functions return a value, which by default is printed after the function completes. The SayText function returns an Utterance structure, which is why something like #<Utterance 0x...> is printed after the function completes. A variable can be set to capture this return value, which will allow us to examine the utterance after processing. This is done using the set! command (note the two sets of brackets):
festival> (set! myutt (SayText "Welcome to Festival"))
#<Utterance 0x...>
Save the utterance to a wav file
You can save the generated waveform to a file as follows:
festival>(utt.save.wave myutt "myutt.wav" 'riff)
Here, myutt should be the name of the utterance object (set above using set! together with the SayText command), and myutt.wav is the filename, which you can choose; if you save more than one waveform, give them different names. You can now view and analyse the waveform in Praat or Wavesurfer.
The TTS process
We can now examine the contents of the myutt variable. The SayText function is a high-level function which calls a number of other functions in a chain. Each of these functions performs a specific job, such as deciding the pronunciation of a word, or how long a phone should be. We’ll be running these step-by-step later on.
The TTS process in Festival is a pipeline of sub-processes, which build up an Utterance structure in stages. This building process takes the original text as input and adds more and more information, which is stored in the utterance structure. In Festival, a unified mechanism for representing all types of data needed by the system has been developed: this is called the Heterogeneous Relation Graph system, or HRG for short.
Each Relation in an HRG is a structure that links items of a particular linguistic type. For example, we have a Word relation which is a list linking all the words, and a Segment relation which links all the phones etc. Relations can take different forms: the most common types are linear lists and trees.
Each module in Festival takes a number of relations as input and either creates new relations as output, or modifies the input ones. The vast majority of modules only write new information, leaving all information in the input untouched (there are a few exceptions, such as post-lexical processing). Because of this, examining the contents of the relations in an utterance after processing gives an insight into the history of the TTS process.
Different configurations of Festival can vary with respect to their use of HRGs, and which modules they call.
Examining a saved object
Once you have synthesised an utterance you can do lots of things with it. Here are a few examples.
festival> (utt.play myutt)
festival> (utt.relationnames myutt)
festival> (utt.relation.print myutt 'Word)
festival> (utt.relation.print myutt 'Segment)
You can get a list of the relations that are present in a synthesised utterance by using the utt.relationnames command.
Relations that are lists can easily be printed to the screen with the utt.relation.print command. Try this with all of the relations in an utterance. Some of them won’t reveal useful information, others will.
The output from (utt.relation.print myutt 'Word) may look like this:
()
id _3 ; name hello ; pos_index 16 ; pos_index_score 0 ; pos uh ; phr_pos uh ; phrase_score -13.43 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ;
id _4 ; name world ; pos_index 8 ; pos_index_score 0 ; pos nn ; phr_pos n ; pbreak_index 0 ; pbreak_index_score 0 ; pbreak B ; blevel 3 ;
nil
Each data line starts with an id number like id _3, followed by a series of features separated by semicolons. Each feature has a name and a value, e.g., feature name pos, feature value uh.
Examining the processing steps
Tokens – First the text is split into Tokens. Look at the Token relation, where an item is created for each component of the text you input. The Token relation will still have digits and abbreviations in it.
Words – The Tokens are then converted to Words, abbreviations and digits are processed and expanded. Look for this in the Word relation.
Part of Speech Tagging – Each word is tagged with its part of speech, which is added as a feature to the Word relation.
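Relation output follows the id-plus-features layout described above, so you can extract a single feature (such as the pos tag added by this step) from output copied from the terminal into a text file. The sketch below is illustrative. The function name get_feature and the file name word_rel.txt are hypothetical, and it assumes feature values contain no semicolons.

```shell
# Read utt.relation.print output on stdin and print "id value" for the
# feature named in $1, one line per item that has that feature.
get_feature() {
  awk -v want="$1" '
    /^id / {
      id = ""; val = ""
      n = split($0, f, ";")
      for (i = 1; i <= n; i++) {
        s = f[i]
        sub(/^ +/, "", s); sub(/ +$/, "", s)   # trim surrounding spaces
        sp = index(s, " ")
        if (sp == 0) continue                  # skip empty or value-less fields
        key = substr(s, 1, sp - 1)
        value = substr(s, sp + 1)
        if (key == "id") id = value
        if (key == want) val = value
      }
      if (val != "") print id, val
    }'
}
```

For the hello world output shown earlier, get_feature pos < word_rel.txt prints _3 uh and _4 nn, one per line. Swapping pos for pbreak shows the predicted phrase breaks instead.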
Pronunciation – The pronunciation of each word is determined and the Syllable and Segment relations created. Examine these: the syllable relation is not very interesting as there is very little information here, just a count of the syllables.
You can look up the pronunciation of a word yourself with the function lex.lookup:
festival> (lex.lookup "caterpillar")
("caterpillar" nil (((k ae t) 1) ((ax p) 0) ((ih l) 1) ((er) 0)))
The actual pronunciation returned depends on which lexicon a particular voice uses, and whether the word is in the lexicon or if Festival has to predict the pronunciation using letter-to-sound rules.
Try looking up the pronunciation of some real words, and some made up ones.
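The result of lex.lookup is a nested list. It contains the word, its part of speech (nil here), and then one entry per syllable, each made of a list of phones followed by a stress value. The sketch below makes this structure explicit by printing one syllable per line from output copied out of Festival. The function name syllables is hypothetical, and it assumes the output format shown above.

```shell
# Read lex.lookup output on stdin; print each syllable's phones and stress.
syllables() {
  grep -o '(([^()]*) [0-9])' |
    sed 's/^((\(.*\)) \([0-9]\))$/\1 stress=\2/'
}
```

For the caterpillar example above, this prints four lines, from k ae t stress=1 to er stress=0. That makes it easy to compare syllabification and stress across real and made-up words.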
Accent Prediction – An intonation module assigns pitch accents (and other intonational events) to syllables. A number of different modules exist within Festival, operating with a number of intonation models including ToBI and Tilt. The assignment voice (‘awb’) doesn’t actually do accent prediction, but you can see what this would look like by trying the older diphone synthesis voice, kal, which does:
To load the kal voice, enter the following in festival:
festival>(voice_kal_diphone)
Now, look at the IntEvent relation to see which pitch events have been assigned. From the pitch events and the predicted durations, a pitch contour is generated. This contour is a list of numbers which specify the pitch at each point in the resulting waveform. There is no easy way to view the pitch contour.
To change back to the assignment voice:
festival>(voice_cstr_edi_awb_arctic_multisyn)
Waveform generation – The Unit relation is created by making a list of diphones from the segments, and the information about the speech needed for synthesis is copied in. The Unit relation contains features with values in square brackets [...]. These are references to the actual speech used to synthesise these units.
Quit Festival
festival> (quit)
or use ctrl-D, just like in the bash shell. Festival remembers your command history between sessions (again, just like bash). Next time you start Festival, you can use the up cursor key to find previous commands, and then hit ‘Enter’ to execute them again. Of course, Festival does not remember the values of variables (e.g., myutt in the above example) between sessions.
What you should now be able to do
- start Festival and make it speak using SayText
- capture the Utterance structure returned by SayText
- look inside the Utterance structure at the Relations
- have an initial understanding of what Relations are, but not yet the full picture
- use some of the keyboard shortcuts that are common to Festival and the bash shell
Step-by-step
It's possible to run each step in the text-to-speech pipeline manually, and inspect what Festival does at each point.
We are going to examine the speech synthesis process in detail. You will examine the sequence of processes Festival uses to perform text-to-speech (TTS) and relate these processes to what you have learnt in the Speech Processing course. Make notes in a lab book as you work! Remember that Festival is just one example of a TTS system – there are generally many other ways of implementing each step.
Festival stores information like words, phonemes, F0 targets, syntax, etc. in things called Relations. These approximately correspond to the levels of linguistic representation mentioned in lectures. As each stage in the process is performed, more information is added to the utterance structure, and you can examine this inside Festival. So, each step in the pipeline will either add a new relation, or modify an existing one.
Your task in this part of the exercise is to explore the synthesis process and discover what Festival does in each step of processing, when converting text to speech. If you notice errors in the prediction of phrases, pronunciation, the processing of numbers or anything else, make a note of it, as this will be useful for the next part of the exercise.
Hints
Three hints for this exercise:
- Read the instructions through completely before you start.
- Use Festival’s tab-completion and command-line history (which is kept even if you quit and restart Festival) to save typing and avoid mistakes.
- If things go wrong (either with Festival, or with you), quit Festival and restart it.
Festival help
Festival can make your life easier in a number of ways.
Command history
You can access commands you have previously typed using the arrow keys. Press the up arrow a number of times to see the previous commands you entered, then use the left and right arrow keys to edit them. Press ENTER to run the edited command.
TAB completion
If you start to type a command name into Festival, then press TAB, it will either complete the command for you or give you a list of possible completions. For example, to get a list of all of the commands that work on an utterance, type
festival> (utt.
and then press TAB once or twice.
Getting help
Most commands in Festival have built-in help. Type the name of the command (including the initial open bracket) and then press ⟨ALT⟩-h (hold the ALT key down and press h), or alternatively, press (and release) ESC and then press h. Festival will print some help for that command, including a list of arguments that it expects.
Starting Festival
We are going to use a unit selection voice called cstr_edi_awb_arctic_multisyn in this exercise.
Before you start, make a folder to work in; e.g., on the remote desktop you can make the directory ~/Documents/sp/assignment1. We will assume this is your working directory in the rest of these instructions.
cd ~
mkdir -p ~/Documents/sp/assignment1
Change into this directory, e.g.:
cd ~/Documents/sp/assignment1
Remember you can use the pwd command to see which directory you are currently in and the ls command to see which files are in the current directory.
We’re going to use a simple configuration file to tell Festival to load the correct voice, and to add a few extra commands (the path below is the file location on the AT lab servers):
cp /Volumes/Network/courses/sp/assignment1/config.scm .
chmod ugo+r config.scm
If you followed this gist to install Festival on your own computer, you probably already downloaded the config.scm file (by default into the “assignment1” directory next to the “tools” directory where you installed Festival; see lines 82-85 of the gist). In that case, you can start Festival from there, or move the config.scm file to the directory you want to work in.
Remember that if you come back later, you only need to cd to your working directory (e.g. ~/Documents/sp/assignment1 if you followed the remote desktop instructions). You don’t need to copy the file again as long as you start Festival from a directory that contains config.scm. Now, every time you start Festival during the rest of this exercise, do it like this:
festival config.scm
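You can collect the set-up steps above into two small shell functions, so that returning to the exercise takes a single command. This is a sketch. The default paths are the ones given above for the AT lab servers. The function names and the FESTIVAL_CMD override are hypothetical.

```shell
# Create the working directory and copy config.scm into it (once).
# Defaults are the AT lab paths given above; setup_workdir is illustrative.
setup_workdir() {
  local workdir config_src
  workdir="${1:-$HOME/Documents/sp/assignment1}"
  config_src="${2:-/Volumes/Network/courses/sp/assignment1/config.scm}"
  mkdir -p "$workdir" || return 1
  # Only copy if there is no config.scm yet, so an existing copy
  # in your working directory is never overwritten.
  if [ ! -f "$workdir/config.scm" ]; then
    cp "$config_src" "$workdir/config.scm" || return 1
    chmod ugo+r "$workdir/config.scm"
  fi
}

# Change into the working directory (in the current shell) and start
# Festival with config.scm. FESTIVAL_CMD is a hypothetical override.
start_festival() {
  cd "${1:-$HOME/Documents/sp/assignment1}" || return 1
  "${FESTIVAL_CMD:-festival}" config.scm
}
```

On the lab machines, setup_workdir && start_festival then replaces the cd, cp, chmod and festival config.scm steps. Because start_festival changes directory in your current shell, you are still in the working directory after quitting Festival.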
Compared to the earlier exercises using a diphone voice, Festival will take longer to start when loading this unit selection voice. Why?
In the following, the festival> at the beginning of the line just represents the Festival command line prompt; you only need to type the part in parentheses.
Once Festival is running, check that it speaks:
festival> (SayText "hello world")
You should hear a reasonably good quality Scottish male voice. If not, you probably forgot to start Festival using the config.scm file.
Synthesising an utterance step-by-step
Read this section carefully before trying any of the commands.
So far we have only synthesised utterances from start to finish in one go, using SayText. Now we are going to do it step-by-step.
First you need to create a new utterance object. The following command creates a new utterance object with the text supplied and sets the variable myutt to point to it:
festival> (set! myutt (Utterance Text "Put your own text here."))
Now you can manually run each step of the text-to-speech pipeline – don’t skip any steps (what would happen if you did?). Use a single short utterance of your own when performing this part – make it interesting (e.g., containing some text that needs normalising, a word that is unlikely to be in the dictionary, and so on).
festival> (Initialize myutt)
festival> (Text myutt)
festival> (Token_POS myutt)
festival> (Token myutt)
festival> (POS myutt)
festival> (Phrasify myutt)
festival> (Word myutt)
festival> (Pauses myutt)
festival> (PostLex myutt)
festival> (Wave_Synth myutt)
festival> (utt.play myutt)
If you get an error, you will have to start again by creating a new utterance with the set! command. If you get confused, quit Festival and start from the beginning again.
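For a record of which relations exist after each step (useful for your lab notes), you could also generate the same pipeline as a Festival script and run it in batch mode. This is a sketch under assumptions. It assumes your Festival build supports -b (batch mode), that the print function of Festival’s Scheme interpreter prints its argument on its own line, and that config.scm only loads the voice. The function name make_step_script is hypothetical. The step order is copied from the commands above, with utt.play left out.

```shell
# Emit Festival commands that run the text-to-speech pipeline step by step
# for TEXT ($1), printing the relation names after each step.
# make_step_script is illustrative; the step order matches the commands above.
make_step_script() {
  local text step
  text="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '(set! myutt (Utterance Text "%s"))\n' "$text"
  for step in Initialize Text Token_POS Token POS Phrasify Word Pauses PostLex Wave_Synth; do
    printf '(%s myutt)\n' "$step"
    # print shows the step name and the relations present after it
    printf "(print (list '%s (utt.relationnames myutt)))\n" "$step"
  done
}
```

For example, make_step_script 'It cost $5 on 3/4/2021.' > steps.scm, then festival -b config.scm steps.scm. The single quotes stop the shell from expanding $5. You still need the interactive commands below to look inside each relation.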
Note that running the synthesis pipeline step-by-step is just to help you understand what is happening. You might need it to diagnose some mistakes later on, but most of the time, you will just use SayText.
Commands for examining utterances
You should pause to examine the contents of myutt between each step.
To determine which relations are now present:
festival> (utt.relationnames myutt)
and to examine a particular Relation (if it exists):
festival> (utt.relation.print myutt 'Phrase)
festival> (utt.relation.print myutt 'Word)
festival> (utt.relation.print myutt 'Segment)
and so on for any other Relations that exist.
You can also use the following command to see the overall structure of the utterance:
festival> (print_sylstructure myutt)
This will show you how the different relations tie together. It will show you Words, Syllables as lists of segments, and the presence of stress.
Concentrate on discovering which commands create or modify Relations in the utterance structure, and what information is stored in those Relations. Note: the Initialize command will not reveal anything interesting, and it may be difficult to see what the Pauses and PostLex commands do.
What you should now be able to do
- start Festival and load a configuration file (which is just a sequence of Scheme commands to run after startup)
- make full use of keyboard shortcuts, including TAB completion, ctrl-A, ctrl-E, ctrl-P, ctrl-N, ctrl-R, up/down cursor keys to navigate the command history, and left/right cursor keys to edit a command
- run the pipeline step-by-step
- describe which Relations are added, or modified, by each step
- understand that Relations are composed of Items
- understand that Items are data structures containing an unordered set of key-value pairs
- have an initial understanding of what some (but not all) of the keys and values mean (e.g., POS tags in the Word relation)
Finding mistakes
Festival makes mistakes, of course. Your task is to find interesting ones, and explain why each occurs.
Starting Festival
Every time you start Festival during this exercise, do it like this, remembering to first change to the directory where you placed the config.scm file:
$ festival config.scm
Saving waveforms from Festival
Once you have a fully synthesised utterance object in Festival, it is possible to extract the waveform to a file as follows:
festival>(utt.save.wave myutt "myutt.wav" 'riff)
Here, myutt should be the name of the utterance object, and myutt.wav is the filename, which you can choose; if you save more than one waveform, give them different names. You can now view and analyse the waveform in Praat or Wavesurfer.
Explaining mistakes made by Festival
Using what you have learned about Festival, you can now find some examples of it making errors for English text-to-speech. Find examples in each of the following categories:
- text normalisation
- POS tagging/homographs
- phrase break prediction
- pronunciation (dictionary or letter-to-sound)
- waveform generation
Aim for a variety of different types of errors, with different underlying causes: 1 error for each of the front-end categories, 2 errors for waveform generation (see the marking scheme). Don’t report lots of errors of the same type. Be sure that you understand the differences between these various types of error. For example, when Festival says a word incorrectly, it might not be a problem with the pronunciation components (dictionary + letter-to-sound) – it could be a problem earlier or later in the pipeline. You need to play detective and be precise about the underlying cause of every error you report.
You might discover errors in other categories too. That’s fine: you can report and explain those as well.
Use the SayText command to synthesise lots of text (e.g., from a news website, or constructed by you). Store the results in a variable. You will need to examine the contents of this utterance structure to decide what type each error is.
In some cases (but not all, as this would be too slow), you may also need to run Festival step-by-step (as in the previous part of the exercise).
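To synthesise many sentences (for example, a paragraph copied from a news website into a text file with one sentence per line), you could generate a batch script that saves each sentence to its own numbered wav file for careful listening. This is a sketch that assumes, as before, that your Festival build supports -b (batch mode). The function name, the input file name and the sentNNN.wav naming scheme are hypothetical. Each sentence reuses the myutt variable, so to inspect the relations for one that sounds wrong, synthesise it again interactively with SayText and store the result in a variable.

```shell
# Turn a text file (one sentence per line) into Festival commands that
# synthesise each sentence and save it as sent001.wav, sent002.wav, ...
# make_batch_script and the file naming are illustrative choices.
make_batch_script() {
  local n line text
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue          # skip blank lines
    n=$((n + 1))
    # Escape backslashes and double quotes for the Scheme string.
    text="$(printf '%s' "$line" | sed 's/\\/\\\\/g; s/"/\\"/g')"
    printf '(set! myutt (Utterance Text "%s"))\n' "$text"
    printf '(utt.synth myutt)\n'
    printf "(utt.save.wave myutt \"sent%03d.wav\" 'riff)\n" "$n"
  done < "$1"
}
```

For example, make_batch_script sentences.txt > batch.scm, then festival -b config.scm batch.scm. Keeping the numbering tied to line order makes it easy to find the exact input text for each error, which you will need to give in your report.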
Skills to develop in this assignment
- use SayText to synthesise lots of different sentences
- precisely pinpoint audible errors in the output (e.g., which word, syllable or phone)
- understand that errors made by later steps in the pipeline might be caused by erroneous input; in other words, the mistake actually happened earlier in the pipeline
- understand that mistakes can happen in both the front end and the waveform generator
- make a hypothesis about the cause of the mistake
- trace back through the pipeline to find the earliest step at which something went wrong
- obtain evidence to test your hypothesis, by inspecting the Utterance and/or the waveform and/or the spectrogram
Writing up
You're going to write up a lab report, analysing errors made by a specific Festival TTS configuration.
Lab report
Write up your findings from the exercise in a lab report, to show that you can make connections between theory and practice.
You should write a lab report about the two parts of the synthesis practical (“Step-by-step” and “Finding mistakes”). Keep it concise and to the point, but make sure you detail your findings using Festival. The lab report should have a clear structure and be divided into sections and subsections.
What exactly is meant by “lab report”?
It is not a discursive essay. It is not merely documentation of what you typed into Festival and what output you got. It is a factual report of what you did in the lab that demonstrates what you learnt and how you can relate that to the theory from lectures. You will get marks for:
- completing all parts of the practical, and demonstrating this in the report
- providing interesting errors made by Festival, with correct analysis of the type of error and of which module made the error
- a clear demonstration that you understand the difference between human speech production and the methods used by Festival to generate speech. Why do the methods employed by Festival cause problems at times? What benefits could using a TTS system bring?
- Describing the theory behind each module (what is it trying to achieve?), linking that to practice (what does it actually do?) and analysing errors in terms of the underlying techniques used in Festival. Feel free to be critical when necessary – Festival is not perfect and the voice we have given you to analyse is certainly not the best we could make with Festival! One way to show your critical thinking skills is to suggest how the mistake could be avoided: which part of Festival would need to be improved, and how?
- clear and concise writing, and effective use of diagrams and examples. A good diagram will almost always allow you to write less text.
How much background material should there be?
Do not spend too long simply restating material from lectures or textbooks without telling the reader why you are doing this. Do provide enough background to demonstrate your understanding of the theory, and to support your explanations of how Festival works and your error analysis. If you make claims that are not drawn from the lecture material (e.g. specific phonetic/phonological phenomena or methods used by Festival), use specific and carefully chosen citations. You can cite textbooks and papers. You do not need to cite research papers in this class, but you may get more marks for relevant use of citations that directly help you explain your analysis. Avoid citing lecture slides or videos. Make sure everything you cite is correctly listed in the bibliography.
You do not have to explain algorithms we have not gone over in this class (e.g. specific POS tagging methods) unless you specifically want to include an analysis/explanation of why they cause the specific errors you are analysing.
In the background section you should:
- Briefly outline human speech production: just enough to contrast with what Festival is doing
- Describe what Festival should be doing in theory
- Describe what it does in practice
Have a look at the structured marking scheme to get an idea of other parts of the TTS pipeline you should make sure you cover.
Writing style
The writing style of the report should be similar to that of a journal paper. Don’t list every command you typed! Say what you were testing and why, what your input to Festival was, what Festival did and what the output was. Use diagrams (e.g., to explain parts of the pipeline, or to illustrate linguistic structures) and annotated waveform and spectrogram plots to illustrate your report. It may not be appropriate to use a waveform or spectrogram to illustrate a front-end mistake. Avoid using verbatim output copied from the Terminal, unless this is essential to the point you are making.
Additional tips
Give the exact text you asked Festival to synthesise, so that the reader/marker can reproduce the mistakes you find in Festival (this includes punctuation!). Always explain why each of the mistakes you find belongs in a particular category. For example, differentiate carefully between
- part of speech prediction errors that cause the wrong entry in the dictionary to be retrieved
- errors in letter-to-sound rules
- waveform generation problems (e.g. an audible join)
Since the voice you are using in Festival is Scottish English, it is only fair to find errors for that variety of English, so take care with your spelling of specific input texts! You may find it helpful to listen to some actual speech from the source: Prof Alan Black. Quite conveniently for us, you can in fact listen to Alan Black talk about TTS to study his voice and the subject matter at the same time!
- Tips on writing
These apply to the lab report. There are tips about both content and writing style.
1. Content of the lab report
These tips are provided as a set of slides. They concentrate on content. There is no video for this topic.
2. Scientific Writing – Style and presentation
There are slides for this video.
3. Scientific Writing – Saying more, in less space, and with fewer words
There are slides for this video.
4. Scientific Writing – figures, graphs, tables, diagrams, …
There are slides for this video.
5. Avoiding ambiguity
These tips are provided as a set of notes. They concentrate on the word “this”. There is no video for this topic.
- Formatting instructions
Specification of word limits and other rules that you must follow, plus the structured marking scheme.
You must:
- submit a single document in PDF format. When submitting to Learn, the electronic submission title must be in the format “exam number_lab report wordcount” and nothing else (e.g., “B012345_1857+438”)
- state your exam number at the top of every page of the document
- state the word count below the title on the first page in the document (e.g., “word count: 1857”)
- use a line spacing of 1.5 and a minimum font size of 11pt (and that applies to all text, including within figures, tables, and captions)
Marking is strictly anonymous. Do not include your name or your student number – only provide your exam number!
Limits
- Word limit: 2000 words, excluding bibliography but including all other text (such as headings, footnotes, text within figures & tables, captions, examples). Numerical data and phonetic transcriptions do not count as text.
- The +/- 10% rule applies to this word limit, but if you go over 2000 words, think carefully about whether the extra words genuinely enhance your report (otherwise your markers may not thank you!)
- Page limit: no limit, large margins are fine but avoid blank pages
- Figures & tables: no limit on number
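As a quick check of the arithmetic behind the +/- 10% rule: 10% of 2000 is 200, so the allowed range is 1800 to 2200 words. The sketch below is a hypothetical helper, not part of any course tooling. It assumes the tolerance applies symmetrically and that you have already counted your words according to the rules above (excluding the bibliography, numerical data and phonetic transcriptions).

```python
def word_limit_range(limit: int = 2000, tolerance: float = 0.10) -> tuple[int, int]:
    """Return the (lowest, highest) word count allowed under a +/- tolerance rule.

    Hypothetical helper: assumes the tolerance is applied symmetrically.
    """
    margin = round(limit * tolerance)  # 10% of 2000 -> 200 words
    return limit - margin, limit + margin


def within_word_limit(count: int, limit: int = 2000, tolerance: float = 0.10) -> bool:
    """True if an already-computed word count falls inside the allowed range."""
    low, high = word_limit_range(limit, tolerance)
    return low <= count <= high


if __name__ == "__main__":
    print(word_limit_range())        # (1800, 2200)
    print(within_word_limit(1857))   # True
    print(within_word_limit(2300))   # False
```

Being inside the range is a necessary condition, not a target: the advice above still applies to any words beyond 2000.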
Structure of the lab report
You must use the following structure and section numbering for the lab report:
- 1 Introduction
- 2 Background
- 3 Finding and explaining mistakes
- 3.1 Text normalisation
- 3.2 POS tagging
- 3.3 Phrase break prediction
- 3.4 Pronunciation
- 3.5 Waveform generation
- 3.6 Other types of mistakes
- 4 Discussion and conclusion
You may add further subsections below any of the above, if you wish (e.g., to divide your Background section into subsections of your own choosing).
Figures, graphs and tables
You should ensure that figures and graphs are large enough to read easily and are of high quality (with a very strong preference for vector graphics, and failing that, high-resolution images). You are strongly advised to draw your own figures, which will generally attract a higher mark than a figure quoted from another source. Tables must have column or row headers, as appropriate.
There is no page limit, so there is no reason to have very small figures.
Your work may be marked electronically or we may print hardcopies on A4 paper, so it must be legible in both formats. In particular, do not assume the markers can “zoom in” to make the figures larger.
And finally,…
Marking scheme
You are strongly advised to read the marking scheme because it will help you focus your effort and decide how much to write in each section of your report.
Here is the structured marking sheet for this assignment (this is the version for 2022-23)
- Lab report