Module 4 – speech synthesis – pronunciation & prosody

Pronunciation, including letter-to-sound models, and predicting prosody. Both of these tasks can be done with Classification And Regression Trees (CARTs).

In this module we meet our first machine learning approach. Classification and regression trees (CARTs) are widely-applicable models for making predictions. We can use them for letter-to-sound, prosody, and many other tasks.

Start with these two blog posts. The first will give you the general idea of a decision tree; most of its video is about building the tree (this phase is called “training” in machine learning). Training is done just once, when we build a system. We can then make predictions for new samples using this tree (this phase is sometimes called “testing” in machine learning). The second blog post will help you understand entropy; this is used when training a CART, and is an important concept that you need to understand.

Now work through the videos in this module. For the worked example, you can download and print the training data set, then cut the rows of the table into strips: each row is a data point. The last column is the predictee (i.e., the sound) and the first 7 columns are the predictors (i.e., the letters). Try to work through the example yourself in addition to watching the videos.

Download the slides for the Module 3, 4 and 5 videos

Download the slides for the worked letter-to-sound example in the videos

Download the additional slides for the class on 2019-10-17: Module 4 (version 2, updated 2019-10-13)

Post-lecture materials

Download the complete slides used in the class on 2019-10-17: Module 4

Total video to watch in this module: 51 minutes

Once the text is entirely converted to words, we need to decide on their pronunciations.
We need a set of labelled training data, because this is supervised machine learning.
We start by placing all the training data at the root node, and calculating its entropy.
Next, we try splitting the data at the root node using one of the available questions...
...and repeat for all other available questions. The best one is placed in the tree. Then we recurse.
Prosody is typically predicted in several stages: placement of events, classification of their types, then realisation.
A quick recap

Another example to work through: phrase break prediction

We will be doing this as an exercise in class, in the Module 4 main lecture. Feel free to try it on your own in advance, or wait and do it in class.

Now that you know how to train and then use a CART for letter-to-sound prediction, you can build your own CART for another task. Let’s try to do that for phrase break prediction: this is a classification task because we are predicting a discrete value (phrase break or no phrase break). If you can do this exercise, you should be able to see that a CART can be applied to any classification or regression task.
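
If you want to see where this exercise leads before doing it by hand, here is a minimal sketch using scikit-learn’s DecisionTreeClassifier, an off-the-shelf CART implementation. Everything in it (the predictors, tags, and data points) is invented for illustration; it is not taken from the exercise corpus.

```python
# A minimal sketch, assuming scikit-learn is installed; the data is invented.
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data points: (POS of this word, POS of the next word) -> break or not.
X_raw = [["DET", "NOUN"], ["NOUN", "VERB"], ["NOUN", "PUNC"], ["VERB", "DET"]]
y = ["NO BREAK", "NO BREAK", "BREAK", "NO BREAK"]

# One-hot encoding turns each "Does predictor X have value A?" question
# into a binary feature that the tree can split on.
enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(X_raw)

tree = DecisionTreeClassifier(criterion="entropy")  # split by entropy, as in the videos
tree.fit(X, y)
print(tree.predict(enc.transform([["NOUN", "PUNC"]])))  # -> ['BREAK']
```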

The raw training data

The method requires training data, manually annotated with phrase breaks. We must decide what features to extract from the text, to use as predictors for the CART. Then, we follow a training algorithm to learn the tree from the training data. After that, we can label the locations of phrase breaks on previously-unseen test data.

Download the corpus of training sentences and think about what features you can extract from them. Hint: you only need to consider features that the corpus is already annotated with. Choose up to three features that you will use as predictors in your CART. The predictee is the presence or absence of a phrase break.

The provided data is POS-tagged with a very simple set of tags. Don’t worry if you don’t agree with the tag set or with some of the words’ tags; just use the data as provided.

Prepare the training data by extracting features

Use this sheet to prepare your training data. Label each data point (i.e., each word) with your chosen predictors. I’ve already filled in the predictee for you – breaks are associated with the word that they occur after. This means that end-of-sentence punctuation is not labelled with a phrase break.
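
If you would rather script this step, here is a sketch of the idea. It assumes each sentence has been read into a list of (word, POS tag) pairs; the actual file format of the corpus may differ, and the three predictors used here are just one possible choice.

```python
# Extract one data point (a dictionary of predictors) per word.
def extract_data_points(sentence):
    rows = []
    for i, (word, pos) in enumerate(sentence):
        rows.append({
            "pos": pos,                                                   # POS of this word
            "next_pos": sentence[i + 1][1] if i + 1 < len(sentence) else "NONE",
            "is_last": i == len(sentence) - 1,                            # sentence-final?
        })
    return rows

# An invented sentence, just to show the output format.
print(extract_data_points([("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")]))
```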

Candidate questions for the CART

Look across your predictors and write down all the possible questions you might ask about them. We’ll keep it simple and only ask questions of the form “Does predictor X have value A?”, rather than set-membership questions such as “Is the value of predictor X in the set {A, B, C}?”
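
If your data points are dictionaries of predictor values, as in the extraction sketch above, a few lines of code will enumerate every question of this form:

```python
# Collect every (predictor, value) pair seen in the data: each one is a
# candidate question "Does <predictor> have value <value>?".
def candidate_questions(rows):
    questions = {(predictor, value) for row in rows for predictor, value in row.items()}
    return sorted(questions, key=str)

# Invented data points, just to show the output.
rows = [{"pos": "DET", "next_pos": "NOUN"}, {"pos": "NOUN", "next_pos": "PUNC"}]
for predictor, value in candidate_questions(rows):
    print(f"Does {predictor} have value {value}?")
```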

Here’s a sheet where you can write down the questions.

Build (train) your CART – start with the root node

Place all your data at the root node and compute the entropy. This is a measure of how predictable the value of the predictee is at this point. I’ll start you off:

The value “BREAK” occurs 12 times so its probability is 12/34 which is about 0.35.

The value “NO BREAK” occurs 22 times so its probability is 22/34 which is about 0.65.

Now compute the entropy as “minus the sum of p log p”, using log base 2 so that your answer is in bits. Make sure you know how to do this for yourself first. Then, to save time, you could use this entropy calculator.
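
As a check on your arithmetic, here is a minimal Python sketch of the same computation; the final line uses the 12/22 split given above.

```python
import math
from collections import Counter

# Entropy of the predictee values at a node: H = - sum over values of p * log2(p).
def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# The root node in this exercise: 12 BREAK and 22 NO BREAK data points.
print(entropy(["BREAK"] * 12 + ["NO BREAK"] * 22))  # about 0.937 bits
```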

For every possible question in your list of questions, split the data into the two corresponding partitions. Compute the entropy of each partition. Then compute the total entropy: the sum of those two values, each weighted by the fraction of the data points that fall into its partition.

It would be tedious to try all of the questions manually, so you can take a shortcut: use your intuition and try just the few questions that you think will work best.

Pick the best question (the one that gives the lowest total entropy) and place it into the tree. Permanently partition the training data using it.
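
In code, scoring a question means computing that weighted sum for each candidate and keeping the minimum. A minimal sketch, assuming data points are dictionaries of predictor values and questions are (predictor, value) pairs, as in the earlier sketches:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# Total entropy of a yes/no split: each partition's entropy, weighted by its size.
def total_entropy(rows, labels, predictor, value):
    yes = [l for r, l in zip(rows, labels) if r[predictor] == value]
    no  = [l for r, l in zip(rows, labels) if r[predictor] != value]
    return (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)

# The best question is the one whose split has the lowest total entropy.
def best_question(rows, labels, questions):
    return min(questions, key=lambda q: total_entropy(rows, labels, q[0], q[1]))
```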

Recurse

Now simply recurse. In other words, repeat exactly what you did at the root node, for each of the two children in turn, using whatever subset of the training data you find at that child node. Then continue growing the tree. You’ll need to set yourself a stopping criterion: for this toy data set, I suggest that you stop when all data points at a leaf have the same value for the predictee (i.e., zero entropy).
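
Here is a sketch of the whole training procedure as a recursive function. It repeats the entropy helpers from the previous sketch so that the block is self-contained, and it uses the zero-entropy stopping criterion suggested above.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def total_entropy(rows, labels, predictor, value):
    yes = [l for r, l in zip(rows, labels) if r[predictor] == value]
    no  = [l for r, l in zip(rows, labels) if r[predictor] != value]
    return (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)

# Recursively grow the tree: pick the best question, partition the data,
# then repeat at each child. A leaf is just the majority label at that node.
def grow(rows, labels, questions):
    if entropy(labels) == 0 or not questions:            # stopping criterion
        return Counter(labels).most_common(1)[0][0]
    best = min(questions, key=lambda q: total_entropy(rows, labels, q[0], q[1]))
    predictor, value = best
    yes = [i for i, r in enumerate(rows) if r[predictor] == value]
    no  = [i for i, r in enumerate(rows) if r[predictor] != value]
    if not yes or not no:                                # question fails to split the data
        return Counter(labels).most_common(1)[0][0]
    rest = [q for q in questions if q != best]
    return {"question": best,
            "yes": grow([rows[i] for i in yes], [labels[i] for i in yes], rest),
            "no":  grow([rows[i] for i in no],  [labels[i] for i in no],  rest)}
```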

Make predictions (testing phase)

Now use your CART to predict where phrase breaks occur in the test data. You’ll first need to extract the same features (predictors) as for the training data, of course.
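
Making a prediction is just a walk down the tree, answering one question at each internal node. This sketch matches the dictionary structure built by the grow() sketch above; the toy tree is invented.

```python
# Walk the tree for one data point: answer each question until reaching a leaf.
def predict(tree, row):
    node = tree
    while isinstance(node, dict):                  # internal node: ask its question
        predictor, value = node["question"]
        node = node["yes"] if row.get(predictor) == value else node["no"]
    return node                                    # leaf: the predicted label

# An invented two-leaf tree, just to show the call.
toy_tree = {"question": ("pos", "PUNC"), "yes": "BREAK", "no": "NO BREAK"}
print(predict(toy_tree, {"pos": "PUNC"}))          # -> BREAK
```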

How well does your CART work?

After you have made your predictions, compare them to the correct labels for the test data and compute the accuracy. How often does your tree get it right?
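
Accuracy is just the fraction of test data points for which the predicted label matches the correct one. A minimal sketch, with invented labels:

```python
def accuracy(predicted, correct):
    return sum(p == c for p, c in zip(predicted, correct)) / len(correct)

# Invented example: 2 of 3 predictions are correct.
print(accuracy(["BREAK", "NO BREAK", "NO BREAK"],
               ["BREAK", "NO BREAK", "BREAK"]))    # 0.666...
```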