Worked example 2 – phrase breaks
October 17, 2016 at 09:18 #5485
I’ve attempted to work through this example on my own, and I have a slight problem with understanding the building of the tree. I label the predictors as their Parts-Of-Speech, then use the question Is the label after “BREAK” PUNC?. I immediately get 0 entropy on both sides of the root node, because everything which is punctuation comes after a BREAK, and everything which isn’t punctuation is a conjunction. Am I missing a step?
October 17, 2016 at 09:39 #5487
Let’s follow your working:
I label the predictors as their Parts-Of-Speech
Correct, but you should say that you annotate the training data samples with values for each of the three predictors. In this example, we use the POS of the preceding, current, and following word as predictors 1, 2, and 3 respectively.
use the question Is the label after “BREAK” PUNC?
Let’s word that more carefully. Questions must be about predictors of the current data point. So you should say:
Ask the question Is predictor 3 = PUNC?
Now partition the data accordingly.
everything which is punctuation comes after a BREAK, and everything which isn’t punctuation is a conjunction
This is where you’ve made the mistake. For the question Is predictor 3 = PUNC?, 8 data points have the answer “Yes”, and all of them have the value “NO-BREAK” for the predictee, which indeed is a distribution with zero entropy. So far, so good.
Now look at the 26 data points for which the answer to Is predictor 3 = PUNC? was “No”. The distribution of predictee values is 4 BREAKs and 22 NO-BREAKs. That distribution does not have zero entropy: it works out to about 0.62 bits.
Your reasoning that “everything which isn’t punctuation is a conjunction” is where you went wrong: there, you are looking at the distribution of values of a predictor. When measuring entropy, we look only at the values of the predictee, because that is the thing we are trying to predict. Entropy measures how unpredictable the predictee is, and the reduction in entropy after a particular split of the data measures how much more predictable it has become.
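If you want to check the arithmetic, here is a minimal Python sketch (my own illustration, not part of the course materials; the counts are simply the ones quoted above) that computes the entropy of the predictee distribution in each partition, and the weighted entropy after the split:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Predictee counts [BREAK, NO-BREAK] from the worked example:
yes_counts = [0, 8]    # "Yes" branch (predictor 3 = PUNC): all 8 are NO-BREAK
no_counts = [4, 22]    # "No" branch: 4 BREAKs and 22 NO-BREAKs

h_yes = entropy(yes_counts)   # 0.0 bits: perfectly predictable
h_no = entropy(no_counts)     # about 0.62 bits: still some uncertainty

# Weighted average entropy after the split, which is what we would
# compare across candidate questions when growing the tree
n = sum(yes_counts) + sum(no_counts)   # 34 data points in total
h_split = (sum(yes_counts) / n) * h_yes + (sum(no_counts) / n) * h_no

print(f"H(yes branch) = {h_yes:.3f} bits")
print(f"H(no branch)  = {h_no:.3f} bits")
print(f"Weighted entropy after split = {h_split:.3f} bits")
```

The “Yes” branch contributes nothing to the weighted entropy, but the “No” branch still has about 0.62 bits of uncertainty, so tree building would continue by asking further questions within that partition.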