CART: hand-labelling the training data
The first part of the video on classification and regression trees (CART) is about hand-labelling data. Could we have an example of how a text is actually hand-labelled? Is every word of a given text examined by a human and labelled? And how many questions per word does this process generally need?
Here are some examples of data that must be hand-labelled before we can apply machine learning (e.g., training a classification tree):
1. letter-to-sound
The hand-labelled data consists of words and their pronunciations, such as this (extracted from cmulex):
...
editing eh1 d ax t ih0 ng
edition ax d ih1 sh ax n
editions ih0 d ih1 sh ax n z
editor eh1 d ax t er0
editorial eh1 d ax t ao1 r iy0 ax l
...
which is in fact just the pronunciation dictionary that we will already have created by hand. The lexicon may also provide a syllabification of the phoneme string. It does not specify the alignment between letters and phonemes.
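To make the format concrete, here is a minimal Python sketch that reads entries like the ones above into (letters, phonemes) pairs. The function name `parse_entry` is hypothetical, and the whitespace-separated layout is an assumption based on the excerpt shown here rather than on the actual cmulex file format.

```python
def parse_entry(line):
    """Split one 'word phone phone ...' line into (letters, phonemes)."""
    word, *phones = line.split()
    return list(word), phones

# Three entries copied from the excerpt above.
entries = [
    "editing eh1 d ax t ih0 ng",
    "edition ax d ih1 sh ax n",
    "editor eh1 d ax t er0",
]

pairs = [parse_entry(e) for e in entries]
for letters, phones in pairs:
    # The entry gives the letters and the phonemes, but nothing says which
    # letters correspond to which phonemes: that alignment is not labelled.
    print("".join(letters), len(letters), "letters,", len(phones), "phonemes")
```

In these entries the number of letters differs from the number of phonemes, which is why an alignment has to be found by some other means before a letter-to-sound tree can be trained.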
2. phrase-break prediction
We will hand-label the phrase breaks in a set of hundreds or thousands of recorded utterances. Where possible, we will use existing data that some kind person has already labelled, such as the Boston University Radio News corpus.
When you ask how many questions per word this process needs, I think you are referring to how we choose the predictors for training a CART. This is done using expert knowledge, remembering that it’s OK to have a large set of predictors because the CART training procedure will only select the useful ones.
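To illustrate why an unhelpful predictor does no harm, here is a small Python sketch of how a tree-building procedure might score candidate predictors. It assumes entropy reduction as the split criterion, which is one common choice, and it is not a full CART implementation. The data, the predictor names (`punctuation_follows`, `word_length_is_even`) and the labels are all invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, predictor):
    """Entropy reduction obtained by splitting the data on one predictor."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[predictor], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Invented toy data: one row per word; the label says whether a phrase
# break follows that word.
rows = [
    {"punctuation_follows": "yes", "word_length_is_even": "yes"},
    {"punctuation_follows": "yes", "word_length_is_even": "no"},
    {"punctuation_follows": "no", "word_length_is_even": "yes"},
    {"punctuation_follows": "no", "word_length_is_even": "no"},
]
labels = ["break", "break", "no-break", "no-break"]

# Score every candidate predictor and pick the most useful one.
gains = {p: information_gain(rows, labels, p) for p in rows[0]}
best = max(gains, key=gains.get)
print(gains, best)
```

On this toy data the uninformative predictor gives no reduction in entropy, so a training procedure that picks the best-scoring question at each node would never choose it. Its presence in the predictor set costs some computation but does not change the tree.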