The various CARTs used in Festival

This topic has 1 reply, 2 voices, and was last updated 9 years, 4 months ago by Simon.

Viewing 1 reply thread

Author

Posts
- October 13, 2015 at 18:36 #293
  Eoin M
  Student
  Can details of the questions (features) used by the CARTs at various stages of the Festival pipeline be accessed easily through Festival?
  
  On what training data were these decision trees trained?
- October 14, 2015 at 10:56 #306
  Simon
  Professor
  CARTs are used in several places within Festival. The best example is the letter-to-sound model: look at the file lib/dicts/cmu/cmu_lts_rules.scm in http://www.cstr.ed.ac.uk/downloads/festival/2.4/festlex_CMU.tar.gz, which contains a letter-to-sound classification tree trained on the CMU lexicon.
  
  Here’s the start of the tree for the letter “a” from that file:
```scheme
(set! cmu_lts_rules '(
(a
 ((n.name is r)
  ((p.name is e)
   ((n.n.name is t)
    ((p.p.name is h)
     (((aa0 0.030303) (aa1 0.969697) aa1))
;; ... etc.
```
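  n.name refers to the predictor “name of the next letter”; by the same convention, p.name is the previous letter, n.n.name the letter after next, and p.p.name the letter two before. Each internal node is a question followed by two subtrees: the first is used when the answer is yes, the second when it is no. The line
  
  (((aa0 0.030303) (aa1 0.969697) aa1))
  
  is a leaf, showing the distribution of values for the predictee: here the unstressed vowel aa0 with probability 0.030303 or the stressed vowel aa1 with probability 0.969697. The final aa1 is the most likely value, which is what the tree predicts.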
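  As a rough illustration of how such a tree is read, and of one way to list the questions it asks outside Festival, here is a minimal Python sketch. Assumptions: the tree text contains only atoms and parentheses, only the is operator from the excerpt is handled (real trees may use others), letters at word edges are padded with “#”, and, because the excerpt is cut off, every “no” branch is an invented placeholder leaf labelled other. All helper names are hypothetical.

```python
import re

# Hypothetical tree: the "yes" path follows the excerpt above; every
# "no" branch is an invented placeholder leaf labelled "other".
TREE_TEXT = """
(a
 ((n.name is r)
  ((p.name is e)
   ((n.n.name is t)
    ((p.p.name is h)
     (((aa0 0.030303) (aa1 0.969697) aa1))
     (((other 1.0) other)))
    (((other 1.0) other)))
   (((other 1.0) other)))
  (((other 1.0) other))))
"""

def tokenize(text):
    # Parentheses are separate tokens; any other run of non-space chars is an atom.
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    # Minimal s-expression reader: returns the top-level forms as nested lists.
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses")
    return stack[0]

def questions(node):
    # Pre-order list of (feature, operator, value) questions; leaves yield nothing.
    if len(node) == 3:
        yield tuple(node[0])
        yield from questions(node[1])
        yield from questions(node[2])

def predict(node, features):
    # A node of length 3 is (question yes-subtree no-subtree); length 1 is a leaf.
    while len(node) == 3:
        (feat, op, value), yes, no = node
        if op != "is":
            raise ValueError(f"operator {op!r} not handled in this sketch")
        node = yes if features.get(feat) == value else no
    # Leaf content: (phone probability) pairs followed by the most likely phone.
    *dist, best = node[0]
    return best, {phone: float(p) for phone, p in dist}

def letter_features(word, i, pad="#"):
    # Neighbouring letters of word[i]; "#" padding at word edges is an assumption.
    padded = pad * 2 + word + pad * 2
    j = i + 2
    return {"p.p.name": padded[j - 2], "p.name": padded[j - 1], "name": padded[j],
            "n.name": padded[j + 1], "n.n.name": padded[j + 2]}

letter, tree = parse(tokenize(TREE_TEXT))[0]
print(letter, list(questions(tree)))
print(predict(tree, letter_features("heart", 2)))
# → ('aa1', {'aa0': 0.030303, 'aa1': 0.969697})
```

  In “heart” the letter a has h and e before it and r and t after it, so every question on the excerpt's path is answered yes and the leaf predicts aa1.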
  
  The letter-to-sound CART is trained on the pronunciation dictionary (which was written by hand). Others are trained on hand-labelled data of other types (e.g., speech with hand-annotated phrase breaks).
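  To make “trained on the pronunciation dictionary” a little more concrete: once each entry's letters are aligned one-to-one with phones, every letter gives one training example (its context plus its phone), and a leaf can store the relative frequency of the phones among the examples that reach it. The Python sketch below is not Festival's training code; it assumes the alignment is already done, uses a hypothetical _epsilon_ label for silent letters, and uses invented counts.

```python
from collections import Counter

def training_examples(aligned_entries, letter):
    # Each entry is (word, phones) with exactly one phone label per letter;
    # silent letters are labelled "_epsilon_" (assumed pre-aligned input).
    for word, phones in aligned_entries:
        if len(word) != len(phones):
            raise ValueError(f"{word!r} is not aligned letter-to-phone")
        padded = "##" + word + "##"
        for i, ch in enumerate(word):
            if ch == letter:
                j = i + 2
                context = {"p.p.name": padded[j - 2], "p.name": padded[j - 1],
                           "n.name": padded[j + 1], "n.n.name": padded[j + 2]}
                yield context, phones[i]

def leaf_distribution(phones):
    # Relative frequency of each phone among the examples reaching one leaf,
    # followed by the most frequent phone, mirroring the leaf layout above.
    counts = Counter(phones)
    total = sum(counts.values())
    dist = [(phone, round(n / total, 6)) for phone, n in sorted(counts.items())]
    return dist, counts.most_common(1)[0][0]

# Hypothetical aligned entry, for illustration only.
entries = [("heart", ["hh", "_epsilon_", "aa1", "r", "t"])]
print(list(training_examples(entries, "a")))
# Hypothetical counts: 1 example of aa0 and 32 of aa1 reaching the same leaf.
print(leaf_distribution(["aa0"] + ["aa1"] * 32))
# → ([('aa0', 0.030303), ('aa1', 0.969697)], 'aa1')
```

  The probabilities in the excerpt happen to equal 1/33 and 32/33 to six decimal places, so counts of 1 and 32 reproduce them; that only shows the numbers are consistent with relative frequencies, not what the actual training counts were.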
  
  CARTs can also be written by hand, for example when no training data are available. Here’s an example of a CART for predicting phrase breaks from punctuation.
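  The phrase-break tree itself is not reproduced in this thread. As a stand-in, this Python sketch shows a hypothetical hand-written tree with the same question/yes/no shape, predicting a break label for each word from its trailing punctuation. The feature names (punc, last_word), the punctuation sets, and the labels BB (big break), B (break) and NB (no break) are all assumptions for illustration.

```python
# Hypothetical hand-written tree: a node is (question, yes_subtree, no_subtree),
# a leaf is a break label. Questions are (feature, operator, value).
PUNC_BREAK_TREE = (
    ("punc", "in", {".", "?", "!", ":"}),
    "BB",                                   # sentence-final punctuation: big break
    (("punc", "in", {",", ";"}),
     "B",                                   # clause punctuation: ordinary break
     (("last_word", "is", True),
      "BB",                                 # end of input without punctuation
      "NB")),                               # otherwise: no break
)

def ask(question, features):
    # Answer one question against a word's features.
    feature, op, value = question
    x = features.get(feature)
    if op == "is":
        return x == value
    if op == "in":
        return x in value
    raise ValueError(f"unknown operator {op!r}")

def predict_break(tree, features):
    # Descend until a leaf (a plain string label) is reached.
    while isinstance(tree, tuple):
        question, yes, no = tree
        tree = yes if ask(question, features) else no
    return tree

def token_features(tokens):
    # tokens: (word, trailing punctuation or "") pairs for one utterance.
    for i, (word, punc) in enumerate(tokens):
        yield word, {"punc": punc, "last_word": i == len(tokens) - 1}

tokens = [("Hello", ","), ("world", ""), ("this", ""), ("is", ""), ("a", ""), ("test", ".")]
for word, feats in token_features(tokens):
    print(word, predict_break(PUNC_BREAK_TREE, feats))
```

  No data are needed here: the designer simply encodes the rule “big break at sentence-final punctuation, smaller break at commas and semicolons, no break elsewhere” as a tree.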