The front end

We need to process the input text, first to identify the words, then to decide how they should be said.
  • Tokenisation and normalisation

    The first task is to convert all of the text into words, including anything not already written as words, such as digits and abbreviations.

  • Letter to sound

    Once the text is entirely converted to words, we need to decide on their pronunciations.

  • CART

    Classification and regression trees are widely applicable models for making predictions. We can use them for letter-to-sound, prosody, and many other tasks.

  • Prosody prediction

    Prosody is typically predicted in several stages: placement of events, classification of their types, then realisation.
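
As a rough illustration of the tokenisation and normalisation step listed above, here is a minimal Python sketch. The names tokenise, normalise and text_to_words, the tiny abbreviation table, and the digit-by-digit reading of numbers are all assumptions made for this example; a real front end would use much larger resources and context-dependent rules.

```python
import re

# Hypothetical toy tables, for illustration only.
ABBREVIATIONS = {"dr": "doctor", "st": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def tokenise(text):
    """Split text into tokens: runs of letters/apostrophes, or runs of digits.

    Punctuation and whitespace are discarded in this simplified version.
    """
    return re.findall(r"[A-Za-z']+|[0-9]+", text)


def normalise(token):
    """Convert a single token into a list of lowercase words."""
    lower = token.lower()
    if lower.isdigit():
        # Simplifying assumption: numbers are read out digit by digit.
        return [DIGITS[int(d)] for d in lower]
    if lower in ABBREVIATIONS:
        # Assumption: each abbreviation has exactly one expansion.
        return [ABBREVIATIONS[lower]]
    return [lower]


def text_to_words(text):
    """Tokenise the text, then normalise every token into words."""
    words = []
    for token in tokenise(text):
        words.extend(normalise(token))
    return words


print(text_to_words("Dr Smith lives at 42 Elm St."))
# ['doctor', 'smith', 'lives', 'at', 'four', 'two', 'elm', 'street']
```

The output is a flat list of words, which is the form the letter-to-sound stage expects. The sketch deliberately ignores ambiguity: "St" could also be "saint", and "42" would usually be read as "forty two", which is why practical systems need context to choose the right expansion.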