Use of paralinguistic information
I’m having a difficult time wrapping my head around how we could make use of extra markup on input text during synthesis. Would it be a matter of annotating all the source speech in a unit-selection database with similar information relating to, e.g., prosody, and just including that information in the selection of units at synthesis time? It seems like there must be cleverer ways to do that, but I can’t think of any.
As you say, we can place any labels we like on the database (whether automatically or manually), and then include appropriate sub-costs in the target cost to prefer units with matching values for these new linguistic features.
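For concreteness, here is a minimal sketch (in Python, and not any particular toolkit's implementation) of a target cost computed as a weighted sum of per-feature sub-costs, with a hypothetical "emphasis" feature added alongside the existing ones. The feature names, weights, and 0/1 mismatch costs are made up purely for illustration.

```python
# Minimal sketch of a target cost as a weighted sum of per-feature sub-costs.
# Feature names and weights are illustrative, not from any real system.

def sub_cost(target_value, candidate_value):
    """Simple 0/1 mismatch sub-cost; real systems may use graded distances."""
    return 0.0 if target_value == candidate_value else 1.0

# Weights are assumptions; in practice they would be tuned (or trained).
WEIGHTS = {
    "phrase_position": 1.0,
    "stress": 1.0,
    "emphasis": 0.5,   # the newly-added paralinguistic feature
}

def target_cost(target_spec, candidate_unit):
    """Weighted sum of sub-costs over all linguistic features."""
    return sum(
        w * sub_cost(target_spec[feat], candidate_unit[feat])
        for feat, w in WEIGHTS.items()
    )

# Example: a candidate that matches everything except the new feature.
target = {"phrase_position": "final", "stress": 1, "emphasis": "strong"}
candidate = {"phrase_position": "final", "stress": 1, "emphasis": "none"}
print(target_cost(target, candidate))  # -> 0.5, i.e. the weight on the one mismatch
```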
The hardest part is usually predicting these from the text input, at synthesis time. But, if we allow markup on that text, this information could be supplied by the user, or whatever system is generating the text.
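SSML is one existing standard for this kind of markup; it defines, for example, an <emphasis> element. Below is a rough sketch of how a front-end might turn such markup into per-word values of the new feature, ready to go into the target specification. The whitespace tokenisation and default levels are simplifying assumptions, not a description of any real front-end.

```python
# Sketch: convert user-supplied, SSML-like markup into per-word feature values.
# SSML does define <emphasis>, but everything else here is an illustrative toy.
import xml.etree.ElementTree as ET

def words_with_emphasis(ssml_fragment):
    """Return (word, emphasis_level) pairs from a small SSML-like fragment."""
    root = ET.fromstring(ssml_fragment)
    tokens = []
    # Words outside any <emphasis> element get the default level "none".
    if root.text:
        tokens += [(w, "none") for w in root.text.split()]
    for elem in root:
        level = elem.get("level", "moderate") if elem.tag == "emphasis" else "none"
        if elem.text:
            tokens += [(w, level) for w in elem.text.split()]
        if elem.tail:
            tokens += [(w, "none") for w in elem.tail.split()]
    return tokens

fragment = '<speak>I said <emphasis level="strong">no</emphasis> biscuits</speak>'
print(words_with_emphasis(fragment))
# [('I', 'none'), ('said', 'none'), ('no', 'strong'), ('biscuits', 'none')]
```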
It’s important to note that every new sub-cost added to the target cost effectively increases the sparsity of the linguistic feature space, so we may need to record a (much) larger database. We would also have to tune the weight on the new sub-cost carefully, so that matching the new feature does not come at the expense of worse matches on the other (possibly more important) features.
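To see the sparsity point numerically, here is a back-of-envelope sketch. The database size and feature cardinalities are invented, and it assumes (crudely) that units are spread uniformly over the feature space. The point is only that each new feature multiplies the number of distinct target specifications, so the average number of matching candidates shrinks unless the database grows to compensate.

```python
# Back-of-envelope illustration of sparsity; all numbers are made up.
db_units = 500_000           # units in the database (assumed size)
existing = [1600, 2, 3]      # e.g. diphone identity, stress, phrase position

def mean_candidates(db_units, cardinalities):
    """Average candidates per distinct target specification,
    assuming units are spread uniformly over the feature space."""
    combos = 1
    for c in cardinalities:
        combos *= c
    return db_units / combos

print(round(mean_candidates(db_units, existing), 1))        # 52.1
print(round(mean_candidates(db_units, existing + [3]), 1))  # 17.4, after adding a 3-way emphasis feature
```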