Use of paralinguistic information
I’m having a difficult time wrapping my head around how we could make use of extra markup on input text during synthesis. Would it be a matter of annotating all the source speech in a unit-selection database with similar information relating to, e.g., prosody, and just including that information in the selection of units at synthesis time? It seems like there must be cleverer ways to do that, but I can’t think of any.
As you say, we can place any labels we like on the database (whether automatically or manually), and then include appropriate sub-costs in the target cost to prefer units with matching values for these new linguistic features.
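For concreteness, here is a minimal sketch (in Python, and not any particular toolkit's implementation) of a target cost computed as a weighted sum of per-feature sub-costs, with a hypothetical "emphasis" feature added alongside the existing ones. The feature names, weights, and 0/1 mismatch costs are made up purely for illustration.

```python
# Minimal sketch of a target cost as a weighted sum of per-feature sub-costs.
# Feature names and weights are illustrative, not from any real system.

def sub_cost(target_value, candidate_value):
    """Simple 0/1 mismatch sub-cost; real systems may use graded distances."""
    return 0.0 if target_value == candidate_value else 1.0

# Weights are assumptions; in practice they would be tuned (or trained).
WEIGHTS = {
    "phrase_position": 1.0,
    "stress": 1.0,
    "emphasis": 0.5,   # the newly-added paralinguistic feature
}

def target_cost(target_spec, candidate_unit):
    """Weighted sum of sub-costs over all linguistic features."""
    return sum(
        w * sub_cost(target_spec[feat], candidate_unit[feat])
        for feat, w in WEIGHTS.items()
    )

# Example: a candidate that matches everything except the new feature.
target = {"phrase_position": "final", "stress": 1, "emphasis": "strong"}
candidate = {"phrase_position": "final", "stress": 1, "emphasis": "none"}
print(target_cost(target, candidate))  # -> 0.5, i.e. the weight on the one mismatch
```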
The hardest part is usually predicting these from the text input, at synthesis time. But, if we allow markup on that text, this information could be supplied by the user, or whatever system is generating the text.
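SSML is one existing standard for this kind of markup; it defines, for example, an <emphasis> element. Below is a rough sketch of how a front-end might turn such markup into per-word values of the new feature, ready to go into the target specification. The whitespace tokenisation and default levels are simplifying assumptions, not a description of any real front-end.

```python
# Sketch: convert user-supplied, SSML-like markup into per-word feature values.
# SSML does define <emphasis>, but everything else here is an illustrative toy.
import xml.etree.ElementTree as ET

def words_with_emphasis(ssml_fragment):
    """Return (word, emphasis_level) pairs from a small SSML-like fragment."""
    root = ET.fromstring(ssml_fragment)
    tokens = []
    # Words outside any <emphasis> element get the default level "none".
    if root.text:
        tokens += [(w, "none") for w in root.text.split()]
    for elem in root:
        level = elem.get("level", "moderate") if elem.tag == "emphasis" else "none"
        if elem.text:
            tokens += [(w, level) for w in elem.text.split()]
        if elem.tail:
            tokens += [(w, "none") for w in elem.tail.split()]
    return tokens

fragment = '<speak>I said <emphasis level="strong">no</emphasis> biscuits</speak>'
print(words_with_emphasis(fragment))
# [('I', 'none'), ('said', 'none'), ('no', 'strong'), ('biscuits', 'none')]
```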
It’s important to note that every new sub-cost added to the target cost effectively increases the sparsity of the linguistic feature space, so we may need to record a (much) larger database. We would also have to tune the weight on the new sub-cost carefully, so that matching the new feature does not come at the expense of worse matches on the other (possibly more important) features.
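To see the sparsity point numerically, here is a back-of-envelope sketch. The database size and feature cardinalities are invented, and it assumes (crudely) that units are spread uniformly over the feature space. The point is only that each new feature multiplies the number of distinct target specifications, so the average number of matching candidates shrinks unless the database grows to compensate.

```python
# Back-of-envelope illustration of sparsity; all numbers are made up.
db_units = 500_000           # units in the database (assumed size)
existing = [1600, 2, 3]      # e.g. diphone identity, stress, phrase position

def mean_candidates(db_units, cardinalities):
    """Average candidates per distinct target specification,
    assuming units are spread uniformly over the feature space."""
    combos = 1
    for c in cardinalities:
        combos *= c
    return db_units / combos

print(round(mean_candidates(db_units, existing), 1))        # 52.1
print(round(mean_candidates(db_units, existing + [3]), 1))  # 17.4, after adding a 3-way emphasis feature
```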