This topic has 3 replies, 3 voices, and was last updated 8 years, 6 months ago.
Expressive speech
Given that it is possible to produce expressive-sounding speech, how is this applied in most systems? Do commercial systems attempt to automatically predict which expression is most appropriate, or is expression essentially ignored for the most part?
Following on from this, are there any existing corpora tagged for expression? It would be possible to model expressivity as a classification problem, but automatically predicted labels are unlikely to be as reliable as hand-tagged ones. (In any case, a hand-tagged corpus would be required as training data for a classifier.)
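To make the classification idea a bit more concrete, here is a minimal sketch of what "a hand-tagged corpus as training data for a classifier" could look like. Everything in it is an assumption for illustration: the sentences, the two labels (`storytelling`, `neutral`), and the choice of a bag-of-words Naive Bayes model are invented here and are not taken from any real corpus or system. A real classifier would need far more data and probably acoustic or contextual features as well.

```python
import math
from collections import Counter, defaultdict

# Toy hand-tagged corpus: (utterance text, expression label).
# Sentences and labels are hypothetical, invented for this sketch.
TAGGED = [
    ("once upon a time there was a little rabbit", "storytelling"),
    ("and they all lived happily ever after", "storytelling"),
    ("the little bear went into the dark forest", "storytelling"),
    ("the train to london departs from platform two", "neutral"),
    ("your account balance is forty pounds", "neutral"),
    ("the meeting starts at ten tomorrow", "neutral"),
]

def train(tagged):
    """Count words per label and utterances per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in tagged:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_utts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: fraction of training utterances with this label.
        score = math.log(label_counts[label] / total_utts)
        n_words = sum(word_counts[label].values())
        for w in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TAGGED)
print(predict(model, "once upon a time a little bear"))
print(predict(model, "the train departs at ten"))
```

Note that this predicts one label per whole utterance, which is itself an assumption about what "expression" is a property of.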
If we want a voice that speaks in a single speaking style (or emotion), then we can simply record data in that style and build the voice as usual. That will work very well, but will not scale to producing many different styles / emotions / etc.
Can you each try to specify more precisely what you mean by ‘expression’ before I continue my answer? Is it a property of whole utterances, or parts of utterances, for example?
I think I had whole utterances in mind, for example like the expression you might use when reading a story to a child.