- This topic has 1 reply, 2 voices, and was last updated 8 years, 11 months ago.
Labeling Prosody
I understand that Festival currently doesn’t label prosody, but implicitly relies on other information such as POS tags. But if we are to label prosody, is there any method that is better than ToBI (that is, one that requires less prior knowledge)? And if we use ToBI, could we train a model on some speech data hand-labelled with ToBI symbols (looking mainly at the F0 contour, stress, and the text itself), and then use this model to label prosody for us automatically?
Labelling prosody on the database is one of those topics that has a long history of research, but no really good solutions: it’s a very hard problem.
We are fairly sure that highly-accurate ToBI labels are helpful, provided that we have them for both the database utterances and test sentences. So, even if we hand-label the database, we still have the hard problem of accurately predicting from text at synthesis time, in a way that is consistent with the database labels.
Yes, many people have looked at simpler systems than ToBI. Festival reduces the number of boundary strength levels, for example. Your suggestion to train a model on a hand-labelled subset of data and use that to label the rest of the database is excellent: this is indeed what people do. But there remains the “predicting from text” problem at synthesis time.
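To make the "hand-label a subset, then label the rest automatically" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two per-word features (a normalised F0 peak and a lexical stress flag), the toy values, the two-way label set, and the nearest-centroid classifier, which stands in for whatever model you would really train. It is not how Festival or any particular system does it.

```python
from collections import defaultdict
from math import dist


def train_centroids(features, labels):
    """Average the feature vectors of each hand-assigned label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


# Hypothetical hand-labelled subset: (normalised F0 peak, lexical stress flag).
# These numbers are invented toy values, not measurements.
hand_features = [(1.2, 1.0), (0.9, 1.0), (-0.8, 0.0), (-1.1, 0.0)]
hand_labels = ["H*", "H*", "none", "none"]

centroids = train_centroids(hand_features, hand_labels)

# The rest of the database, labelled automatically by the trained model.
unlabelled = [(1.0, 1.0), (-0.9, 0.0)]
auto_labels = [predict(centroids, vec) for vec in unlabelled]
```

Note that this only covers labelling the database, where the speech signal is available. At synthesis time there is no F0 to measure, so a second predictor working from text alone would still be needed.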
One simpler approach is to think just about prominences and boundaries.
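As a rough illustration of that reduction, the sketch below collapses a ToBI-style annotation into a prominence flag and a coarse boundary strength. The function name `simplify_tobi`, the three boundary levels, and the example words are all hypothetical; the only thing taken from ToBI itself is that break indices 3 and 4 mark intermediate and full intonational phrase boundaries.

```python
def simplify_tobi(pitch_accent, break_index):
    """Collapse a ToBI annotation for one word into (prominent, boundary).

    pitch_accent: a ToBI accent string such as "H*" or "L+H*", or None.
    break_index:  the ToBI break index (0-4) after the word.
    """
    # Any pitch accent at all counts as prominent (an assumed simplification).
    prominent = bool(pitch_accent)
    if break_index >= 4:
        boundary = "major"   # full intonational phrase boundary
    elif break_index == 3:
        boundary = "minor"   # intermediate phrase boundary
    else:
        boundary = "none"
    return prominent, boundary


# Hypothetical annotated utterance: (word, pitch accent, break index).
words = [("Mary", "H*", 1), ("left", None, 3), ("yesterday", "L+H*", 4)]
simplified = [(w, *simplify_tobi(acc, bi)) for w, acc, bi in words]
```

A two-way prominence decision and two or three boundary levels are much easier to label consistently, and to predict from text, than the full ToBI inventory.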
Perhaps a more promising approach these days is to label the database with more sophisticated linguistic information than plain old POS tags, such as shallow syntactic and semantic structure.