Automatic text selection
- This topic has 3 replies, 3 voices, and was last updated 8 years, 6 months ago by Simon.
January 22, 2016 at 18:55 #2209
What would be the most important criterion for choosing sentences that give the best diphone coverage in a greedy algorithm? Is it something like the ratio of the number of different diphones to the length of the sentence? Is 80% coverage good enough?
January 24, 2016 at 13:43 #2307
We’ll look at a more detailed example of greedy text selection in the lecture.
Your suggestion to normalise for the length of the sentence is a good idea; otherwise, we might just select the longest sentences (because they contain more diphones than shorter ones).
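As a rough illustration of length-normalised greedy selection, here is a minimal Python sketch. It assumes each candidate sentence has already been converted to a phone sequence, measures sentence length in diphone tokens, and scores each candidate by the number of diphone types it would newly add divided by that length. This scoring function and the names `diphones` and `greedy_select` are illustrative assumptions, not necessarily the exact method used in the lecture.

```python
from typing import List, Optional, Sequence, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: Sequence[str]) -> List[Diphone]:
    """Diphone tokens of a phone sequence, i.e. consecutive phone pairs."""
    return list(zip(phones, phones[1:]))


def greedy_select(corpus: List[Sequence[str]], n_select: int) -> List[int]:
    """Greedily choose up to n_select sentence indices.

    Assumed score: (number of diphone types not yet covered) divided by
    (number of diphone tokens in the sentence), so long sentences are not
    favoured just for being long.
    """
    covered: Set[Diphone] = set()
    chosen: List[int] = []
    remaining = set(range(len(corpus)))
    for _ in range(n_select):
        best: Optional[int] = None
        best_score = 0.0
        for i in sorted(remaining):
            ds = diphones(corpus[i])
            if not ds:
                continue  # a one-phone "sentence" has no diphones
            score = len(set(ds) - covered) / len(ds)  # length-normalised gain
            if score > best_score:
                best, best_score = i, score
        if best is None:  # no remaining sentence adds a new diphone
            break
        chosen.append(best)
        covered |= set(diphones(corpus[best]))
        remaining.discard(best)
    return chosen


# Toy corpus of phone sequences (hypothetical symbols).
corpus = [
    ["a", "b", "a", "b", "a", "b", "c", "d", "e"],  # 5 new types / 8 tokens
    ["x", "y", "z", "w"],                           # 3 new types / 3 tokens
]
print(greedy_select(corpus, 2))
```

With this toy corpus, a raw count of new diphone types would pick the long, repetitive sentence 0 first (5 new types versus 3), whereas the normalised score prefers the short sentence 1 (1.0 versus 0.625), so the result is `[1, 0]`.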
You make a good point about final total coverage: 100% might be impossible simply because our large corpus contains no occurrences of certain very rare diphones. The ARCTIC corpus covers around 75–80% of all possible diphones. The initial large corpus contained at least one example of about 90% of all possible diphones (falling to around 80% after discarding sentences that are not “nice”), so the coverage of that filtered corpus is a ceiling on the coverage that any selection from it could ever obtain.
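The ceiling argument can be made concrete with a small Python sketch. It assumes, purely hypothetically, that the inventory of "possible" diphones is every ordered pair from a toy three-phone set (a real inventory would come from the language's phone set). Coverage is then the fraction of that inventory seen at least once; because the diphones in any selection are a subset of those in the pool it was drawn from, the selection's coverage can never exceed the pool's. The same set difference lists the "missing" diphones.

```python
from itertools import product
from typing import Iterable, Sequence, Set, Tuple

Diphone = Tuple[str, str]


def diphone_types(sentences: Iterable[Sequence[str]]) -> Set[Diphone]:
    """Distinct diphones (consecutive phone pairs) occurring in the sentences."""
    return {pair for phones in sentences for pair in zip(phones, phones[1:])}


def coverage(sentences: Iterable[Sequence[str]], possible: Set[Diphone]) -> float:
    """Fraction of the possible-diphone inventory seen at least once."""
    return len(diphone_types(sentences) & possible) / len(possible)


def missing(sentences: Iterable[Sequence[str]], possible: Set[Diphone]) -> Set[Diphone]:
    """Possible diphones that never occur in the sentences."""
    return possible - diphone_types(sentences)


# Hypothetical toy inventory: every ordered pair of three phones (9 diphones).
POSSIBLE = set(product(["a", "b", "c"], repeat=2))
POOL = [["a", "b", "c"], ["c", "a", "a"], ["b", "b"]]
SELECTION = POOL[:1]  # any subset of the pool

print(coverage(POOL, POSSIBLE))       # 5 of 9 diphones
print(coverage(SELECTION, POSSIBLE))  # 2 of 9: cannot exceed the pool's coverage
print(sorted(missing(POOL, POSSIBLE)))
```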
January 24, 2016 at 14:51 #2310
Would it be possible to use Natural Language Generation to compose sentences to make up a corpus with 100% coverage? Or, if we knew what the ‘missing’ 20% was, could we generate sentences to fill in those missing diphones?
Failing all that, could we simply hand-write additional sentences to increase our coverage?
January 24, 2016 at 16:45 #2317
Sure – that would be fine.
In general, I don’t think people use Natural Language Generation (NLG) for this, mainly because NLG systems are typically limited-domain, and so will only generate a closed set of sentences (or at least, sentences drawn from a closed vocabulary).
The vast majority of missing diphones will be cross-word (why is that?). So all you would really need to do is find word pairs whose junction (the last phone of the first word followed by the first phone of the second) forms the required diphone. However, you would want each such pair to occur in a reasonably natural sentence, so that the sentence can be used in the same way as the other prompts (i.e., recorded and used in its entirety).
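A minimal Python sketch of that word-pair search, assuming a pronunciation lexicon that maps each word to a phone sequence. The `LEXICON` below is a hypothetical toy with simplified phone symbols, and `cross_word_candidates` is an illustrative name. The sketch takes dictionary pronunciations at face value (it ignores any changes at the word boundary in connected speech), and the pairs it returns would still need to be worked by hand into a natural sentence.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicon with simplified phone symbols (not a real phone set).
LEXICON: Dict[str, Sequence[str]] = {
    "cat": ["k", "a", "t"],
    "bit": ["b", "i", "t"],
    "zoo": ["z", "u"],
    "zip": ["z", "i", "p"],
    "dog": ["d", "o", "g"],
}


def cross_word_candidates(
    lexicon: Dict[str, Sequence[str]], target: Tuple[str, str]
) -> List[Tuple[str, str]]:
    """Word pairs (w1, w2) where w1's final phone followed by w2's initial
    phone forms the target (cross-word) diphone."""
    left, right = target
    enders = sorted(w for w, phones in lexicon.items() if phones and phones[-1] == left)
    starters = sorted(w for w, phones in lexicon.items() if phones and phones[0] == right)
    return [(w1, w2) for w1 in enders for w2 in starters]


print(cross_word_candidates(LEXICON, ("t", "z")))
```

Here the target `("t", "z")` gives the pairs `("bit", "zip")`, `("bit", "zoo")`, `("cat", "zip")` and `("cat", "zoo")`; one of them would then be placed in a natural prompt sentence.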