Forum Replies Created
I wanted to test the effect of using different numbers of mixture components during forced-alignment labelling. My hypothesis was that, because better labelling leads to better join points, listeners would hear fewer (or less noticeable) joins and rate the speech as more natural sounding.
I ran a listening test and found no significant difference between voices labelled with different numbers of mixture components during forced alignment. While trying to figure out why, I found that the selected unit sequences differed between the voices.
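To put a number on how different the unit sequences are, something like the following could be used. This is only a minimal sketch: the function name `unit_overlap`, the representation of a unit as a (source utterance, position) pair, and the example data are all made up for illustration, and it assumes both voices were given the same target sequence so the positions line up.

```python
def unit_overlap(seq_a, seq_b):
    """Fraction of target positions where both voices selected the same unit."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same target positions")
    if not seq_a:
        return 1.0  # nothing to disagree on
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return same / len(seq_a)


# Hypothetical selected units for one sentence:
# (source utterance, index of the unit within that utterance)
voice_few_mix = [("utt_012", 4), ("utt_012", 5), ("utt_301", 17), ("utt_044", 2)]
voice_many_mix = [("utt_012", 4), ("utt_012", 5), ("utt_118", 9), ("utt_044", 2)]

print(unit_overlap(voice_few_mix, voice_many_mix))  # 0.75
```

A low overlap across the test sentences would support the suspicion that listeners were comparing different unit choices rather than different join points.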
This means I might not have been measuring the true effect of better labelling. Listeners might not be indifferent to join point quality; they may instead have been responding to the choice of units. I also found that more units were marked with ‘bad duration’ in voices built with more mixture components in forced alignment. Labelling more units as outliers and imposing a high target cost on them might have meant that potentially good candidate units were never used, because the ‘bad duration’ cost was too high.
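The mechanism I suspect can be illustrated with a toy example. This is a sketch under my own assumptions, not how the engine actually works: the z-score outlier rule, the threshold of 2.0, the function names and the cost values are all hypothetical. It only shows how a large fixed penalty on flagged units can rule out the candidate with the best join.

```python
from statistics import mean, stdev


def flag_bad_durations(durations, z_threshold=2.0):
    """Flag units whose duration is more than z_threshold std devs from the mean."""
    mu = mean(durations)
    sigma = stdev(durations)
    return [abs(d - mu) > z_threshold * sigma for d in durations]


def pick_candidate(join_costs, bad_flags, bad_duration_penalty):
    """Return the index of the candidate with the lowest join cost plus penalty."""
    totals = [
        join + (bad_duration_penalty if bad else 0.0)
        for join, bad in zip(join_costs, bad_flags)
    ]
    return min(range(len(totals)), key=totals.__getitem__)


# Hypothetical candidates for one target: durations in ms and join costs.
durations = [80, 82, 78, 81, 79, 83, 120]
join_costs = [0.9, 0.8, 0.7, 0.85, 0.95, 0.6, 0.1]

flags = flag_bad_durations(durations)  # only the 120 ms unit is flagged
print(pick_candidate(join_costs, flags, bad_duration_penalty=0.0))   # 6
print(pick_candidate(join_costs, flags, bad_duration_penalty=10.0))  # 5
```

With no penalty, the last candidate wins on its join cost. With a large penalty it is never chosen, however good the join would have been.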
I wanted to force all the voices to use the same unit sequence so I could evaluate whether the joins themselves had generally improved. If I could show this, it would indicate a problem with the unit selection engine as it stands: better labelling can make unit selection worse.