decision tree in ASF
Hi Simon,
I know we can use distances in cepstral space as a stand-in for the “perceptual space”. But could we instead use data from perceptual listening tests for a specific language?
I ask because I think speakers of different languages are sensitive to different things. For example, in one Chinese dialect, speakers cannot tell /n/ from /l/. In that case we might group /n/ and /l/ together, even though the two may be far apart in cepstral space. Taylor mentions in his book that we don’t define an abstract perceptual space, but if we had enough data, could we do that?
Your proposal is to use perceptual data (i.e., from listening tests with human subjects) to define a target cost function. It’s a good idea, and has been tried, but it’s difficult to get enough perceptual data to automatically learn such a function.
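To make the proposal concrete, here is a minimal sketch of how listening-test confusions might be turned into a crude perceptual distance between phones. Everything in it is an assumption for illustration: the function name `perceptual_distance`, the confusion-count format, and the toy counts are invented, not data from any real test.

```python
from collections import defaultdict

def perceptual_distance(confusions):
    """Crude distance from (played, heard) -> count listening-test data.

    Hypothetical format: confusions[("n", "l")] is how often listeners
    heard /l/ when /n/ was played.
    """
    totals = defaultdict(int)
    for (played, _heard), count in confusions.items():
        totals[played] += count

    def rate(a, b):
        # Proportion of presentations of `a` that were heard as `b`.
        return confusions.get((a, b), 0) / totals[a] if totals[a] else 0.0

    phones = sorted(totals)
    dist = {}
    for a in phones:
        for b in phones:
            if a == b:
                dist[(a, b)] = 0.0
            else:
                # Symmetrised confusion rate: often-confused phones end up close.
                dist[(a, b)] = 1.0 - 0.5 * (rate(a, b) + rate(b, a))
    return dist

# Invented toy counts: listeners frequently confuse /n/ and /l/, rarely /m/.
toy_counts = {
    ("n", "n"): 50, ("n", "l"): 45, ("n", "m"): 5,
    ("l", "l"): 55, ("l", "n"): 45, ("l", "m"): 0,
    ("m", "m"): 95, ("m", "n"): 5, ("m", "l"): 0,
}
toy_dist = perceptual_distance(toy_counts)
```

With these toy counts, /n/ and /l/ come out closer to each other than either is to /m/, regardless of how far apart they sit in cepstral space. The practical difficulty is the one noted above: filling such a table reliably for every pair of units and contexts needs a lot of listener judgements.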
In the following paper, we describe a simple target cost function (in the form of a classifier) that is learned from perceptual data. It worked, but did not beat Festival’s standard IFF target cost function. Note that our novel target cost function is still using only linguistic features as input, and doesn’t use acoustic properties of the candidates.
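As a rough illustration of what "a target cost in the form of a classifier, using only linguistic features" could look like, here is a small count-based sketch. It is not the classifier from the paper; the feature names (`stress`, `phrase_final`), the judgement format, and the smoothing are all assumptions made for this example.

```python
from collections import defaultdict

def mismatch_pattern(target, candidate, features):
    # One boolean per linguistic feature: does the candidate differ from the target?
    return tuple(target[f] != candidate[f] for f in features)

def train(judgements, features):
    """judgements: iterable of (target, candidate, sounded_bad) triples.

    Hypothetical format: `sounded_bad` is a listener's binary judgement.
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [bad, total]
    for target, candidate, sounded_bad in judgements:
        entry = counts[mismatch_pattern(target, candidate, features)]
        entry[0] += int(sounded_bad)
        entry[1] += 1
    return dict(counts)

def target_cost(model, target, candidate, features, prior=0.5):
    # Smoothed estimate of P(sounds bad | mismatch pattern); unseen patterns get the prior.
    bad, total = model.get(mismatch_pattern(target, candidate, features), (0, 0))
    return (bad + prior) / (total + 1)

FEATURES = ("stress", "phrase_final")
spec = {"stress": 1, "phrase_final": 0}
same = {"stress": 1, "phrase_final": 0}
unstressed = {"stress": 0, "phrase_final": 0}

# Invented toy judgements, for illustration only.
toy_judgements = (
    [(spec, same, False)] * 3
    + [(spec, unstressed, True)] * 2
    + [(spec, unstressed, False)]
)
model = train(toy_judgements, FEATURES)
cost_match = target_cost(model, spec, same, FEATURES)
cost_stress_mismatch = target_cost(model, spec, unstressed, FEATURES)
```

The cost depends only on which linguistic features mismatch, never on the acoustics of the candidate, which matches the restriction described above. The number of mismatch patterns grows quickly with the number of features, which is one way to see why collecting enough perceptual data is the hard part.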