- This topic has 1 reply, 2 voices, and was last updated 7 years, 9 months ago by .
What exactly is the point of predicting acoustic properties for diphones, given that the diphones in the database already have their own F0 and duration values?
In diphone synthesis, there is just one recorded copy of each diphone. The F0 and duration of that recorded copy will be arbitrary with respect to the utterance we now want to synthesise. If we simply concatenated these recordings, we would get an arbitrary and very probably discontinuous F0 contour. So we must manipulate each recording to impose the predicted F0 (e.g., to get gradual declination over a phrase) and the predicted duration.
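To make the idea concrete, here is a minimal sketch, assuming each stored diphone is summarised by a single mean F0 and a duration, and assuming a simple linear declination from a phrase-initial to a phrase-final F0. The names `Diphone`, `declining_f0` and `modification_factors`, and the default F0 values, are hypothetical. The sketch only computes how much each recording would need to be pitch-shifted and time-stretched; the actual waveform modification (done with a signal-processing method such as TD-PSOLA) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Diphone:
    name: str            # hypothetical label, e.g. "h-e"
    recorded_f0: float   # mean F0 of the single stored copy, in Hz
    recorded_dur: float  # duration of the stored copy, in seconds


def declining_f0(times, f0_start, f0_end, total):
    """Linear declination: target F0 falls from f0_start to f0_end over the phrase."""
    return [f0_start + (f0_end - f0_start) * (t / total) for t in times]


def modification_factors(units, predicted_durs, f0_start=140.0, f0_end=100.0):
    """For each unit, return (pitch_scale, time_scale) that would impose the
    predicted F0 and duration on the arbitrary recorded values."""
    total = sum(predicted_durs)
    # Sample the target contour at the midpoint of each unit's predicted span.
    mids, t = [], 0.0
    for d in predicted_durs:
        mids.append(t + d / 2)
        t += d
    targets = declining_f0(mids, f0_start, f0_end, total)
    factors = []
    for unit, target_f0, target_dur in zip(units, targets, predicted_durs):
        pitch_scale = target_f0 / unit.recorded_f0   # >1 means raise the pitch
        time_scale = target_dur / unit.recorded_dur  # >1 means lengthen
        factors.append((pitch_scale, time_scale))
    return factors


units = [Diphone("sil-h", 120.0, 0.08), Diphone("h-e", 150.0, 0.10), Diphone("e-l", 110.0, 0.09)]
for u, (p, s) in zip(units, modification_factors(units, [0.10, 0.10, 0.10])):
    print(f"{u.name}: pitch x{p:.3f}, duration x{s:.3f}")
```

Note that the recorded F0 values (120, 150, 110 Hz) jump around, whereas the targets fall smoothly; the scale factors are what removes that discontinuity.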
In unit selection synthesis, we have many recordings of each diphone to choose from. In some versions of unit selection (covered in detail in the Speech Synthesis course), we use the front end’s predictions of F0 and duration to help us choose the most appropriate candidate.
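As a rough illustration of how such predictions could guide the choice, here is a minimal sketch of a target cost: a weighted, relative mismatch between each candidate's F0 and duration and the predicted values. The class, function names and weights are hypothetical, and a real unit selection system would also include join costs and search over whole sequences of units, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    unit_id: str  # hypothetical ID for one recorded token of a diphone
    f0: float     # mean F0 of this token, in Hz
    dur: float    # duration of this token, in seconds


def target_cost(cand, pred_f0, pred_dur, w_f0=1.0, w_dur=1.0):
    """Weighted relative mismatch between a candidate and the front end's predictions."""
    f0_term = abs(cand.f0 - pred_f0) / pred_f0
    dur_term = abs(cand.dur - pred_dur) / pred_dur
    return w_f0 * f0_term + w_dur * dur_term


def choose(candidates, pred_f0, pred_dur, **weights):
    """Pick the candidate with the lowest target cost (join costs ignored)."""
    return min(candidates, key=lambda c: target_cost(c, pred_f0, pred_dur, **weights))


cands = [Candidate("a", 180.0, 0.12), Candidate("b", 125.0, 0.09), Candidate("c", 118.0, 0.15)]
print(choose(cands, pred_f0=120.0, pred_dur=0.10).unit_id)
```

Because each candidate is already close to the target, much less signal modification is needed than in diphone synthesis; the weights control whether F0 or duration matters more when they disagree.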