diphones

This topic has 1 reply, 2 voices, and was last updated 9 years, 4 months ago by Simon King.

Viewing 1 reply thread

Author

Posts
- October 24, 2016 at 20:52 #5585
  Ruiduan L
  Student
  What exactly is the point of predicting acoustic properties (F0 and duration) for diphones, given that the diphones in the database already have F0 and duration values?
- October 25, 2016 at 09:34 #5587
  Simon King
  Professor
  In diphone synthesis, there is just one recorded copy of each diphone. The F0 and duration of that recorded copy will be arbitrary. If we simply concatenated these recordings, we would get an arbitrary and very probably discontinuous F0 contour. We must therefore manipulate each recording in order to impose the predicted F0 (e.g., to get gradual declination over a phrase) and to impose the predicted duration.
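
  A minimal sketch of the bookkeeping this implies, assuming each recorded diphone is summarised by a single duration and a single mean F0 value, and assuming a simple linear declination from 140 Hz to 100 Hz over the phrase (the class, function names, diphone labels and numbers are all hypothetical). It only computes how much each unit must be stretched in time and shifted in pitch to follow the predictions; the actual signal modification (e.g. with TD-PSOLA) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Diphone:
    name: str
    recorded_duration: float  # seconds: whatever the speaker happened to produce
    recorded_f0: float        # Hz: mean F0 of the single recorded copy


def declination(t, total, f0_start, f0_end):
    """Hypothetical linear declination: F0 falls from f0_start to f0_end over the phrase."""
    return f0_start + (f0_end - f0_start) * (t / total)


def modification_factors(units, predicted_durations, f0_start=140.0, f0_end=100.0):
    """Work out how much to stretch and pitch-shift each recorded unit so that the
    concatenated sequence follows the predicted durations and a smooth F0 contour."""
    total = sum(predicted_durations)
    factors = []
    t = 0.0
    for unit, dur in zip(units, predicted_durations):
        midpoint = t + dur / 2  # unit's centre on the *predicted* timeline
        target_f0 = declination(midpoint, total, f0_start, f0_end)
        factors.append({
            "unit": unit.name,
            "time_scale": dur / unit.recorded_duration,   # > 1 means lengthen
            "pitch_scale": target_f0 / unit.recorded_f0,  # > 1 means raise F0
            "target_f0": target_f0,
        })
        t += dur
    return factors


# Usage: three diphones whose recorded F0 values (130, 95, 150 Hz) would give
# a jumpy contour if simply concatenated.
units = [Diphone("h-e", 0.10, 130.0), Diphone("e-l", 0.08, 95.0), Diphone("l-ou", 0.15, 150.0)]
for f in modification_factors(units, predicted_durations=[0.12, 0.08, 0.20]):
    print(f"{f['unit']}: stretch x{f['time_scale']:.2f}, "
          f"target {f['target_f0']:.0f} Hz, pitch x{f['pitch_scale']:.2f}")
```

  The recorded F0 values jump up and down, but the target values fall smoothly (134, 124, 110 Hz), which is the point of imposing a predicted contour rather than keeping whatever the recordings happen to contain.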
  
  In unit selection synthesis, we have many recordings of each diphone to choose from. In some versions of unit selection (covered in detail in the Speech Synthesis course), we will use the front end’s predictions of F0 and duration to help us choose the most appropriate one.
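
  A minimal sketch of how the predictions might guide the choice, assuming a hypothetical target cost that is a weighted sum of the relative duration and F0 mismatches between each candidate and the front end's predictions (names, weights and candidate values are illustrative only). A full unit selection system also uses join costs and searches over the whole utterance at once; this sketch only scores the candidates for a single diphone.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    unit_id: str
    duration: float  # seconds
    f0: float        # Hz


def target_cost(cand, pred_duration, pred_f0, w_dur=1.0, w_f0=1.0):
    """Hypothetical target cost: weighted relative mismatch between a candidate's
    duration/F0 and the predictions (relative, so seconds and Hz are comparable)."""
    dur_term = abs(cand.duration - pred_duration) / pred_duration
    f0_term = abs(cand.f0 - pred_f0) / pred_f0
    return w_dur * dur_term + w_f0 * f0_term


def choose_candidate(candidates, pred_duration, pred_f0):
    """Pick the candidate with the lowest target cost (join costs ignored here)."""
    return min(candidates, key=lambda c: target_cost(c, pred_duration, pred_f0))


# Usage: three recorded copies of the same diphone; the front end predicts 80 ms at 124 Hz.
candidates = [
    Candidate("e-l_017", 0.06, 150.0),
    Candidate("e-l_042", 0.09, 118.0),
    Candidate("e-l_105", 0.14, 122.0),
]
best = choose_candidate(candidates, pred_duration=0.08, pred_f0=124.0)
print(best.unit_id)  # e-l_042
```

  Here "e-l_105" has the closest F0 but is far too long, while "e-l_042" is reasonably close on both properties, so it has the lowest combined cost.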