Multi-phone units

This topic has 1 reply, 2 voices, and was last updated 9 years, 1 month ago by Simon King.

Author

Posts
- January 29, 2017 at 18:42 #6629
  Xiao Z
  Student
  Just wondering how multi-phone units are actually stored.
  For example, suppose the join cost between “ax k” and “ae t s” is unacceptable. I guess we can still take just the “t s” part of “ae t s”, since “t s” has zero join cost. I mean, we can extract part of a multi-phone unit, right? Or can we only discard the whole multi-phone unit if its join cost is bad?
- January 30, 2017 at 10:48 #6634
  Simon King
  Professor
  All units are stored in the same way: they are part of whole utterances, as recorded by the speaker, in the database. Candidate units are extracted from the database utterances on-the-fly during synthesis.
  
  You are correct that we can “extract” sub-parts of multi-phone units, yes. This is in fact no different to extracting any other units — whether single diphones or sequences of diphones — from the database utterances.
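
To make the point in the reply above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular synthesiser implements it: the `DATABASE` contents, the `Candidate` class, and the `is_contiguous` helper are all hypothetical names, and units are labelled with phone symbols for simplicity. The idea it shows is that a candidate is only a reference (utterance, start, end) into a stored utterance, so a sub-part of a multi-phone unit is just another reference into the same utterance.

```python
from dataclasses import dataclass

# Hypothetical database: each utterance is stored whole, as recorded.
# Units are labelled with phone symbols here purely for illustration.
DATABASE = {
    "utt_001": ["k", "ae", "t", "s"],
    "utt_002": ["b", "ax", "k", "t"],
}


@dataclass(frozen=True)
class Candidate:
    """A candidate unit: a span [start, end) inside one database utterance."""
    utt_id: str
    start: int  # index of the first unit in the span (inclusive)
    end: int    # index one past the last unit in the span

    def labels(self):
        # Nothing is copied out of the database until it is needed.
        return DATABASE[self.utt_id][self.start:self.end]

    def sub_unit(self, offset, length):
        """Return a shorter candidate taken from inside this one."""
        if offset < 0 or length <= 0 or offset + length > self.end - self.start:
            raise ValueError("sub-unit falls outside the parent unit")
        new_start = self.start + offset
        return Candidate(self.utt_id, new_start, new_start + length)


def is_contiguous(left, right):
    """True if `right` directly follows `left` in the same utterance,
    which is the case where the join between them costs nothing."""
    return left.utt_id == right.utt_id and left.end == right.start


# The multi-phone unit "ae t s" and the sub-part "t s" taken from it.
ae_t_s = Candidate("utt_001", 1, 4)
t_s = ae_t_s.sub_unit(1, 2)
```

Here `t_s.labels()` gives the "t s" part of the longer unit, and the two phones inside it are contiguous in `utt_001`, so no join is needed between them. The search is free to use `t_s` even when the longer `ae_t_s` candidate would join badly with its neighbour.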