Multi-phone units
I'm just wondering how multi-phone units are actually stored.
For example, suppose the join cost between “ax k” and “ae t s” is unacceptable. I guess we could take just the “t s” part of “ae t s”, since “t s” has zero join cost. In other words, we can extract part of a multi-phone unit, right? Or can we only discard a multi-phone unit entirely if its join cost is bad?
All units are stored in the same way: they are part of whole utterances, as recorded by the speaker, in the database. Candidate units are extracted from the database utterances on-the-fly during synthesis.
Yes, you are correct that we can “extract” sub-parts of multi-phone units. This is in fact no different from extracting any other units, whether single diphones or sequences of diphones, from the database utterances.
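To make this concrete, here is a minimal Python sketch of the idea, not the code of any real synthesiser. The toy `DATABASE`, the diphone labels, and the names `Candidate`, `find_candidates`, `sub_unit` and `is_contiguous` are all hypothetical and chosen only for illustration. It assumes a candidate is nothing more than a pointer (utterance, start position, length) into a whole stored utterance, so a sub-part of a multi-diphone candidate is just a shorter pointer into the same utterance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    utt_id: str   # which recorded utterance the unit comes from
    start: int    # index of the first diphone within that utterance
    length: int   # number of consecutive diphones covered


# Toy database (hypothetical): each utterance is stored whole,
# as the sequence of diphones the speaker actually produced.
DATABASE = {
    "utt1": ["sil-k", "k-ae", "ae-t", "t-s", "s-sil"],
    "utt2": ["sil-ax", "ax-k", "k-sil"],
}


def find_candidates(target, database):
    """Find every contiguous run in the database that matches the
    target diphone sequence. Nothing is pre-cut: candidates are
    located on the fly by scanning the stored utterances."""
    n = len(target)
    found = []
    for utt_id, diphones in database.items():
        for start in range(len(diphones) - n + 1):
            if diphones[start:start + n] == target:
                found.append(Candidate(utt_id, start, n))
    return found


def sub_unit(candidate, offset, length):
    """Take a sub-part of a multi-diphone candidate. The result is
    still just a span of the same utterance."""
    if offset < 0 or length < 1 or offset + length > candidate.length:
        raise ValueError("sub-unit must lie inside the candidate")
    return Candidate(candidate.utt_id, candidate.start + offset, length)


def is_contiguous(left, right):
    """True if `right` directly follows `left` in the same utterance,
    i.e. the two were recorded next to each other."""
    return (left.utt_id == right.utt_id
            and left.start + left.length == right.start)


multi = find_candidates(["ae-t", "t-s"], DATABASE)[0]
tail = sub_unit(multi, 1, 1)                  # keep only "t-s"
single = find_candidates(["t-s"], DATABASE)[0]
print(tail == single)
```

In this sketch, the sub-part taken from the longer candidate is the same object you would get by searching for the single diphone directly, which is the sense in which extracting part of a multi-phone unit is no different from extracting any other unit.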