Multi-phone units
I'm wondering how multi-phone units are actually stored.
For example, suppose the join cost between “ax k” and “ae t s” is unacceptable. I guess we could take just the “t s” part of “ae t s”, since the join inside “t s” has zero cost. In other words, we can extract part of a multi-phone unit, right? Or can we only discard the whole multi-phone unit if its join cost is bad?
All units are stored in the same way: they are part of whole utterances, as recorded by the speaker, in the database. Candidate units are extracted from the database utterances on-the-fly during synthesis.
Yes, you are correct: we can “extract” sub-parts of multi-phone units. This is in fact no different to extracting any other units from the database utterances, whether single diphones or sequences of diphones.
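To make this concrete, here is a minimal sketch in Python of the idea, not the code of any real synthesiser. It assumes a toy database in which each utterance is just a list of phone labels (a real system would work with diphones and waveform positions), and every name in it (`Unit`, `DATABASE`, `find_candidates`, `sub_unit`, `is_contiguous`) is hypothetical. A unit is only a pointer into a recorded utterance, so taking part of a multi-phone unit simply means pointing at a narrower span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate unit: a span of phones inside one recorded utterance."""
    utt: str    # utterance ID in the database
    start: int  # index of the first phone in the span
    end: int    # index one past the last phone in the span


# Hypothetical toy database: whole utterances, stored as phone sequences.
DATABASE = {
    "utt1": ["k", "ae", "t", "s"],
    "utt2": ["ax", "k", "ae", "p"],
}


def find_candidates(target):
    """Search the utterances on the fly for every occurrence of `target`."""
    n = len(target)
    return [
        Unit(utt, i, i + n)
        for utt, phones in DATABASE.items()
        for i in range(len(phones) - n + 1)
        if phones[i:i + n] == target
    ]


def sub_unit(unit, offset, length):
    """Take part of a multi-phone unit: a narrower span of the same utterance."""
    if offset < 0 or length < 1 or offset + length > unit.end - unit.start:
        raise ValueError("sub-span falls outside the unit")
    return Unit(unit.utt, unit.start + offset, unit.start + offset + length)


def is_contiguous(left, right):
    """True if `right` directly follows `left` in the same utterance.

    Units that were adjacent in the recording need no real join,
    which is why such a join is treated as having zero cost.
    """
    return left.utt == right.utt and left.end == right.start
```

For example, the “t s” part of “ae t s” is itself an ordinary candidate for “t s”:

```python
multi = find_candidates(["ae", "t", "s"])[0]
ts = sub_unit(multi, 1, 2)
print(ts in find_candidates(["t", "s"]))  # True
```

Nothing is stored per unit in this sketch; only the utterances are stored, and every candidate, whatever its length, is found by searching them.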