- This topic has 2 replies, 2 voices, and was last updated 6 years ago.
The videos mention that the join cost is very local: it looks only at the final 20 ms of one frame (frame A) and the first 20 ms of the subsequent frame (frame B). The join cost might therefore not account for sudden changes in energy or F0 that occur further into the frame.
The videos explain that we can resolve this problem by considering deltas. What does this mean?
Does this mean that we find the gradient of, for example, the F0 on both frame A and frame B and simply subtract one from the other (if the gradients are the same, then intuitively they would create a better join than two differing gradients)?
Is this on the right lines?
Thinking about it though, factoring in deltas surely means that the join cost is still very local (i.e. it only factors in the gradients on either side of the join)?
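Adding deltas effectively brings in information from neighbouring frames, because a delta is the local rate of change of a feature, estimated across a few adjacent frames. You are right that this will still cover only a fairly small region of the signal.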
The join cost has to be local to the two candidate units being considered for a join. If it depended on the properties of other candidate units, then this would dramatically increase the complexity of the search problem.
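To make the idea of deltas concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any particular synthesiser computes its join cost: the function names `deltas` and `join_cost`, the `delta_weight` parameter, the single F0 feature, and the F0 values are all made up for the example. It estimates a slope for each frame from its neighbours, then compares both the F0 value and the F0 slope at the join.

```python
import math


def deltas(track):
    """Estimate the slope at each frame from its neighbours.

    Uses a central difference in the middle of the track and a
    one-sided difference at the two ends.
    """
    n = len(track)
    if n < 2:
        return [0.0] * n
    out = []
    for t in range(n):
        lo = max(t - 1, 0)
        hi = min(t + 1, n - 1)
        out.append((track[hi] - track[lo]) / (hi - lo))
    return out


def join_cost(left_f0, right_f0, delta_weight=1.0):
    """Hypothetical join cost between two candidate units.

    Compares the last frame of the left unit with the first frame of
    the right unit, using both the F0 value and the F0 slope (delta).
    Only these two units are consulted, so the cost stays local.
    """
    static_diff = left_f0[-1] - right_f0[0]
    delta_diff = deltas(left_f0)[-1] - deltas(right_f0)[0]
    return math.sqrt(static_diff ** 2 + delta_weight * delta_diff ** 2)


# Made-up F0 tracks in Hz, one value per frame.
rising = [100.0, 105.0, 110.0]
continues_rising = [110.0, 115.0, 120.0]
falling = [110.0, 105.0, 100.0]

smooth = join_cost(rising, continues_rising)  # same value, same slope
kinked = join_cost(rising, falling)           # same value, opposite slope
print(smooth, kinked)
```

Both candidate joins match exactly in F0 value at the boundary, so a cost based on static features alone could not tell them apart. The delta term penalises the second join, where a rising contour meets a falling one. The cost still depends only on the two units being joined, which is what keeps the search tractable.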