Trajectory Tiling – Qian et al
I was really interested in the cross-lingual voice transformation mentioned in this paper, but I am a bit confused. I thought this was about a hybrid method, but in Figure 3, the final step before producing synthesised speech is a vocoder. Are they just referring to the process of selecting and concatenating units as vocoding, or have I missed something here?
Yes, that diagram has many steps! p. 287 says:
“After we generate the Mandarin training sentences for the monolingual English speaker, his HMM based TTS in Mandarin can be trained via the standard HMM training procedure.”
so what they are doing is using trajectory tiling (with the waveform being created using concatenation) to construct a training set in the target language, for a speaker who doesn’t speak that language.
That data is then used to train a conventional HMM-based system that drives a vocoder.
All the synthesisers compared in Fig 12 are conventional HMM-plus-vocoder systems. Trajectory tiling is used to create the training data for TSMT.
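To make the two stages concrete, here is a toy sketch of the data flow described above. It is not the paper's implementation: every name (`tile_trajectory`, `build_training_set`, `train_hmm_tts`, `synthesise`) is hypothetical, the "features" are single numbers, and the "training" is just an average. The only point is to show where concatenation happens and where the vocoder comes in.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    speaker: str
    waveform_source: str  # "concatenation" or "vocoder"
    frames: list


def tile_trajectory(guide, units):
    # Toy stand-in for trajectory tiling: for each frame of the guide
    # trajectory, pick the closest unit from the speaker's own recordings.
    return [min(units, key=lambda u: abs(u - g)) for g in guide]


def build_training_set(guides, units, speaker):
    # Stage 1: the selected units are concatenated to make waveforms.
    # These waveforms are training data, not the final system output.
    return [
        Utterance(text, speaker, "concatenation", tile_trajectory(guide, units))
        for text, guide in guides.items()
    ]


def train_hmm_tts(training_set):
    # Stage 2 (placeholder): a real system would run standard HMM training.
    # Here "training" only averages the frames (an assumption for illustration).
    frames = [f for utt in training_set for f in utt.frames]
    return {"speaker": training_set[0].speaker, "mean": sum(frames) / len(frames)}


def synthesise(model, text, n_frames=3):
    # The trained model generates parameters that drive a vocoder.
    return Utterance(text, model["speaker"], "vocoder", [model["mean"]] * n_frames)


# Units from the monolingual English speaker (made-up values).
english_units = [0.0, 0.5, 1.0]
# Guide trajectories for Mandarin sentences (made-up values).
mandarin_guides = {"ni hao": [0.1, 0.9], "xie xie": [0.6, 0.4]}

training_set = build_training_set(mandarin_guides, english_units, "english_speaker")
model = train_hmm_tts(training_set)
output = synthesise(model, "zai jian")
```

In this sketch the items in `training_set` are produced by concatenation, while `output` comes from the vocoder stage, which matches the last box in Figure 3.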