Hybrid model: where does the improvement come from?
In the hybrid approach to TTS, we use speech parameters (vocoder parameters) to select the candidate units. But I do not understand where the improvement comes from.
I have two hypotheses:
1. Speech parameters are better than linguistic features, leading to a better target cost function.
2. Vocoders are still not good enough to reconstruct very natural speech.
I tend towards the first view. I tried the WORLD vocoder, and it reconstructed my voice perfectly: I could not hear any difference between the original waveform and the reconstructed one.
Both your hypotheses are reasonable.
Hypothesis 1 simply states that an Acoustic Space Formulation (ASF) target cost function is superior to an Independent Feature Formulation (IFF) one. That will be true if our predictions of speech parameters are sufficiently accurate. The reason that measuring the target-to-candidate-unit distance in acoustic space is better than in linguistic feature space is sparsity: linguistic features are high-dimensional and mostly categorical, so most target feature combinations have few or no exact matches in the database, and mismatch-counting costs discriminate poorly between candidates. Acoustic distances are continuous and graded, so every candidate gets an informative cost. See Figure 16.6 in Taylor’s book, or the video on ASF target cost functions.
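To make the contrast concrete, here is a minimal sketch of the two kinds of target cost. All names, dimensions, and values are illustrative assumptions, not any particular system's implementation: the ASF cost is a plain Euclidean distance between a predicted acoustic vector and each candidate's vector, while the IFF cost counts mismatched categorical linguistic features.

```python
import numpy as np

# Hypothetical 5-dimensional acoustic vectors (a few vocoder-style
# parameters for one frame); purely illustrative values.
predicted_target = np.array([1.0, 0.5, -0.2, 0.0, 0.3])
candidates = {
    "unit_a": np.array([1.1, 0.4, -0.1, 0.1, 0.2]),
    "unit_b": np.array([3.0, -1.0, 0.8, 0.9, -0.5]),
}

def asf_target_cost(target, candidate):
    """Distance in acoustic space: continuous and graded, so every
    candidate receives an informative, discriminative cost."""
    return float(np.linalg.norm(target - candidate))

def iff_target_cost(target_feats, candidate_feats):
    """Mismatch count over categorical linguistic features: with many
    features, most candidates mismatch somewhere (sparsity), so the
    cost often fails to separate good candidates from bad ones."""
    return sum(t != c for t, c in zip(target_feats, candidate_feats))

costs = {name: asf_target_cost(predicted_target, c)
         for name, c in candidates.items()}
best = min(costs, key=costs.get)
print(best)  # unit_a: acoustically closer to the predicted target

# Under IFF, two quite different candidates can tie on mismatch count:
target_feats = ("vowel", "stressed", "phrase_final")
print(iff_target_cost(target_feats, ("vowel", "unstressed", "phrase_medial")))
print(iff_target_cost(target_feats, ("nasal", "stressed", "phrase_medial")))
```

Note the last two calls both return 2, even though one candidate is at least the right phone class: this is the kind of ranking information a binary mismatch cost throws away.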
Hypothesis 2 is currently true much of the time, although improvements are being made steadily. It is just now becoming possible to construct commercial-quality TTS systems that use a vocoder, rather than waveform concatenation.
It’s worth reminding ourselves that an ASF target cost function does not need to use vocoder parameters as such, because we do not need to be able to reconstruct the waveform from them. We could choose to use a simpler parameterisation of speech (e.g., ASR-style MFCCs derived using a filterbank) if we wished.
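As a sketch of such a simpler parameterisation: the snippet below computes log mel-filterbank energies for one frame using only numpy (the filterbank construction and all parameter choices here are my own illustrative assumptions, not taken from any specific toolkit). Taking a DCT of these log energies would give ASR-style MFCCs. The point is that this mapping discards phase and fine spectral detail, so the waveform cannot be reconstructed from it, which is fine for a target cost.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

sr, n_fft = 16000, 512
# One windowed frame of a 200 Hz sinusoid stands in for real speech.
frame = np.hanning(n_fft) * np.sin(2 * np.pi * 200 * np.arange(n_fft) / sr)
power = np.abs(np.fft.rfft(frame)) ** 2
log_mel = np.log(mel_filterbank(24, n_fft, sr) @ power + 1e-10)
print(log_mel.shape)  # (24,): 512 samples reduced to 24 numbers
```

The reduction from 512 samples to 24 energies is exactly why this representation is non-invertible, and exactly why that does not matter for measuring target-to-candidate distances.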