Database Redundancy: Used to our advantage?

This topic has 1 reply, 2 voices, and was last updated 9 years, 4 months ago by Simon.

- January 23, 2016 at 13:12 #2226
  Joseph M
  Student
  One of the most annoying things for me about using a commercial dialogue system like Siri or Echo, is that if you ask the same question, you don’t just get the same answer, you get the identical waveform. Human speech of course does not work this way – in fact, we are not capable of producing 2 identical waveforms. Both Siri and Echo use minor tricks to lessen this effect – they will on occasion slightly vary their response if you ask the identical question in succession. But the same problem still persists – even with this variation in word response, the ‘waveform-identicalness’ issue is the same, just slightly less noticeable (but still very noticeable, just ask the same question 4 times in a row). The problem would seem to arise from the fact that the identical text in any TTS system (at least those we’ve studied so far, and apparently those in commercial use) will map to the identical linguistic specification, which will then map to the identical string of diphones, and that’s that: identical waveform. Why not leverage the naturally occurring redundancy in our database, or even actively seek to increase the redundancy, and then add the ability at run time to pick some kind of N-best list of possible waveforms, and then either randomly pick from the list, or cycle through it, based on some measure of recency of the exact same text input?
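
A minimal sketch of the run-time selection idea in the question, assuming the synthesiser can already hand back an N-best list of candidate renditions for a given text. The class name `VariantPicker` and its interface are hypothetical, not part of any existing TTS system; it picks at random among candidates not yet played for that exact text, and once all have been played it only avoids repeating the most recent one.

```python
import random
from collections import defaultdict


class VariantPicker:
    """Hypothetical helper: choose a different rendition for repeated text."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # text -> indices into the n-best list already played, most recent last
        self._history = defaultdict(list)

    def pick(self, text, n_best):
        """Return an index into n_best, avoiding recently used entries."""
        if not n_best:
            raise ValueError("n_best must contain at least one candidate")
        used = self._history[text]
        fresh = [i for i in range(len(n_best)) if i not in used]
        if not fresh:
            # Every candidate has been played: forget all but the latest one,
            # so the same waveform is never produced twice in a row.
            last = used[-1]
            used.clear()
            if len(n_best) > 1:
                used.append(last)
            fresh = [i for i in range(len(n_best)) if i not in used]
        choice = self._rng.choice(fresh)
        used.append(choice)
        return choice
```

Cycling through the list in order instead of choosing randomly would be an equally simple variation; the random choice just makes the pattern harder to notice.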
- January 24, 2016 at 16:54 #2318
  Simon
  Professor
  It’s certainly the case in unit selection that there are many versions that will sound as good as the one chosen via the target and join costs. Actually, there will very probably be many that sound better, but were not the lowest cost sequence in the search (why is that?).
  
  It’s easy in principle to generate an n-best list during a Viterbi search (although not implemented in Festival).
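
To illustrate the n-best idea, here is a rough sketch of a Viterbi-style search that keeps the n cheapest partial paths ending in each candidate unit, rather than only the single best. This is not Festival code: the function name `n_best_paths`, the `target_cost` and `join_cost` callables, and the assumption that unit identifiers are unique within each target position are all made up for the example.

```python
import heapq


def n_best_paths(candidates, target_cost, join_cost, n):
    """Return up to n (total_cost, unit_sequence) pairs, cheapest first.

    candidates  : list of lists of unit ids, one list per target position
    target_cost : callable (position, unit) -> float   (hypothetical)
    join_cost   : callable (prev_unit, unit) -> float  (hypothetical)
    """
    def by_cost(entry):
        return entry[0]

    # beams[u] holds the n cheapest partial paths that end in unit u
    beams = {u: [(target_cost(0, u), [u])] for u in candidates[0]}
    for t in range(1, len(candidates)):
        new_beams = {}
        for u in candidates[t]:
            # Extend every surviving partial path by unit u
            extended = [
                (cost + join_cost(prev, u) + target_cost(t, u), path + [u])
                for prev, paths in beams.items()
                for cost, path in paths
            ]
            new_beams[u] = heapq.nsmallest(n, extended, key=by_cost)
        beams = new_beams
    finals = [entry for paths in beams.values() for entry in paths]
    return heapq.nsmallest(n, finals, key=by_cost)
```

With n set to 1 this reduces to the ordinary single-best search; larger n costs roughly n times as much work per candidate.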
  
  Here’s an idea for how you might generate variants from your own unit selection voice without modifying any code:
  1. Synthesise the sentence, and examine the utterance structure to see which prompts from the database were used
  2. Remove one or more (maybe all) of those prompts from utts.data
  3. Restart Festival
  4. Synthesise the sentence again: different units will be chosen
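
Step 2 above could be scripted along these lines. This is only a sketch: it assumes each line of utts.data has the form `( prompt_id "prompt text" )`, and that you have already noted the prompt IDs by inspecting the utterance structure yourself. The function name `drop_prompts` and the example IDs are hypothetical.

```python
import re

# Assumed line format: ( prompt_id "prompt text" )
_PROMPT_ID = re.compile(r'^\s*\(\s*(\S+)')


def drop_prompts(lines, used_ids):
    """Return the utts.data lines whose prompt ID is not in used_ids."""
    used = set(used_ids)
    kept = []
    for line in lines:
        match = _PROMPT_ID.match(line)
        if match and match.group(1) in used:
            continue  # this prompt supplied units last time, so leave it out
        kept.append(line)
    return kept


# Example use (keep a backup of the original file first):
#   with open("utts.data") as f:
#       lines = f.readlines()
#   with open("utts.data", "w") as f:
#       f.writelines(drop_prompts(lines, ["arctic_a0001", "arctic_a0042"]))
```

Restoring the backup and restarting Festival returns the voice to its original state.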