Synthesising directly from a phone sequence rather than text

This topic has 2 replies, 2 voices, and was last updated 9 years, 9 months ago by Simon King.

Author

Posts
- May 31, 2016 at 20:37 #3208
  Etienne D
  Student
  Hi,
  
  I’m attempting to perform synthesis from a sequence of phones. I have tried to use the SayPhones method but I could not find enough documentation: the following (and other phone sequences as well), taken from a random paper on the web, only yields an EST error “item is null so has no stress feature”.
  
  festival> (SayPhones '(# n o t w @@r k i ng #))
  
  I then stored a sequence of phones into a variable like so (using Utterance Phones):
  
  festival> (set! someutt (Utterance Phones (# h @ l ou #)))
  
  and tried to build and synthesise the utterance step by step, much as we did in the first Speech Processing assignment, but that failed as well.
  
  Could you provide some explanation as to how to go about synthesising from a phone sequence, and optionally, how to fine-tune parameters such as length and stress?
  Thank you,
  Étienne
  
  edit: fixed the link
- June 5, 2016 at 10:41 #3224
  Simon King
  Professor
  I’m not sure of the solution to this. Let’s talk in person – is Festival the best framework for you, or should we consider a DNN system?
- June 6, 2016 at 14:44 #3230
  Simon King
  Professor
  SayPhones is probably only going to work for a diphone voice, not a Multisyn unit selection voice. Try loading a diphone voice and see if that works. You are going to get monotone (flat) F0 though, I think.
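
A minimal sketch of the suggestion above, not a tested solution. It assumes the kal_diphone voice is installed; that voice uses the US "radio" phone set, so the phone names (pau, hh, ax, l, ow) are illustrative and differ from the ones in the original post. The phones must belong to the phone set of whichever diphone voice is loaded. As far as I recall from the Festival manual, Phones-type utterances get a fixed duration and a fixed F0 taken from FP_duration and FP_F0, which would explain the flat pitch.

```scheme
;; Assumption: kal_diphone is installed. Substitute whichever diphone
;; voice your Festival installation actually provides.
(voice_kal_diphone)

;; Synthesise and play directly from a phone list.
;; The phone names must be valid for the loaded voice's phone set.
(SayPhones '(pau hh ax l ow pau))

;; The same thing step by step: build a Phones-type utterance,
;; synthesise it, then play it.
(set! someutt (Utterance Phones (pau hh ax l ow pau)))
(utt.synth someutt)
(utt.play someutt)
```

If the first call still fails, it may be worth checking that every phone in the list exists in the loaded voice's phone set before looking elsewhere.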