letter to sound alignment – question 2
hello,
I managed to do the alignment by implementing the ideas mentioned in the previous question. However, to build a voice I need utterance structures, which do not seem feasible in this case, as no linguistic information is used.
Is there any way I can skip this step when building a DNN voice?
thanks,
Norbert
You don’t need utterance structures for the very simple case that you are trying at this point (treat letters as phonemes, and use no other linguistic information). To build a voice, you simply need to figure out how to create the input features for training the DNN. Use the “Prepare the input labels” steps of the DNN voice building exercise as your starting point, but replace some of those steps with your own scripts.
For example, you do not need the step “Convert utterance structures to full context labels” – you need to create these full context labels using your own script (I suggest starting with a “full context” of triphones or quinphones).
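As a rough illustration of such a script, here is a minimal sketch that turns an aligned letter sequence into quinphone-style labels. The function name `quinphone_labels`, the `xx` padding symbol, the HTS-like `ll^l-c+r=rr` layout and the time units (100 ns) are all assumptions made for this example, not something the exercise prescribes; adapt them to whatever your alignment output and the rest of your pipeline expect.

```python
def quinphone_labels(letters, times, pad="xx"):
    """Build one label line per letter, with two letters of context each side.

    letters: list of letter symbols, e.g. ["c", "a", "t"]
    times:   list of (start, end) pairs from the alignment, one per letter
             (assumed here to be integers in 100 ns units)
    pad:     hypothetical symbol used where the context runs off the utterance
    """
    if len(letters) != len(times):
        raise ValueError("need exactly one (start, end) pair per letter")

    # Pad both ends so every letter has a full five-symbol window.
    padded = [pad, pad] + list(letters) + [pad, pad]

    lines = []
    for i, (start, end) in enumerate(times):
        ll, l, c, r, rr = padded[i:i + 5]
        # Assumed HTS-like layout: left-left ^ left - centre + right = right-right
        lines.append(f"{start} {end} {ll}^{l}-{c}+{r}={rr}")
    return lines


if __name__ == "__main__":
    example = quinphone_labels(
        ["c", "a", "t"],
        [(0, 1000000), (1000000, 2500000), (2500000, 4000000)],
    )
    print("\n".join(example))
```

For triphones you would keep only the `l-c+r` part of each label; the rest of the script stays the same.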
The “Convert label files to numerical values” will be essentially the same, but you’ll need to modify the questions so that they correctly query your labels.
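To make the idea of questions that “correctly query your labels” concrete, here is a small sketch that generates one question per letter and context position, and then applies those questions to a label to get a binary vector. It assumes labels of the form `ll^l-c+r=rr` and HTS-style `QS "name" {pattern}` question lines; the helper names (`make_questions`, `format_question`, `label_to_vector`) are made up for this example, and the wildcard matching uses Python’s `fnmatch` rather than any particular toolkit’s own matcher.

```python
from fnmatch import fnmatchcase

# Wildcard templates for each context position in an assumed
# ll^l-c+r=rr label. "{}" is replaced by the letter being queried.
POSITIONS = {
    "LL": "{}^*",
    "L": "*^{}-*",
    "C": "*-{}+*",
    "R": "*+{}=*",
    "RR": "*={}",
}


def make_questions(symbols):
    """Return (name, pattern) pairs: one question per position per symbol."""
    questions = []
    for position, template in POSITIONS.items():
        for symbol in symbols:
            questions.append((f"{position}-{symbol}", template.format(symbol)))
    return questions


def format_question(name, pattern):
    """Render a question as an HTS-style line for a questions file."""
    return f'QS "{name}" {{{pattern}}}'


def label_to_vector(label, questions):
    """Answer every question for one label (without its time stamps)."""
    return [1 if fnmatchcase(label, pattern) else 0 for _, pattern in questions]


if __name__ == "__main__":
    qs = make_questions(["a", "c", "t"])
    for name, pattern in qs:
        print(format_question(name, pattern))
    print(label_to_vector("xx^c-a+t=xx", qs))
```

This only works as written if your symbols contain none of the characters `*`, `?` or `[`, which `fnmatch` treats as wildcards. A real input vector would usually also carry positional or duration features, which this sketch leaves out.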
It’s well worth doing all of this with your own scripts (they are quite simple) because this will give you a deeper understanding of all the steps involved. Then, you could switch to the Ossian framework, which will automate some of this for you.