Phone set used for alignment vs synthesis
Hi,
There are two sets of phones used for transcriptions: set “A”, which is not marked for stress, and set “B”, which is. Set A has more “versions” of the different phones (e.g. oi, oo, or, ou, ow, owr) than set B (e.g. ow and oy). For example, “goat” is transcribed as “g ou t” in set A, but as “g ow t” in set B.
If we look up a transcription in Festival, it uses set B. However, the alignment seems to be done with set A (the phones in the .mlf file).
Which phone set is used for speech generation? I would guess set B, since that is what is used for transcriptions in Festival. Wouldn’t that clash with the phones used at the alignment step?
Thank you
Yes, you need to use the same front end (i.e. phone set, lexicon, g2p model, etc.) for alignment and voice building as you will use at run time for the resulting synthetic voice.
It sounds like you may have run the initial MLF creation step with a different Festival voice, perhaps the default one that is loaded when you start Festival?
I created the .mlf with the General American phone list (phone set “A” in my first post). How can I make sure it’s using that same phone set when it creates the .phones file?
I created the .phones file with the same phone set (gam), using this command:
festival --script $FESTVOXDIR/src/promptselect/text2utts.scm \
    -eval festival_with_gam.scm \
    -level Segment \
    -itype data \
    -o stories_utts.phones stories_utts.data
I still get different phones in the .mlf file and the .phones file.
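One way to narrow down a mismatch like this is to list which phone symbols appear in only one of the two files. The following is a minimal Python sketch, not part of the course scripts. It assumes the .mlf file follows the usual HTK master label file layout, and it assumes (without having checked the actual output of text2utts) that the .phones file has one utterance per line, with an utterance ID followed by space-separated phones. The function names are made up for illustration.

```python
def phones_from_mlf(text):
    """Collect the label symbols found in an HTK-style master label file."""
    phones = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the header, per-utterance file names and the "." terminator.
        if not line or line == "#!MLF!#" or line.startswith('"') or line == ".":
            continue
        # A label may be preceded by integer start and end times,
        # so take the first field that is not an integer.
        label = next((f for f in line.split() if not f.lstrip("-").isdigit()), None)
        if label is not None:
            phones.add(label)
    return phones


def phones_from_utts(text):
    """ASSUMED layout: '<utterance id> <phone> <phone> ...' on each line."""
    phones = set()
    for line in text.splitlines():
        phones.update(line.split()[1:])
    return phones


def compare(mlf_phones, utt_phones):
    """Return (symbols only in the MLF, symbols only in the .phones file)."""
    return sorted(mlf_phones - utt_phones), sorted(utt_phones - mlf_phones)


# Toy data based on the "goat" example above, not real course output.
mlf_text = '#!MLF!#\n"*/utt1.lab"\n0 100 g\n100 200 ou\n200 300 t\n.\n'
utts_text = "utt1 g ow t\n"
print(compare(phones_from_mlf(mlf_text), phones_from_utts(utts_text)))
# → (['ou'], ['ow'])
```

For real files, read stories_utts.phones and the .mlf file from disk and pass their contents to the same functions. Silence and pause labels may legitimately differ between the two files, so the vowel symbols are the more telling ones: if symbols such as ou and owr show up on one side only, the two files were most likely produced with different front ends.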