Phone set used for alignment vs synthesis
This topic has 3 replies, 2 voices, and was last updated 3 years ago by Alexandra S.
March 30, 2023 at 14:43 #16799
Hi,
There are two sets of phones used for transcriptions: set “A”, which is not marked for stress, and set “B”, which is. Set A has more “versions” of some phones (e.g. oi, oo, or, ou, ow, owr) than set B (e.g. ow and oy). For example, “goat” is transcribed as “g ou t” in set A but as “g ow t” in set B.
When we look up a transcription in Festival, it uses set B. However, the alignment seems to have been done with set A (the phones in the .mlf file).
Which phone set is used for speech generation? I would guess set B, since that is what is used for transcriptions in Festival. Wouldn’t that clash with the phones used at the alignment step?
Thank you
-
March 30, 2023 at 18:34 #16802
Yes, you need to use the same front end (i.e. phone set, lexicon, G2P model, etc.) for alignment and voice building as you will use at run time for the resulting synthetic voice.
It sounds like you may have run the initial .mlf creation step with a different Festival voice (maybe the default one that’s loaded when you start Festival?).
-
March 30, 2023 at 19:22 #16803
I created the .mlf with the General American phone list (phone set “A” in my first post). How can I make sure it’s using that same phone set when it creates the .phones file?
-
March 31, 2023 at 11:06 #16804
I created the .phones file with the same phone set (gam), using the command:
festival --script $FESTVOXDIR/src/promptselect/text2utts.scm \
  -eval festival_with_gam.scm \
  -level Segment \
  -itype data \
  -o stories_utts.phones stories_utts.data

I still get different phones in the .mlf file and the .phones file.
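To see exactly which phones differ, it can help to list the distinct labels in each file and compare the two lists. The script below is only a diagnostic sketch. It assumes the .mlf follows the standard HTK master label file layout (a #!MLF!# header, quoted file-name lines, label lines of the form "label" or "start end label", and a "." terminator). It also assumes the .phones file holds whitespace-separated phone names. The exact format that text2utts.scm writes has not been checked here, so an utterance-ID column, if present, would show up as spurious "phones". The function names mlf_phones, phones_file_phones and compare_phones are hypothetical.

```shell
#!/bin/sh
# Diagnostic sketch: compare the phone inventories of an HTK .mlf file
# and a .phones file. ASSUMPTION: the .phones file contains
# whitespace-separated phone names; any other tokens (e.g. utterance IDs)
# will be reported as extra "phones".

# Print each distinct phone label found in an HTK master label file.
mlf_phones() {
  awk '
    /^#!MLF!#/ { next }   # MLF header line
    /^"/       { next }   # "*/utt.lab" file-name lines
    /^\.$/     { next }   # end-of-utterance marker
    NF == 0    { next }   # blank lines
    # Label lines are either "label" or "start end label ..."
    {
      if (NF >= 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/) print $3
      else print $1
    }
  ' "$1" | LC_ALL=C sort -u
}

# Print each distinct token in a .phones file, with brackets and quotes removed.
phones_file_phones() {
  tr -d '()"' < "$1" | tr -s ' \t' '\n\n' | grep -v '^$' | LC_ALL=C sort -u
}

# Report labels that appear in only one of the two files.
compare_phones() {
  tmp_a=$(mktemp) tmp_b=$(mktemp)
  mlf_phones "$1" > "$tmp_a"
  phones_file_phones "$2" > "$tmp_b"
  echo "Only in $1:"
  LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
  echo "Only in $2:"
  LC_ALL=C comm -13 "$tmp_a" "$tmp_b"
  rm -f "$tmp_a" "$tmp_b"
}

# Run directly when given exactly two file arguments.
if [ "$#" -eq 2 ]; then
  compare_phones "$1" "$2"
fi
```

For example, save it as compare_phones.sh and run sh compare_phones.sh aligned.mlf stories_utts.phones (the .mlf name here is hypothetical). If set A labels such as ou appear only in the .mlf and set B labels such as ow appear only in the .phones file, the two files were produced with different front ends.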