diphone/word/sentence position
- This topic has 3 replies, 2 voices, and was last updated 10 years ago by Simon King.
-
February 24, 2016 at 20:34 #2635
Referring to Lecture 2, slides 62-64:
The target cost component chart includes ‘word position’ (the position of the diphone within the word), but it is not clear where in the utt file this is stored. Can you point this out?
The same chart does not mention sentence position, yet Festival seems to choose appropriate diphones at the ends of sentences. Is this done through big-break prediction, and is it not somehow considered in the target cost? What about the beginning or middle of a sentence?
-
February 25, 2016 at 13:04 #2639
There is no need to store word position: it can easily be deduced on the fly by querying the utterance structure (segments have a syllable as parent, which in turn has a word as parent).
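To illustrate "deduce it on the fly", here is a minimal sketch that walks up from a segment to its syllable and then to its word, and classifies the segment's position within that word. The classes below are a hypothetical, simplified stand-in for Festival's SylStructure relation (Word > Syllable > Segment), not Festival's actual API, and the four position labels are an assumption for illustration; Multisyn's tc_word_pos may define positions differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for Festival's SylStructure relation (not the real API).
# Parent links are excluded from repr/eq to avoid infinite recursion.

@dataclass
class Segment:
    name: str
    parent: Optional["Syllable"] = field(default=None, repr=False, compare=False)

@dataclass
class Syllable:
    segments: List[Segment] = field(default_factory=list)
    parent: Optional["Word"] = field(default=None, repr=False, compare=False)

@dataclass
class Word:
    name: str
    syllables: List[Syllable] = field(default_factory=list)

def build_word(name: str, syllables: List[List[str]]) -> Word:
    """Build a word from a list of syllables, each a list of phone names."""
    word = Word(name)
    for phones in syllables:
        syl = Syllable(parent=word)
        for p in phones:
            syl.segments.append(Segment(p, parent=syl))
        word.syllables.append(syl)
    return word

def word_position(seg: Segment) -> str:
    """Classify a segment as 'single', 'initial', 'medial' or 'final' in its word.

    Nothing is stored: we go segment -> syllable (parent) -> word (parent),
    then locate the segment among all segments of that word.
    """
    word = seg.parent.parent
    phones = [s for syl in word.syllables for s in syl.segments]
    idx = next(i for i, s in enumerate(phones) if s is seg)
    if len(phones) == 1:
        return "single"
    if idx == 0:
        return "initial"
    if idx == len(phones) - 1:
        return "final"
    return "medial"
```

For example, with `w = build_word("cat", [["k", "ae", "t"]])`, `word_position(w.syllables[0].segments[0])` returns `"initial"`.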
Multisyn does actually use “phrase position” in the target cost (this was omitted in the lecture slides – apologies). Here are the actual costs currently used in Multisyn:
(10 tc_stress) (5 tc_syl_pos) (5 tc_word_pos) (6 tc_partofspeech) (7 tc_phrase_pos) (4 tc_left_context) (3 tc_right_context) (25 tc_bad_f0) (10 tc_bad_duration)
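Each pair in that list reads as (weight component). A minimal sketch of how such weights could combine per-component costs into a single target cost, assuming a plain weighted sum; the real Multisyn code may normalise or combine them differently, and the component cost values passed in are hypothetical.

```python
from typing import Dict

# Weights copied from the Multisyn list above.
TC_WEIGHTS: Dict[str, float] = {
    "tc_stress": 10,
    "tc_syl_pos": 5,
    "tc_word_pos": 5,
    "tc_partofspeech": 6,
    "tc_phrase_pos": 7,
    "tc_left_context": 4,
    "tc_right_context": 3,
    "tc_bad_f0": 25,
    "tc_bad_duration": 10,
}

def target_cost(subcosts: Dict[str, float],
                weights: Dict[str, float] = TC_WEIGHTS) -> float:
    """Assumed plain weighted sum of component costs.

    Components missing from `subcosts` contribute 0; keys not in
    `weights` are ignored.
    """
    return sum(w * subcosts.get(name, 0.0) for name, w in weights.items())
```

For instance, a candidate that mismatches only on stress and phrase position would score `target_cost({"tc_stress": 1, "tc_phrase_pos": 1})`, i.e. 10 + 7 = 17 under this assumption.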
where “tc_phrase_pos” checks for a match or mismatch in the phrase-break feature of the word that the current segment belongs to.
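A minimal sketch of such a match/mismatch component: compare the phrase-break label of the target's word with that of the candidate's word and return 0 for a match, 1 for a mismatch. The `WordInfo` class is hypothetical, and the labels ("NB" for no break, "B"/"BB" for break/big break) are assumed here for illustration. This is also how sentence-final behaviour can fall out of the target cost: a candidate taken from a word before a big break is penalised when the target word is not phrase-final, and vice versa.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WordInfo:
    name: str
    pbreak: str  # assumed labels, e.g. "NB" (no break), "B" (break), "BB" (big break)

def tc_phrase_pos(target_word: WordInfo, candidate_word: WordInfo) -> float:
    """Binary sketch: 0.0 if the phrase-break features match, 1.0 otherwise."""
    return 0.0 if target_word.pbreak == candidate_word.pbreak else 1.0
```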
-
February 25, 2016 at 14:36 #2643
So, to your last point: is Festival NOT considering phrase position as a target cost feature? You mention phrase position several times in the lecture slides as a potential linguistic specification, but it doesn’t show up in the chart of ‘Festival’s Target cost components’ on slide 62.
-
February 25, 2016 at 14:54 #2648
I’ve updated my previous response….