- This topic has 3 replies, 2 voices, and was last updated 8 years, 10 months ago.
diphone/word/sentence position
Referring to Lecture 2, slides 62-64:
The target cost component chart includes ‘word position’ (position of diphone in word), but it is not clear where in the utt file this is stored. Can you point this out?
This same chart does not mention sentence position, yet Festival seems to choose appropriate diphones at the ends of sentences. Is this done through big break prediction, and if so, is that not effectively a target cost? What about the beginning or middle of a sentence?
There is no need to store word position: it can easily be deduced on the fly by querying the utterance structure (segments have a syllable as parent, which in turn has a word as parent).
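To illustrate deducing position on the fly, here is a minimal Python sketch of a toy segment → syllable → word hierarchy. It is not Festival's API: the classes `Word`, `Syllable`, `Segment`, the helper `build_word`, and the position labels ("initial", "medial", "final", "single") are all assumptions made for illustration.

```python
class Word:
    def __init__(self, name):
        self.name = name
        self.syllables = []


class Syllable:
    def __init__(self, word):
        self.word = word          # parent link: syllable -> word
        self.segments = []


class Segment:
    def __init__(self, name, syllable):
        self.name = name
        self.syllable = syllable  # parent link: segment -> syllable


def build_word(name, syllables):
    """Build a toy Word from a list of syllables, each a list of phone names."""
    word = Word(name)
    for phones in syllables:
        syl = Syllable(word)
        for phone in phones:
            syl.segments.append(Segment(phone, syl))
        word.syllables.append(syl)
    return word


def word_position(seg):
    """Return the segment's position in its word, computed from parent links."""
    word = seg.syllable.word      # walk up: segment -> syllable -> word
    segs = [s for syl in word.syllables for s in syl.segments]
    index = next(i for i, s in enumerate(segs) if s is seg)
    if len(segs) == 1:
        return "single"
    if index == 0:
        return "initial"
    if index == len(segs) - 1:
        return "final"
    return "medial"
```

Nothing about position is stored on the segment itself; `word_position` recomputes it each time from the hierarchy, which is the point being made above.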
Multisyn does actually use “phrase position” in the target cost (this was omitted in the lecture slides – apologies). Here are the actual costs currently used in Multisyn:
(10 tc_stress)
(5 tc_syl_pos)
(5 tc_word_pos)
(6 tc_partofspeech)
(7 tc_phrase_pos)
(4 tc_left_context)
(3 tc_right_context)
(25 tc_bad_f0)
(10 tc_bad_duration)
where “tc_phrase_pos” looks for a match/mismatch in the phrase break feature of the word that the current segment belongs to.
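As a rough illustration of how such weights could combine, here is a minimal Python sketch that adds up the weight of every component that is penalised. Only the weights and component names come from the list above. Everything else is an assumption: that each linguistic component is a simple match/mismatch test, that `tc_bad_f0` and `tc_bad_duration` are flags on the candidate, and that the total is an unnormalised sum. The real Multisyn implementation may compute or normalise these differently.

```python
# Weights copied from the list above.
WEIGHTS = {
    "tc_stress": 10,
    "tc_syl_pos": 5,
    "tc_word_pos": 5,
    "tc_partofspeech": 6,
    "tc_phrase_pos": 7,
    "tc_left_context": 4,
    "tc_right_context": 3,
    "tc_bad_f0": 25,
    "tc_bad_duration": 10,
}

# Assumption: these two components are flags set on the candidate unit,
# not features compared against the target specification.
CANDIDATE_FLAGS = {"tc_bad_f0", "tc_bad_duration"}


def target_cost(target, candidate, weights=WEIGHTS):
    """Sum the weights of all penalised components (hypothetical scoring)."""
    cost = 0
    for name, weight in weights.items():
        if name in CANDIDATE_FLAGS:
            penalised = bool(candidate.get(name, False))
        else:
            penalised = target.get(name) != candidate.get(name)
        if penalised:
            cost += weight
    return cost


# Example: the candidate comes from a phrase-final word, the target does not.
target = {"tc_stress": 1, "tc_phrase_pos": "no_break"}
candidate = {"tc_stress": 1, "tc_phrase_pos": "big_break"}
mismatch_cost = target_cost(target, candidate)
```

Under these assumptions, a candidate that differs from the target only in phrase position is charged the `tc_phrase_pos` weight of 7, while a candidate flagged with bad F0 is charged 25, so poor pitch marking outweighs any single linguistic mismatch.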
So… to your last point, is Festival NOT considering phrase position as a target cost feature? You mention phrase position several times in the lecture slides as a potential linguistic specification, but it doesn’t show up on the chart at slide 62 of ‘Festival’s Target cost components’.
I’ve updated my previous response….