Syllable structure & stress
- This topic has 4 replies, 5 voices, and was last updated 6 years, 10 months ago by Simon.
-
October 23, 2015 at 17:40 #399
I am aware that Festival uses 0, 1 and 2 to refer to lexical stress (unstressed, primary stress and secondary stress). However, when examining the lexical stress for the word “upset” (in my sentence, used as a noun), Festival appends a 3 on the second syllable. What does this mean?
-
October 23, 2015 at 20:26 #400
It’s tertiary stress, which is marked up in the Unisyn lexicon – see Section 3.4.3 of the Unisyn manual. Tertiary stress is there not to show that a syllable might receive a pitch accent, but to block some post-lexical rules, such as vowel reduction.
So, the second syllable in “upset” should never be reduced, in any context. I think Unisyn would regard “upset” as a compound word “up + set”, which is why the tertiary stress is marked up.
-
October 3, 2016 at 10:47 #5061
I was wondering about the syllable structure of the example word in the Week 3 lab – ‘caterpillar’. Is this done automatically or by hand? I am confused as to why the ‘p’ is part of the second syllable – normally I think it would be segmented as being the onset of the third syllable. Are there any situations which could arise where an incorrect or strange syllabification could cause problems in the synthesising of a word or sentence?
-
October 3, 2016 at 17:57 #5067
In Festival, you can detect when a pronunciation has come from the dictionary: it will have a correct Part Of Speech (POS) tag. Pronunciations predicted by the Letter To Sound (LTS) module have a ‘nil’ part of speech tag.
The example of caterpillar here returns a nil POS tag.
The syllabification of words whose pronunciation comes from LTS must also be done automatically, and can therefore contain errors.
An incorrect syllabification could indeed have consequences for speech synthesis later in the pipeline. It might affect the prediction of prosody. In unit selection, it might affect the units chosen from the database.
-
October 22, 2016 at 12:51 #5539
I know where to find explanations for most of the different steps from text to speech but I can’t seem to find a source on how Festival comes up with syllable structure and stresses for words that are not in the dictionary.
-
October 23, 2016 at 18:41 #5563
For the voice used in this assignment, this is done by rules hardwired into the low-level C++ code, which are specific to the Unilex dictionary.
(You are not expected to be able to read or understand the code, but feel free to try).
EDIT – see below for a more detailed answer explaining what the rules do.
-
October 5, 2017 at 16:41 #7859
How do Festival and the lexicon ‘know’ where the stressed syllable is in a made-up word or a foreign name? Maybe by means of an accent grammar?
-
October 5, 2017 at 21:17 #7863
Syllabification of out-of-dictionary words is rule-based, using sonority. Every vowel is assumed to be the nucleus of a syllable. The boundaries between syllables are placed at positions of minimum sonority.
This requires knowing the sonority of every phoneme in the set used by the current lexicon. In Festival, sonority is calculated from the broad phonetic class.
A good reference for sonority is this classic textbook:
Giegerich, H. J. (1992) “English Phonology: an Introduction” Cambridge University Press, Cambridge, UK.
(Heinz Giegerich is the Professor of English Linguistics at Edinburgh University)
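As a rough illustration of the procedure described in this post, here is a minimal C++ sketch. It is not Festival's actual code: the function name `syllabify`, the convention that a sonority of 5 marks a vowel, and the tie-break (the earliest minimum wins) are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption for this sketch: vowels are the phones with sonority 5.
const int kVowelSonority = 5;

// Split a phone sequence into syllables; phones[i] has sonority sonority[i].
// Every vowel is a nucleus. Between two nuclei, the boundary is placed
// immediately before the least sonorous consonant (earliest one on ties).
std::vector<std::vector<std::string>> syllabify(
    const std::vector<std::string>& phones,
    const std::vector<int>& sonority) {
    assert(phones.size() == sonority.size());

    // Indices at which a syllable starts; the first phone always starts one.
    std::vector<std::size_t> starts;
    if (!phones.empty()) starts.push_back(0);

    bool seen_nucleus = false;
    std::size_t prev_nucleus = 0;
    for (std::size_t i = 0; i < phones.size(); ++i) {
        if (sonority[i] < kVowelSonority) continue;  // consonant, not a nucleus
        if (seen_nucleus) {
            // Default for adjacent vowels: break right before the second one.
            std::size_t boundary = i;
            // Otherwise look for the sonority minimum between the two nuclei.
            for (std::size_t j = prev_nucleus + 1; j < i; ++j) {
                if (boundary == i || sonority[j] < sonority[boundary]) {
                    boundary = j;
                }
            }
            starts.push_back(boundary);
        }
        seen_nucleus = true;
        prev_nucleus = i;
    }

    // Cut the phone sequence at the recorded start positions.
    std::vector<std::vector<std::string>> syllables;
    for (std::size_t s = 0; s < starts.size(); ++s) {
        std::size_t end = (s + 1 < starts.size()) ? starts[s + 1] : phones.size();
        syllables.emplace_back(phones.begin() + starts[s], phones.begin() + end);
    }
    return syllables;
}
```

For example, calling `syllabify` on the phones `k a m p o` with sonorities `1 5 3 1 5` puts the boundary before `p`, the least sonorous consonant between the two vowels, giving `k a m` and `p o`. A real implementation would need more care with ties and with consonant clusters that are not legal onsets, so this sketch can produce odd splits in the same way the automatic syllabification discussed earlier in the thread can.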
-
October 5, 2017 at 21:26 #7864
If you’re interested in how sonority is calculated from broad phonetic class in Festival, this is hard-coded as follows:
    if (p->val(f_vc) == "+")          // vowel-or-consonant == vowel
        return 5;
    else if (p->val(f_ctype) == "l")  // consonant-type == liquid
        return 4;
    else if (p->val(f_ctype) == "n")  // consonant-type == nasal
        return 3;
    else if (p->val(f_cvox) == "+")   // consonant-voicing == voiced
        return 2;
    else
        return 1;
and the phoneme set used by the lexicon will have those features specified in a table (manually created by a phonetician) looking something like this example (which happens to be for Spanish):
    (#  - 0 - - - 0 0 -)
    (a  + l 3 1 - 0 0 -)
    (e  + l 2 1 - 0 0 -)
    (i  + l 1 1 - 0 0 -)
    (o  + l 3 3 - 0 0 -)
    (u  + l 1 3 + 0 0 -)
    (b  - 0 - - + s l +)
    (ch - 0 - - + a a -)
    (d  - 0 - - + s a +)
    (f  - 0 - - + f b -)
    (g  - 0 - - + s p +)
    (j  - 0 - - + l a +)
    (k  - 0 - - + s p -)
    (l  - 0 - - + l d +)
    (ll - 0 - - + l d +)
    (m  - 0 - - + n l +)
    (n  - 0 - - + n d +)
    (ny - 0 - - + n v +)
    (p  - 0 - - + s l -)
    (r  - 0 - - + l p +)
    (rr - 0 - - + l p +)
    (s  - 0 - - + f a +)
    (t  - 0 - - + s t +)
    (th - 0 - - + f d +)
    (x  - 0 - - + a a -)
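To make the link between the feature table and the sonority rule concrete, here is a small self-contained C++ sketch. It is an illustration only, not Festival's code: `PhoneFeatures`, `phone_table` and `sonority` are hypothetical names. It also assumes that, in each table row, the first value after the phone name is the vowel-or-consonant feature, the sixth is the consonant type, and the last is the consonant voicing. Only five rows from the table are copied in.

```cpp
#include <cassert>
#include <map>
#include <string>

// Only the three features that the sonority rule consults.
struct PhoneFeatures {
    std::string vc;     // vowel-or-consonant: "+" means vowel
    std::string ctype;  // consonant type: "l" liquid, "n" nasal
    std::string cvox;   // consonant voicing: "+" means voiced
};

// A few rows taken from the Spanish table, under the assumed column order
// (vc = 1st value, ctype = 6th value, cvox = last value).
const std::map<std::string, PhoneFeatures>& phone_table() {
    static const std::map<std::string, PhoneFeatures> table = {
        {"a", {"+", "0", "-"}},
        {"l", {"-", "l", "+"}},
        {"m", {"-", "n", "+"}},
        {"b", {"-", "s", "+"}},
        {"p", {"-", "s", "-"}},
    };
    return table;
}

// Same decision chain as the hard-coded rule: vowels are most sonorous,
// then liquids, nasals, other voiced consonants, and finally everything else.
int sonority(const std::string& phone) {
    auto it = phone_table().find(phone);
    assert(it != phone_table().end() && "phone missing from the sketch table");
    const PhoneFeatures& f = it->second;
    if (f.vc == "+") return 5;
    if (f.ctype == "l") return 4;
    if (f.ctype == "n") return 3;
    if (f.cvox == "+") return 2;
    return 1;
}
```

With these rows, `a`, `l`, `m`, `b` and `p` get sonority 5, 4, 3, 2 and 1 respectively. Because the values come straight from the table, an inaccurate feature in the phone set changes the sonority, and with it the syllable boundaries.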