Taylor – Text-to-speech synthesis, Chapter 3
- This topic has 3 replies, 3 voices, and was last updated 5 years, 2 months ago by Hanna P.
October 4, 2016 at 13:43 #5101
The text-to-speech problem
-
October 2, 2018 at 13:48 #9384
On page 15 of this reading (page 40 of the book), Taylor seems to distinguish between ‘suprasegmental “prosody”’, ‘affective prosody’ and ‘augmentative prosody’ when describing the complete prosody generation model. I know this may be a little outside the scope of this course, but what exactly is the difference between these three? Previously, I had assumed that all prosodic features were simply suprasegmental.
-
October 3, 2018 at 15:44 #9393
When Taylor says “suprasegmental prosody” (which he elaborates on later, in Section 6.5.2 of his book) he means aspects of prosody closely associated with the words and the literal meaning of the utterance. For example: syllable stress within a word, or placing a prominence on a content word, or tone in a tonal language.
He uses “affective prosody” (Section 6.5.1) to mean aspects of prosody that convey emotion, attitude and other things determined by the mental state of the talker.
Under “augmentative prosody” (Section 6.5.3) he includes the use of prosody to aid communication, such as using rising intonation at the end of a yes/no question: even though this is not essential, it significantly aids communication efficiency. Another example would be placing phrase breaks to help the listener disambiguate information.
[This material is beyond the scope of the Speech Processing course, but would be in-scope for Speech Synthesis]
-
October 10, 2019 at 08:08 #9963
Just wanted to comment on this passage I got caught up in:
“One of the most notable features of writing is that it nearly exclusively encodes the verbal component of the message alone: the prosody is ignored. This feature seems to be true of all the writing systems known around the world. Because of this it is sometimes said that written language is impoverished with respect to spoken language insofar as it can express only part of the message.” p. 30
Why don’t metre, stress, rhythm, pauses, falls, rises, quality of phonemes, etc. in literature count? Jurafsky even seems to oppose poetic patterns to rhythm in speech (p. 262)! I would have thought the complete opposite of what Taylor’s saying: the significant thing that world literature has in common here is the importance of metrical poetry, which can emerge many centuries before prose is used for anything beyond record keeping, and continues to be part of oral tradition. Yet he’s more willing to classify happy-face emoticons as ‘an attempt to encode affective prosody’ than rhythmic writing! Just imagine ‘to be or not to be’ rewritten to the metre of Humpty Dumpty: it would have an even greater change of tone than repeating it while laughing out loud.
A synthesiser reading poetry, or even literary prose or political speeches, wouldn’t sound quite right, right? But poetic metre shows that a human reader wouldn’t simply be improvising the way they would read them. Maybe a text processing step to identify textual prosody and repetitions could lead to more melodious output.
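As a rough illustration of the “repetitions” half of that idea, here is a minimal sketch of a text processing step that flags repeated word sequences, which a synthesiser front end could in principle use as a cue for rhetorical rhythm. This is only an assumption about how such a step might look, not something taken from Taylor or Jurafsky: the function name `repeated_ngrams`, its parameters and the sample sentence are all hypothetical, and real metre detection would also need syllable and stress information from a pronunciation lexicon.

```python
import re
from collections import Counter


def repeated_ngrams(text, n=3, min_count=2):
    """Return word n-grams occurring at least min_count times, with counts.

    Hypothetical helper: a crude stand-in for detecting rhetorical
    repetition (e.g. anaphora) in text before prosody generation.
    """
    # Lowercase and keep only alphabetic words (apostrophes allowed).
    words = re.findall(r"[a-z']+", text.lower())
    # Count every contiguous sequence of n words.
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}


# Sample sentence with obvious repetition (illustrative input only).
speech = (
    "We shall fight on the beaches, we shall fight on the landing grounds, "
    "we shall fight in the fields and in the streets."
)
repeats = repeated_ngrams(speech)
print(repeats)
```

Counting n-grams is deliberately simple: it finds exact repeats only, so it would miss looser parallelism and says nothing about stress patterns. It is meant to show where such a step could sit, not how well it would work.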