› Forums › Foundations of speech › Phonetics and speech science › Canned speech vs TTS in terms of Asian languages
This topic has 1 reply, 2 voices, and was last updated 9 years, 6 months ago by Simon King.
October 5, 2016 at 20:51 #5201
Some East Asian languages are built from a fixed set of syllables, like the roughly 44 basic sounds of Japanese, and the same holds for Chinese, Korean, Thai, etc. All the speech we produce in these languages is made up of this fixed set of sounds.
In these circumstances, the drawback of canned speech, namely that it can only say a fixed number of things, would not apply. Does that mean canned speech is far better than TTS for these languages?
For example, Chinese is based on characters (each character is pronounced as a syllable containing a consonant plus a vowel or diphthong), and the set of possible combinations is very limited. What's more, each syllable carries a fixed tone, so changes of intonation can arguably be ignored. Moreover, one trait of Chinese is that every syllable should be pronounced clearly and relatively slowly.
October 6, 2016 at 07:56 #5205
It's a common misconception that a language with a fixed inventory of phonological units (whether Consonant-Vowel units, syllables, or whatever) can be perfectly synthesised from an inventory containing a single recording of each such unit.
All languages have a fixed inventory of phonemes (it’s not possible to invent new ones!) and also of syllables (due to phonotactic constraints), or whatever the equivalent is in that language (e.g., the mora in Japanese).
The key point is that the acoustic realisation of each unit is influenced by its context, and so having multiple recordings of each (from many different contexts) will give better results.
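To see why "fixed inventory" does not mean "small recording inventory", here is a toy back-of-the-envelope calculation (my own illustration, with invented round numbers, not figures from any particular language): once each unit is distinguished by its left and right neighbours, the number of unit-in-context types explodes.

```python
# Toy illustration: even a fixed syllable inventory explodes once
# left/right context is taken into account.
# All numbers below are hypothetical round figures for illustration.

inventory_size = 400        # distinct base syllables (hypothetical)
tones = 4                   # lexical tones (hypothetical)

toned_syllables = inventory_size * tones

# One recording per toned syllable ignores context entirely:
context_free_units = toned_syllables

# For context-sensitive synthesis we would want each toned syllable
# in every possible left/right neighbour combination (a triple):
in_context_types = toned_syllables ** 3

print(context_free_units)   # 1600
print(in_context_types)     # 4096000000 distinct triples
```

Even this crude count suggests why a database with many recordings of each unit, drawn from different contexts, beats a single-recording-per-unit inventory.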
Tone languages still have intonation. Tone by itself does not entirely determine F0. Tone is typically realised as the shapes of an F0 contour, not absolute values. In tone languages, F0 is carrying both segmental and supra-segmental information.
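A small numerical sketch of that last point (my own illustration, with invented F0 values): the same lexical tone shape, added to a declining phrase-level baseline at two different positions, produces different absolute F0 values even though the shape (the point-to-point differences) is identical.

```python
# Sketch: a lexical tone is a *relative* F0 shape overlaid on phrase
# intonation (here, simple linear declination). All values invented.

tone_shape = [30.0, 10.0, -20.0]   # a falling tone, Hz relative to baseline

def realise(tone, baseline_start, declination_per_point=-2.0):
    """Add the tone's relative shape to a declining phrase baseline."""
    return [baseline_start + i * declination_per_point + t
            for i, t in enumerate(tone)]

early = realise(tone_shape, baseline_start=220.0)  # early in the phrase
late  = realise(tone_shape, baseline_start=180.0)  # later, lower baseline

# Same shape (point-to-point deltas), different absolute F0:
deltas_early = [b - a for a, b in zip(early, early[1:])]
deltas_late  = [b - a for a, b in zip(late, late[1:])]
print(deltas_early == deltas_late)   # True
print(early)                         # [250.0, 228.0, 196.0]
print(late)                          # [210.0, 188.0, 156.0]
```

So even with tones fixed by the lexicon, a synthesiser still has to model the phrase-level F0 contour that the tones ride on.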