Clarification on ‘What in the speech signal differentiates different phones?’
- This topic has 15 replies, 2 voices, and was last updated 2 years ago by Simon.
-
November 6, 2022 at 18:14 #16396
Hi there,
I’m struggling to understand what is meant by ‘What in the speech signal differentiates different phones?’. Could I possibly get a hint, or be pointed towards material that would help me answer this question?
Thanks
-
November 6, 2022 at 18:43 #16398
The answer is in the phonetics material of Speech Processing – go back over Module 1 to recap how speech is produced, then Module 2 which covers the acoustic properties of vowels and consonants. You might also find Module 4 helps you answer this question, especially the last video, “Phoneme”.
-
November 6, 2022 at 19:22 #16399
Thank you. I also wanted to double-check: do words in our diagrams contribute to the word count?
-
November 6, 2022 at 20:55 #16403
Word count is defined in the writing up instructions.
-
November 6, 2022 at 21:16 #16406
For the background section, are we trying to explain the Festival background under each of the headings in the marking scheme, or to include the content of those headings in an overall explanation?
-
November 7, 2022 at 08:30 #16418
As mentioned in many other posts, do not focus so heavily on Festival – it’s just a piece of software! The assignment is about the general principles of TTS.
Therefore, in the background section, you will want to explain those general principles: what does your reader need to know, in order to understand your explanations of the mistakes later in the report? That might include both how each step is done and whether that step is easy or hard, solved with current techniques or still an open problem, etc.
The formatting instructions specify which headings are compulsory and whether you can add subsections below them (yes, you can).
-
November 7, 2022 at 15:47 #16447
I have a follow-up question about answering ‘What in the speech signal differentiates different phones?’. Having looked at the modules and noted how the different phones are differentiated, what angle should we take in using this question to explain how concatenative TTS differs from human speech production and why this might lead to errors? I’m struggling to see a connection point between the two.
-
November 7, 2022 at 18:12 #16453
Is the question basically asking how different phones are differentiated in human speech and in speech synthesis?
-
November 7, 2022 at 18:27 #16457
There are lots of connections. Some hints:
do we use phonemes in TTS?
speech sounds are affected by the surrounding speech sounds through the process of co-articulation (which occurs both within and between words)
the source and filter each have different consequences for the acoustic properties of a speech sound: how is that knowledge used in TTS?
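To make the last hint concrete, here is a minimal sketch of the source-filter idea, not how any particular TTS system implements it. It assumes a voiced source modelled as an impulse train and a vocal-tract filter reduced to a single two-pole resonance standing in for one formant. The function names and all frequency values are hypothetical and purely illustrative.

```python
import math

def impulse_train(f0_hz, sample_rate, n_samples):
    """Toy voiced source: one unit impulse per pitch period, zeros elsewhere."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Toy filter: a single two-pole resonance standing in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2.0 * math.pi * centre_hz / sample_rate      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2  # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 16000  # illustrative value

# Same source (same F0), two different filter settings.
source = impulse_train(f0_hz=100, sample_rate=SAMPLE_RATE, n_samples=800)
low_formant = resonator(source, centre_hz=300, bandwidth_hz=80, sample_rate=SAMPLE_RATE)
high_formant = resonator(source, centre_hz=700, bandwidth_hz=80, sample_rate=SAMPLE_RATE)
```

The two outputs share the same pitch period because the source is identical; only the resonance differs. In this toy setup, that is the sense in which the source and the filter have separate acoustic consequences.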
-
November 7, 2022 at 19:52 #16461
No, we use diphones instead, as they capture co-articulation. The place and manner of articulation, as well as whether the source is voiced or not, will affect the acoustic properties of a speech sound.
Is this knowledge used in TTS through the concatenation of diphones that capture co-articulation?
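As a minimal sketch of what "diphones" means here, the snippet below turns a phone sequence into the diphone names a concatenative system would need to look up. The function name, the phone symbols and the `sil` padding symbol are assumptions for illustration, not Festival's actual inventory or API.

```python
def phones_to_diphones(phones, silence="sil"):
    """Pair each phone with its successor, padding both ends with silence."""
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]

# Hypothetical phone string for the word "cat".
print(phones_to_diphones(["k", "ae", "t"]))
# → ['sil-k', 'k-ae', 'ae-t', 't-sil']
```

Each unit spans the transition from one phone into the next, which is where much of the co-articulation lies, so the joins fall in the middle of phones rather than at their boundaries.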
-
November 7, 2022 at 20:58 #16464
You correctly state that diphones are used because they capture co-articulation.
But are you sure phonemes are not used anywhere in the system?
-
November 7, 2022 at 21:11 #16465
My mistake, they are used. Phonemes are units that distinguish sounds and are used in G2P, which makes use of LTS rules to determine the pronunciation of words. Is this correct?
-
November 7, 2022 at 22:09 #16466
Correct! (Also in the pronunciation dictionary, of course.)
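A minimal sketch of that arrangement, assuming the common design of looking a word up in the pronunciation dictionary first and falling back to letter-to-sound (LTS) rules only when it is missing. The lexicon entries, the phone symbols and the one-letter-one-phone rules are all hypothetical; real LTS rules are context-dependent and far richer than this.

```python
# Toy pronunciation dictionary (hypothetical entries and symbols).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "the": ["dh", "ax"],
}

# Toy context-free letter-to-sound rules (a deliberate oversimplification).
LTS_RULES = {"a": ["ae"], "b": ["b"], "c": ["k"], "t": ["t"]}

def letter_to_sound(word):
    """Predict a pronunciation letter by letter; unknown letters are skipped."""
    phones = []
    for letter in word:
        phones.extend(LTS_RULES.get(letter, []))
    return phones

def pronounce(word):
    """Return (phone sequence, which component produced it)."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word], "lexicon"
    return letter_to_sound(word), "lts"

print(pronounce("cat"))  # found in the dictionary
print(pronounce("bat"))  # not in the dictionary, so LTS is used
```

Either way, the output is a string of symbols from the same set, which is where phoneme-like units enter the system.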
Actually, the symbol set is not exactly phonemes – it includes allophones, for example. What is the difference?
-
November 8, 2022 at 11:11 #16469
Allophones are the realisations of a phoneme in actual speech sounds. A phoneme can be realised as different allophones. Is this correct?
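A toy illustration of one phoneme with two allophones, using the familiar English example of /p/ in "pin" (aspirated) versus "spin" (unaspirated). The rule below, that /p/ is aspirated only when word-initial, is a deliberate simplification, and the function name and labels are hypothetical.

```python
def realise(phonemes):
    """Map a phoneme sequence to toy allophone labels (only /p/ is context-sensitive)."""
    allophones = []
    for i, ph in enumerate(phonemes):
        if ph == "p" and i == 0:
            allophones.append("p_aspirated")    # as in "pin"
        elif ph == "p":
            allophones.append("p_unaspirated")  # as in "spin"
        else:
            allophones.append(ph)
    return allophones

print(realise(["p", "ih", "n"]))
# → ['p_aspirated', 'ih', 'n']
print(realise(["s", "p", "ih", "n"]))
# → ['s', 'p_unaspirated', 'ih', 'n']
```

Both words contain the same phoneme /p/, but the context selects a different realisation, which is the distinction the question is probing.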
-
November 10, 2022 at 13:17 #16486
For human speech production, do we talk about it from a phonetic and biological standpoint, and for TTS, do we describe the source-filter model?
-
November 10, 2022 at 21:39 #16489
Only you can answer that: you need to give enough background for your reader to understand the points you will make later in the report. For example, if your explanation for an audible concatenation refers to source and filter properties, you should have provided enough background about that for your explanation to be understood by your reader.
-