Clarification on ‘What in the speech signal differentiates different phones?’

This topic has 15 replies, 2 voices, and was last updated 3 years, 3 months ago by Simon King.

- November 6, 2022 at 18:14 #16396
  Anon Student
  Student
  Hi there,
  
  I’m struggling to understand what is meant by ‘What in the speech signal differentiates different phones?’. Could I possibly get a hint, or be pointed towards material that would help me answer this question?
  
  Thanks
- November 6, 2022 at 18:43 #16398
  Simon King
  Professor
  The answer is in the phonetics material of Speech Processing – go back over Module 1 to recap how speech is produced, then Module 2, which covers the acoustic properties of vowels and consonants. You might also find that Module 4 helps you answer this question, especially the last video, “Phoneme”.
- November 6, 2022 at 19:22 #16399
  Anon Student
  Student
  Thank you. I also wanted to double-check: do words in our diagrams count towards the word count?
- November 6, 2022 at 20:55 #16403
  Simon King
  Professor
  The word count is defined in the writing-up instructions.
- November 6, 2022 at 21:16 #16406
  Anon Student
  Student
  For the background section, should we explain the Festival background separately under each of the headings in the marking scheme, or cover the content of those headings in one overall explanation?
- November 7, 2022 at 08:30 #16418
  Simon King
  Professor
  As mentioned in many other posts, do not focus so heavily on Festival – it’s just a piece of software! The assignment is about the general principles of TTS.
  
  Therefore, in the background section, you will want to explain those general principles: what does your reader need to know in order to understand your explanations of the mistakes later in the report? That might include not only how each step is done, but also whether that step is easy or hard, solved with current techniques or still an open problem, and so on.
  
  The formatting instructions specify which headings are compulsory and whether you can add subsections below them (yes, you can).
- November 7, 2022 at 15:47 #16447
  Anon Student
  Student
  I have a follow-up question about answering ‘What in the speech signal differentiates different phones?’. Having looked at the modules and noted how different phones are differentiated, what angle should we take in using this question to explain how concatenative TTS differs from human speech production, and why this might lead to errors? I’m struggling to see the connection between the two.
  - November 7, 2022 at 18:12 #16453
    Anon Student
    Student
    Is the question basically asking how different phones are differentiated in human speech and in speech synthesis?
- November 7, 2022 at 18:27 #16457
  Simon King
  Professor
  There are lots of connections. Some hints:
  
  Do we use phonemes in TTS?
  
  Speech sounds are affected by the surrounding speech sounds through the process of co-articulation (which occurs both within and between words).
  
  The source and filter each have different consequences for the acoustic properties of a speech sound: how is that knowledge used in TTS?
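To make the last hint concrete, here is a minimal sketch of the source-filter idea in Python. It is an illustration only, not how Festival or any particular TTS system generates speech: an impulse train at a chosen F0 stands in for the voiced source, and a single two-pole resonator stands in for one formant of the vocal tract filter. The function names and parameter values are hypothetical.

```python
import math

def impulse_train(f0, fs, n_samples):
    """Voiced source sketch: one unit impulse per pitch period."""
    period = round(fs / f0)  # samples per pitch period
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, formant_hz, bandwidth_hz, fs):
    """Filter sketch: a single two-pole resonator modelling one formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs)     # pole radius sets the bandwidth
    theta = 2 * math.pi * formant_hz / fs          # pole angle sets the centre frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y

fs = 8000
source = impulse_train(f0=100, fs=fs, n_samples=800)  # 0.1 s of source at 100 Hz
speech_like = resonator(source, formant_hz=500, bandwidth_hz=80, fs=fs)
```

Because `f0` only enters `impulse_train` and `formant_hz` only enters `resonator`, pitch and vowel quality can be changed independently of each other, which is the separation the source-filter model captures.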
- November 7, 2022 at 19:52 #16461
  Anon Student
  Student
  No, we use diphones instead, as they capture co-articulation. The place and manner of articulation, as well as whether the source is voiced or not, affect the acoustic properties of a speech sound.
  
  Is this knowledge used in TTS through the concatenation of diphones that capture co-articulation?
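The diphone idea in this reply can be sketched in a few lines of Python. This is a hypothetical illustration: the phone symbols and the silence symbol `pau` are assumptions, not taken from any particular voice's inventory.

```python
def to_diphones(phones, silence="pau"):
    """Turn a phone sequence into the diphone units a diphone synthesiser would fetch.

    Each diphone runs from the middle of one phone to the middle of the next,
    so the transition between them (where co-articulation happens) is stored
    inside a single recorded unit rather than created at a join.
    """
    padded = [silence] + list(phones) + [silence]  # utterances start and end in silence
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

# Hypothetical phone symbols for the word "cat"
print(to_diphones(["k", "a", "t"]))  # ['pau-k', 'k-a', 'a-t', 't-pau']
```

The joins therefore fall in the relatively stable middle of each phone, which is why concatenating diphones tends to sound smoother than concatenating whole phones.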
- November 7, 2022 at 20:58 #16464
  Simon King
  Professor
  You correctly state that diphones are used because they capture co-articulation.
  
  But are you sure phonemes are not used anywhere in the system?
  - November 7, 2022 at 21:11 #16465
    Anon Student
    Student
    My mistake, they are used. Phonemes are the units of sound that distinguish one word from another, and they are used in G2P (grapheme-to-phoneme conversion), which uses letter-to-sound (LTS) rules to determine the pronunciation of words. Is this correct?
    - November 7, 2022 at 22:09 #16466
      Simon King
      Professor
      
      Correct! (Also in the pronunciation dictionary, of course.)
      
      Actually, the symbol set is not exactly phonemes – it includes allophones, for example. What is the difference?
    - November 8, 2022 at 11:11 #16469
      Anon Student
      Student
      
      Allophones are the different ways a phoneme is realised in actual speech sounds: one phoneme can have several allophones. Is this correct?
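The front-end steps discussed in this exchange (pronunciation dictionary lookup, letter-to-sound rules for words the dictionary does not contain, and the phoneme/allophone distinction) can be sketched as a toy Python pipeline. Everything here is hypothetical: the lexicon, the symbols, and the one-letter-one-phoneme `NAIVE_LTS` table are stand-ins chosen for illustration, and the allophone rule is a deliberately simplified version of English /t/ aspiration.

```python
# Hypothetical toy lexicon: word -> phoneme sequence (symbols are illustrative only)
LEXICON = {
    "top": ["t", "o", "p"],
    "stop": ["s", "t", "o", "p"],
}

# Hypothetical stand-in for letter-to-sound rules: one phoneme per letter.
# Real LTS rules use the surrounding letters as context.
NAIVE_LTS = {"a": "a", "c": "k", "o": "o", "p": "p", "s": "s", "t": "t"}

def pronounce(word):
    """Look the word up in the dictionary first; fall back to LTS rules if it is missing."""
    if word in LEXICON:
        return list(LEXICON[word])
    return [NAIVE_LTS[letter] for letter in word]

def to_allophones(phonemes):
    """Apply one simplified allophonic rule for English.

    Word-initial /t/ (as in "top") becomes aspirated [tʰ]; /t/ after /s/
    (as in "stop") stays unaspirated. The real rule concerns the onsets of
    stressed syllables and applies to /p t k/ generally.
    """
    return ["tʰ" if p == "t" and i == 0 else p for i, p in enumerate(phonemes)]

print(to_allophones(pronounce("top")))   # ['tʰ', 'o', 'p']
print(to_allophones(pronounce("stop")))  # ['s', 't', 'o', 'p']
print(pronounce("cat"))                  # ['k', 'a', 't'] (not in LEXICON, so from NAIVE_LTS)
```

The same phoneme /t/ comes out as two different allophones depending on context, which is why a synthesiser's symbol set may need to distinguish them.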
- November 10, 2022 at 13:17 #16486
  Anon Student
  Student
  For human speech production, should we discuss it from a phonetic and biological standpoint, and for TTS, should we describe the source-filter model?
  - November 10, 2022 at 21:39 #16489
    Simon King
    Professor
    Only you can answer that: you need to give enough background for your reader to understand the points you will make later in the report. For example, if your explanation for an audible concatenation refers to source and filter properties, you should have provided enough background about that for your explanation to be understood by your reader.

Speech Synthesis