Taylor – Chapter 12
October 30, 2016 at 09:41 #5693
Analysis of speech signals
November 7, 2018 at 12:09 #9567
I did the Jurafsky and Martin reading (section 9.3) before this one, and it reproduces a diagram from this chapter (page 334, Figure 9.14).
However, in this reading the magnitude spectrum and log magnitude spectrum appear to be swapped (page 354, Figure 12.11; compare 12.11a and 12.11b with 9.14a and 9.14b from the previous reading).
So which one is correct and which one is wrong? Which one is the plain magnitude spectrum and which is its logged version?
Thank you!
November 7, 2018 at 18:33 #9572
The diagram in Taylor is correct.
You can work this out yourself from first principles: taking the log will compress the vertical range of the spectrum, bringing the very low amplitude components up so we can see them, and bringing the high amplitudes (the harmonics, in this case) down.
J&M messed up when they quoted it – a lesson in not quoting something unless you really understand it, perhaps!? Or maybe a printer’s error.
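If it helps to see the effect numerically, here is a minimal sketch (Python with NumPy; the synthetic frame, sampling rate and F0 are made up for illustration, not taken from either book) showing how taking the log compresses the range of a magnitude spectrum:

import numpy as np

fs = 16000                         # assumed sampling rate
t = np.arange(0, 0.032, 1.0 / fs)  # one 32 ms analysis frame
f0 = 120.0                         # assumed fundamental frequency
# crude "voiced" frame: a few harmonics of decreasing amplitude, plus a little noise
x = sum((0.5 ** k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))
x = x + 0.001 * np.random.randn(len(t))

X = np.fft.rfft(x * np.hamming(len(x)))
magnitude = np.abs(X)                             # linear magnitude spectrum
log_magnitude = 20 * np.log10(magnitude + 1e-12)  # log magnitude spectrum, in dB

# On the linear scale the strongest harmonics dominate and the weak components
# are invisible; on the log scale the range is compressed and they become visible.
print(magnitude.max() / (magnitude.mean() + 1e-12))    # large ratio on the linear scale
print(log_magnitude.max() - log_magnitude.mean())      # much smaller spread in dB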
November 9, 2019 at 14:47 #10175
On p.355, Taylor mentions r[n], which refers to radiation. In Chapter 11 he describes this as “radiation impedance from the lips”. Why is this separated out, rather than considered part of the vocal-tract filter?
November 9, 2019 at 17:29 #10184
Taylor is separating it out here because he is trying to show how the equations align with the physics of sound propagation in the vocal tract.
Lip radiation can be assumed to be a constant effect: effectively, a filter that boosts high frequencies. This filtering effect is independent of the configuration of the articulators (Taylor, 2009, equation 11.29).
Furthermore, the constant high-pass filtering effect of lip radiation is more than cancelled out by another constant effect of low-pass filtering at the sound source:
It is this, combined with the radiation effect, that gives all speech spectra their characteristic spectral slope.
(Taylor, 2009, page 332)
So, we don’t need any learnable model parameters for these effects. We can account for them either by absorbing this constant effect into the vocal tract filter (which might be modelled using linear prediction) or by pre-emphasising the signal in the time domain (Taylor, 2009, page 375) to make its spectrum flatter, before any subsequent modelling, processing or feature extraction.
Pre-emphasis is standard practice in most speech processing – can you find where this is done in the digit recogniser?
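For anyone who wants to see what pre-emphasis actually does, here is a minimal sketch (Python with NumPy) of the usual first-order high-pass filter y[n] = x[n] - alpha * x[n-1]; the coefficient 0.97 is just a commonly used value, not necessarily what the digit recogniser uses:

import numpy as np

def pre_emphasise(x, alpha=0.97):
    # first-order high-pass filter: y[n] = x[n] - alpha * x[n-1]
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# usage: apply to the waveform before windowing and feature extraction
fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
# a strong low-frequency component plus a weak high-frequency one
x = np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 4000 * t)
y = pre_emphasise(x)

# after pre-emphasis the spectral slope is flatter: the 100 Hz peak is
# attenuated relative to the 4000 Hz peak
X = np.abs(np.fft.rfft(x))
Y = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
print(X[np.argmin(np.abs(freqs - 100))] / X[np.argmin(np.abs(freqs - 4000))])
print(Y[np.argmin(np.abs(freqs - 100))] / Y[np.argmin(np.abs(freqs - 4000))])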