Will MFCC Filter fail to capture the formant?

This topic has 1 reply, 2 voices, and was last updated 8 years, 6 months ago by Simon.

- November 10, 2016 at 01:32 #6013
  Xiao Z
  Student
  As I currently understand it, the spectrum is treated like a “waveform”: it is windowed and then passed through an FFT.
  But in an extreme case where the amplitude in some frequency region is very high (after all, unlike a waveform, the spectrum does not fluctuate around a fixed horizontal axis), will this filter fail to capture it, since the window cannot know in advance how high the amplitude will be? (Even in the log domain.)
- November 10, 2016 at 12:24 #6015
  Simon
  Professor
  Indeed, textbooks often suggest that you imagine the frequency axis to be time, then treat the FFT spectrum as a waveform. That’s fine, but we are smart people and know that the Fourier transform doesn’t only apply to time-domain signals: the horizontal axis can be labelled with anything we like.
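As a minimal sketch of the "spectrum as a signal" idea for one frame (assuming NumPy; the Hamming window, frame length, test tones and `eps` floor are illustrative choices, not taken from the post): take the FFT, take the log magnitude, then transform that log spectrum again with an inverse FFT, just as if it were a waveform. The result is the real cepstrum. MFCCs differ in that they first apply a mel filterbank and use a DCT for the second transform.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))   # taper the frame edges
    spectrum = np.fft.rfft(windowed)            # time domain -> frequency domain
    log_mag = np.log(np.abs(spectrum) + eps)    # log magnitude; eps avoids log(0)
    # Treat log_mag as a "signal" along the frequency axis and transform it again.
    # irfft assumes a Hermitian spectrum; a real log magnitude is real and even, so this fits.
    return np.fft.irfft(log_mag, n=len(frame))

# Example: a 25 ms frame of a synthetic two-tone signal sampled at 16 kHz.
fs = 16000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
c = real_cepstrum(frame)
print(c.shape)  # (400,)
```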
  
  You are worried that the cepstrum will fail to accurately capture high peaks in the spectrum. That’s a legitimate concern. First, we can state that the cepstrum derived from the log magnitude spectrum will faithfully capture every detail, if we use enough cepstral coefficients.
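One way to check the claim that the full cepstrum loses nothing: keep every coefficient, transform back, and compare with the original log magnitude spectrum. This sketch again assumes NumPy and uses random noise as a stand-in for a speech frame.

```python
import numpy as np

def log_spectrum_and_cepstrum(frame: np.ndarray, eps: float = 1e-10):
    """Return the log magnitude spectrum of a windowed frame and its real cepstrum."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + eps)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    return log_mag, cepstrum

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)  # stand-in for a speech frame
log_mag, cepstrum = log_spectrum_and_cepstrum(frame)

# Using *all* cepstral coefficients, the forward FFT gives the log spectrum back.
recovered = np.fft.rfft(cepstrum).real
max_error = np.max(np.abs(recovered - log_mag))
print(f"max reconstruction error: {max_error:.2e}")  # at floating-point rounding level
```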
  
  Your concern becomes relevant when we use (say) only the first 12 coefficients. When we do this (i.e., truncate the cepstrum), we are making an assumption about the shape of the spectral envelope. The fewer coefficients we use, the “smoother” we assume the envelope is.
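The smoothing effect of truncation can be sketched as "liftering": zero every cepstral coefficient beyond the first few, then transform back to a log spectrum. The sketch below assumes NumPy; the synthetic Gaussian peak, the FFT size and the `n_keep` values are made up for illustration. It builds a log spectrum with one narrow, tall peak and reports how high that peak is after reconstruction. With few coefficients the peak is smeared out and its height is underestimated; keeping all of them restores it. Here `n_keep=13` corresponds to c0 plus the first 12 coefficients.

```python
import numpy as np

def smoothed_log_spectrum(log_mag: np.ndarray, n_fft: int, n_keep: int) -> np.ndarray:
    """Rebuild a log spectrum from only the lowest n_keep cepstral coefficients."""
    cepstrum = np.fft.irfft(log_mag, n=n_fft)  # full real cepstrum (even-symmetric)
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0                      # keep quefrencies 0 .. n_keep-1
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0           # and their mirror images at the end
    return np.fft.rfft(cepstrum * lifter).real

n_fft = 512
bins = np.arange(n_fft // 2 + 1)
peak_bin = 60
# Hypothetical log spectrum: close to zero except one tall, narrow peak (a sharp "formant").
log_mag = 10.0 * np.exp(-0.5 * ((bins - peak_bin) / 3.0) ** 2)

for n_keep in (4, 13, 40, n_fft // 2 + 1):
    envelope = smoothed_log_spectrum(log_mag, n_fft, n_keep)
    print(f"n_keep={n_keep:3d}: reconstructed peak height {envelope[peak_bin]:.2f} (true 10.00)")
```

A rectangular lifter is used here for simplicity; it makes the link between "fewer coefficients" and "smoother envelope" easy to see, but it is not the only way to truncate or weight a cepstrum.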
  
  The solution is empirical: try different numbers of cepstral coefficients and choose the number that works best (e.g., gives lowest WER in our speech recogniser).
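The empirical search amounts to a simple sweep. In this sketch, `evaluate_wer` is a hypothetical callback (not part of the course scripts) that would re-extract features with the given number of coefficients, train and test the recogniser, and return its WER; the helper only keeps the setting with the lowest error.

```python
from typing import Callable, Dict, Iterable, Tuple

def choose_num_ceps(candidates: Iterable[int],
                    evaluate_wer: Callable[[int], float]) -> Tuple[int, Dict[int, float]]:
    """Try each candidate number of cepstral coefficients; return the one with the lowest WER.

    evaluate_wer is a hypothetical callback supplied by the experimenter: it should build
    features with that many coefficients, train the recogniser and score it on test data.
    """
    results = {n: evaluate_wer(n) for n in candidates}
    best = min(results, key=results.get)  # ties go to the first candidate listed
    return best, results
```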
  
  For ASR, 12 coefficients is just right.
  
  You could experiment with this number in the digit recogniser exercise. Just be careful not to store anything in the shared directory (everything there must use the original parameterisation) and to do everything in your own workspace. This will involve modifying the make_mfccs script as well as the CONFIG_for_coding file. If you do this experiment, talk to the tutor first. Do it for a speaker-independent system with nice large training and testing sets.