Will MFCC Filter fail to capture the formant?
- This topic has 1 reply, 2 voices, and was last updated 7 years, 9 months ago by Simon.
November 10, 2016 at 01:32 #6013
As far as I understand, the spectrum is treated like a “waveform”, which is windowed and then passed through an FFT.
But in an extreme case where the amplitude at some frequencies is very high (after all, the spectrum does not fluctuate around a fixed axis the way a waveform fluctuates around the horizontal axis), will this filter fail to capture it, since the window cannot know in advance how high the amplitude will be (even in the log domain)?
November 10, 2016 at 12:24 #6015
Indeed, textbooks often suggest that you imagine the frequency axis to be time, then treat the FFT spectrum as a waveform. That’s fine, but we are smart people and know that the Fourier transform doesn’t only apply to time-domain signals: the horizontal axis can be labelled with anything we like.
You are worried that the cepstrum will fail to accurately capture high peaks in the spectrum. That’s a legitimate concern. First, we can state that the cepstrum derived from the log magnitude spectrum will faithfully capture every detail, if we use enough cepstral coefficients.
Your concern becomes relevant when we use (say) only the first 12 coefficients. When we do this (i.e., truncate the cepstrum), we are making an assumption about the shape of the spectral envelope. The fewer coefficients we use, the “smoother” we assume the envelope is.
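The effect of truncation can be seen in a small sketch. It is only an illustration under stated assumptions: the log spectrum is synthetic (one narrow peak and one broad peak, chosen here for the example), the cepstrum is computed with an inverse FFT rather than the mel filterbank and DCT used for real MFCCs, and the function name truncated_envelope is hypothetical.

```python
import numpy as np


def truncated_envelope(log_mag, n_keep):
    """Rebuild a log magnitude spectrum from its first n_keep cepstral coefficients.

    log_mag: log magnitude at the rfft bins (length N/2 + 1).
    n_keep:  number of cepstral coefficients kept, counting c[0].
    """
    n_fft = 2 * (len(log_mag) - 1)
    # The log magnitude spectrum is real and symmetric, so its inverse
    # transform (the cepstrum) is real and symmetric too.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)
    # Keep the low-order coefficients and their mirror images, zero the rest.
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0
    # Transform back to the frequency axis to get the implied envelope.
    return np.fft.rfft(cepstrum * lifter).real


# Synthetic log spectrum (an assumption for illustration, not real speech):
# a tall narrow peak at bin 40 and a lower, broad peak at bin 120.
bins = np.arange(257)
log_mag = (4.0 * np.exp(-0.5 * ((bins - 40) / 2.0) ** 2)
           + 2.0 * np.exp(-0.5 * ((bins - 120) / 15.0) ** 2))

full = truncated_envelope(log_mag, n_keep=257)   # every coefficient kept
smooth = truncated_envelope(log_mag, n_keep=12)  # truncated cepstrum

print("max reconstruction error, all coefficients:", np.max(np.abs(full - log_mag)))
print("narrow peak height, original:", log_mag[40])
print("narrow peak height, 12 coefficients:", smooth[40])
```

With every coefficient kept, the reconstruction matches the original to within floating-point error. With 12 coefficients the tall narrow peak is strongly flattened while the broad peak largely survives, which is the smoothness assumption described above.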
The solution is empirical: try different numbers of cepstral coefficients and choose the number that works best (e.g., gives lowest WER in our speech recogniser).
For ASR, 12 coefficients is a widely used choice that works well in practice.
You could experiment with this number in the digit recogniser exercise. Just be careful not to store anything in the shared directory (everything there must use the original parameterisation) and to do everything in your own workspace. This will involve modifying the make_mfccs script as well as the CONFIG_for_coding file. If you do this experiment, talk to the tutor first. Do it for a speaker-independent system with nice large training and testing sets.