MFCC
Hi there,
In the lectures, it seems that MFCCs are obtained by applying a series expansion directly to the spectrum, whereas on Wikipedia we first map the energies onto the mel scale and only then perform the series expansion. Which one should we stick to in the report?
The screenshot from Wikipedia is in the appendix.
In the video "Cepstral Analysis, Mel-Filterbanks, MFCCs", we first had a recap of filterbank features. These would be great features, except that they exhibit covariance: the outputs of neighbouring filters are correlated with one another.
We then reminded ourselves of how the source and filter combine in the time domain using convolution, or in the frequency domain using multiplication. We made them additive by taking the log and devised a way to deconvolve the source and filter. This video only explained the classical cepstrum – there was no Mel scale or filterbank.
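To make the "log turns multiplication into addition" step concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the two signals are made-up toy sequences rather than real speech, the convolution is circular so that the DFT relationship holds exactly, and the names `dft`, `circular_convolve`, `log_magnitude_spectrum` and `real_cepstrum` are hypothetical helpers, not anything from the videos.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for a tiny example."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences (time domain)."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


def log_magnitude_spectrum(x):
    """Log of the magnitude of each DFT bin (assumes no bin is exactly zero)."""
    return [math.log(abs(value)) for value in dft(x)]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum, keeping the real part."""
    log_mag = log_magnitude_spectrum(x)
    n = len(log_mag)
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# Made-up toy signals (assumptions for illustration only, not real speech).
source = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
vocal_tract_filter = [1.0, 0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]

# Source and filter combine by convolution in the time domain.
speech = circular_convolve(source, vocal_tract_filter)

# After the log, the two contributions simply add in the cepstral domain.
cep_speech = real_cepstrum(speech)
cep_sum = [
    s + f
    for s, f in zip(real_cepstrum(source), real_cepstrum(vocal_tract_filter))
]
additive = all(abs(a - b) < 1e-9 for a, b in zip(cep_speech, cep_sum))
print(additive)
```

Here `additive` checks that the cepstrum of the convolved signal equals the sum of the two individual cepstra, up to floating-point error. Note that there is no Mel scale or filterbank anywhere in this sketch; it is the classical cepstrum only.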
Finally, in the video "From MFCCs, towards a generative model using HMMs", we developed MFCCs. We used our filterbank features as the starting point, then applied the same crucial steps that the cepstrum uses to obtain the filter without the source: take the log (to make source and filter additive), apply a series expansion (to separate source and filter along the quefrency axis), and truncate (to discard the source).
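The three steps above can be sketched for a single frame as follows. This is a hedged illustration, not the course's reference implementation: it assumes the series expansion is an unnormalised DCT-II (a common choice for MFCCs), the filterbank energies are made-up values, and `mfcc_from_filterbank` and `num_coeffs` are hypothetical names.

```python
import math


def mfcc_from_filterbank(energies, num_coeffs):
    """Turn one frame of filterbank energies into truncated cepstral coefficients."""
    # Step 1: take the log, so that source and filter become additive.
    log_energies = [math.log(e) for e in energies]

    # Step 2: series expansion. Assumption: an unnormalised DCT-II.
    m = len(log_energies)
    cepstrum = [
        sum(
            log_energies[j] * math.cos(math.pi * k * (j + 0.5) / m)
            for j in range(m)
        )
        for k in range(m)
    ]

    # Step 3: truncate, keeping only the low-order coefficients.
    return cepstrum[:num_coeffs]


# Made-up filterbank energies for one frame (assumption for illustration only).
frame = [2.1, 3.4, 5.0, 4.2, 2.8, 1.9, 1.2, 0.8]
coeffs = mfcc_from_filterbank(frame, num_coeffs=4)
print(coeffs)
```

The input here is already a set of filterbank energies, so any Mel-scale warping happens before this function is called; the function itself only covers the log, the series expansion and the truncation.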