- This topic has 1 reply, 2 voices, and was last updated 9 years, 1 month ago.
Cepstrum
I understand that the reasoning for taking the log of the magnitude of the spectral values is so that the product of filter and source can be reinterpreted as an addition. However, how is the inverse FT unpacking even more information?
And I may be asking too much now, but how is this whole operation equivalent to the discrete cosine transform?
The Inverse Fourier Transform (or alternatively a Discrete Cosine Transform – either can be used) doesn’t create any more information, but you are right to say that it does “unpack” the information and spread it out along the quefrency axis. That’s very much like the process by which the Fourier transform “unpacks” the frequencies in a waveform and lays them out along the frequency axis to make the spectrum. We can then perform some operations more conveniently in the frequency domain (e.g., finding the formants, or performing filtering).
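On the DCT question, here is a minimal sketch of why the two transforms can stand in for each other. It assumes a real-valued frame of even length, so the log magnitude spectrum is real and symmetric; the sine terms of the inverse DFT then cancel and only a sum of cosines is left. The function names and the random test frame are made up for illustration. Note that this shows the cosine-only form of the inverse DFT, not the exact DCT variant used in any particular MFCC front end (those usually apply a DCT-II to log filterbank energies).

```python
import numpy as np


def log_magnitude_spectrum(frame):
    """Log of the DFT magnitude; a tiny floor avoids log(0)."""
    return np.log(np.abs(np.fft.fft(frame)) + 1e-12)


def cepstrum_via_ifft(log_mag):
    """Cepstrum as the inverse DFT of the log magnitude spectrum."""
    return np.fft.ifft(log_mag).real


def cepstrum_via_cosines(log_mag):
    """Same cepstrum, written as a sum of cosines only.

    Assumes len(log_mag) is even and log_mag[k] == log_mag[N - k],
    which holds for the spectrum of a real-valued frame.
    """
    n_fft = len(log_mag)
    half = n_fft // 2
    n = np.arange(n_fft)      # quefrency index
    k = np.arange(1, half)    # frequency bins strictly between 0 and N/2
    cos_basis = np.cos(2.0 * np.pi * np.outer(n, k) / n_fft)
    total = (
        log_mag[0]
        + log_mag[half] * (-1.0) ** n
        + 2.0 * cos_basis @ log_mag[1:half]
    )
    return total / n_fft


# Hypothetical test frame: white noise standing in for a speech frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)

log_mag = log_magnitude_spectrum(frame)
c_ifft = cepstrum_via_ifft(log_mag)
c_cos = cepstrum_via_cosines(log_mag)
```

The two results agree to within floating-point error, which is the sense in which a cosine transform does the same "unpacking" as the inverse Fourier transform here.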
After the “unpacking” in cepstral analysis, it’s possible to ignore information that is not wanted: when extracting MFCCs this means ignoring all the higher quefrency components and only retaining the first few coefficients (typically, 12). That is done simply by truncating the cepstrum (effectively setting everything above the 12th coefficient to zero).
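The truncation step can be sketched as follows, again with made-up names and a random frame in place of real speech. Two assumptions are worth flagging: the sketch keeps c[0] plus the next 12 coefficients (whether c[0] counts among the 12 varies between implementations), and because it works on a full-length inverse-DFT cepstrum it also keeps the mirrored copies at the end of the array, so that transforming back gives a real-valued smoothed log spectrum. In an actual MFCC pipeline you would simply keep the first few DCT outputs.

```python
import numpy as np

NUM_KEPT = 12  # coefficients kept after c[0]; an assumption, see above


def truncate_cepstrum(cepstrum, num_kept=NUM_KEPT):
    """Zero every quefrency above num_kept, keeping the symmetric mirror."""
    n = len(cepstrum)
    truncated = np.zeros_like(cepstrum)
    truncated[: num_kept + 1] = cepstrum[: num_kept + 1]  # c[0] .. c[num_kept]
    truncated[n - num_kept:] = cepstrum[n - num_kept:]    # mirrored low quefrencies
    return truncated


def roughness(values):
    """Energy of the circular first difference: larger means more fine detail."""
    return float(np.sum((values - np.roll(values, 1)) ** 2))


# Hypothetical test frame: windowed white noise standing in for speech.
rng = np.random.default_rng(0)
frame = rng.standard_normal(512) * np.hanning(512)

log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
cepstrum = np.fft.ifft(log_mag).real

truncated = truncate_cepstrum(cepstrum)
# Going back to the frequency domain gives a smoothed log spectrum.
smoothed_log_mag = np.fft.fft(truncated).real
```

Comparing `roughness(smoothed_log_mag)` with `roughness(log_mag)` shows the effect: the fine detail carried by the high-quefrency components is gone, and only the slowly varying envelope remains.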
Is it any clearer now? If not – keep asking!