Inverting MFCCs back to waveforms
Since MFCCs are computed through a chain of operations that starts with a Fourier transform, we should be able to reconstruct a waveform by applying the inverse of each operation in reverse order, correct? If so, how much of the signal is lost in the conversion? I would assume that most of the loss comes from the windowing function, but are there other factors to consider?
Ultimately, would the resulting (speech) audio be too distorted to understand? HTK doesn’t seem to have a means of converting MFCCs back to waveforms, so I can’t try it for myself. Is there any other existing way to reconstruct waveforms from MFCCs?
In principle, the cepstrum is invertible because each operation used to compute it is invertible (e.g., the Fourier transform). However, MFCCs are not exactly the same as the cepstrum:
Normally, we discard phase and retain only the magnitude spectrum, from which we compute the (magnitude) cepstrum. Without the phase, the original waveform cannot be recovered exactly.
If we use a filterbank to warp the spectrum to the Mel scale (which is the normal method in ASR), this is not invertible: information is lost when we sum the energy within each filter.
For ASR, we truncate the cepstrum, retaining only the first (say) 12 coefficients. This is a loss of information.
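The three lossy steps above can be illustrated with a small toy sketch in plain Python. This is not HTK's pipeline: the function names, the rectangular non-overlapping bands (real Mel filters are triangular and overlap), the 26-channel log-energy vector, and the synthetic test values are all assumptions made for illustration. Only the figure of 12 retained coefficients comes from the discussion above.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of a naive O(N^2) DFT; the phase is thrown away here."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def band_energies(power, edges):
    """Sum power within each band [edges[i], edges[i+1]).

    Rectangular bands are a simplification of a Mel filterbank.
    """
    return [sum(power[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k in range(n)))
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


# 1. Discarding phase: a circular time shift changes only the phase,
#    so two different signals share one magnitude spectrum.
x = [0.0, 1.0, 0.5, -0.3, 0.2, 0.0, -0.7, 0.4]
x_shifted = x[3:] + x[:3]
phase_gap = max_abs_diff(dft_magnitude(x), dft_magnitude(x_shifted))

# 2. Filterbank: two different power spectra with identical band sums.
spec_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
spec_b = [4.0, 3.0, 2.0, 1.0, 8.0, 7.0, 6.0, 5.0]
edges = [0, 4, 8]  # two bands of four bins each (assumed layout)
bands_a = band_energies(spec_a, edges)
bands_b = band_energies(spec_b, edges)

# 3. Truncation: keep 12 of 26 coefficients, zero the rest, then invert.
log_energies = [math.sin(0.9 * i) + 0.3 * math.cos(2.7 * i) for i in range(26)]
cepstrum = dct(log_energies)
full_error = max_abs_diff(idct(cepstrum), log_energies)
truncated = cepstrum[:12] + [0.0] * (len(cepstrum) - 12)
truncation_error = max_abs_diff(idct(truncated), log_energies)

print("magnitude difference after time shift:", phase_gap)
print("band energies identical:", bands_a == bands_b)
print("round-trip error, all coefficients:", full_error)
print("round-trip error, 12 coefficients:", truncation_error)
```

In this sketch the full DCT round trip reproduces the log energies to within floating-point error, while the truncated version returns only a smoothed envelope. For real audio, librosa provides `librosa.feature.inverse.mfcc_to_audio`, which attempts an approximate inversion and estimates the missing phase with the Griffin-Lim algorithm; the result is an approximation, not a recovery of the original waveform.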
Questions to think about: