Inverting MFCCs back to waveforms

This topic has 1 reply, 2 voices, and was last updated 8 years, 3 months ago by Simon.

Viewing 1 reply thread

Author

Posts
- November 15, 2016 at 22:44 #6031
  Jonas R
  Student
  When we get the MFCCs by applying a Fourier transform, we should be able to reconstruct a waveform by applying all of the inverse functions used to get the MFCCs, correct? If so, how much of the signal is lost in the conversion? I would assume that most of the loss would come from the windowing function, but would there be other factors to consider?
  
  Ultimately, would the resulting (speech) audio be too distorted to understand? HTK doesn’t seem to have a means of converting MFCCs back to waveforms, so I can’t try it for myself. Is there any other existing way to reconstruct waveforms from MFCCs?
- November 16, 2016 at 09:09 #6032
  Simon
  Professor
  In principle, the cepstrum is invertible, because each operation used to compute it is invertible (e.g., the Fourier transform). However, MFCCs are not exactly the same as the cepstrum, and each of the following differences loses information:
  
  - Normally, we discard phase and only retain the magnitude spectrum, from which we compute the (magnitude) cepstrum.
  - If we use a filterbank to warp the spectrum to the Mel scale (which is the normal method in ASR), this is not invertible: information is lost when we sum the energy within each filter.
  - For ASR, we truncate the cepstrum, retaining only the first (say) 12 coefficients. This is a loss of information.
  
  Questions to think about:
  - why do we discard phase?
  - why do we use a filterbank?
  - why do we truncate the cepstrum?
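
As a rough illustration of the three lossy steps described in the reply, here is a minimal Python sketch using only the standard library. It is not HTK's front end. The 32-sample synthetic frame, the rectangular bands in `band_energies`, the band edges, and the choice to keep 4 of 8 coefficients are all assumptions made for this toy example; real Mel filterbanks use overlapping triangular filters, and the reply mentions keeping about 12 coefficients.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT of a real frame (direct O(N^2) evaluation)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def band_energies(power, edges):
    """Sum power inside each band. Rectangular bands stand in for Mel filters."""
    return [sum(power[a:b]) for a, b in zip(edges[:-1], edges[1:])]


def _scale(k, n):
    """Normalisation that makes the DCT pair below orthonormal."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct2(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for k in range(n))
        for i in range(n)
    ]


# Synthetic frame: two sinusoids (an arbitrary stand-in for a speech frame).
frame = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t + 0.7) for t in range(32)]

# 1. Discarding phase: a frame and its time reversal are different waveforms
#    but share the same magnitude spectrum.
mag = magnitude_spectrum(frame)
mag_reversed = magnitude_spectrum(frame[::-1])
phase_gap = max(abs(a - b) for a, b in zip(mag, mag_reversed))

# 2. Filterbank: 17 spectral bins are summed into 8 bands of growing width.
power = [m * m for m in mag]
edges = [0, 1, 2, 3, 5, 7, 10, 13, 17]
energies = band_energies(power, edges)
# Swapping two bins inside the last band changes the spectrum,
# yet the band energies stay the same.
shuffled = power[:]
shuffled[13], shuffled[16] = shuffled[16], shuffled[13]
energies_shuffled = band_energies(shuffled, edges)

# 3. Truncation: the DCT itself is invertible, but zeroing the
#    higher coefficients is not.
log_energies = [math.log(e) for e in energies]
cepstrum = dct2(log_energies)
full_err = max(abs(a - b) for a, b in zip(log_energies, idct2(cepstrum)))
kept = 4
truncated = cepstrum[:kept] + [0.0] * (len(cepstrum) - kept)
trunc_err = max(abs(a - b) for a, b in zip(log_energies, idct2(truncated)))

print("magnitude gap between frame and reversed frame:", phase_gap)
print("spectra differ:", shuffled != power)
print("largest band-energy difference:",
      max(abs(a - b) for a, b in zip(energies, energies_shuffled)))
print("round-trip error, all coefficients kept:", full_err)
print("round-trip error, first", kept, "coefficients kept:", trunc_err)
```

The first and fourth printed values should be at floating-point noise level, while the last should be clearly larger. That is the point of the reply: the transforms themselves can be undone, but the discarded phase, the summed filterbank energies, and the dropped coefficients cannot be recovered from the MFCCs alone.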