Source in the cepstrum – eliminated after Mel?
- This topic has 1 reply, 2 voices, and was last updated 6 years, 8 months ago by Simon.
November 29, 2017 at 00:13 #8600
The book discusses the Mel scale and the cepstrum separately; the examples explain one or the other, but never both together, which leads to some confusion. I understand that the log and DCT are applied to the Mel spectrum, not the Hz spectrum (as in the figure J&M provide). But J&M mention that the Mel filters are deliberately spaced in a way that will lose track of the fundamental frequency. How come we still see the “source” in the cepstrum?
-
November 29, 2017 at 08:46 #8601
Jurafsky & Martin (J&M) include a figure showing the “classical” cepstrum, and this is what is confusing you. As you say, they fail to make a clear connection between this and MFCCs.
To clear this up, we need to distinguish between the classical cepstrum and what actually happens when creating MFCCs.
Let’s start with the classical cepstrum, as in J&M’s Figure 9.14 (borrowed from Taylor, who gives a better explanation – read that if you can).
WARNING: in J&M’s Figure 9.14, the plots for (a) and (b) need to be swapped in order for the caption to be correct! My explanation below assumes you’ve corrected this figure.
The three subfigures illustrate the key stages of
(a) obtaining the spectrum from the waveform, using an FFT – in this domain, the source and filter are multiplied together
(b) taking the log of the spectrum, which makes the source and filter additive
(c) performing a series expansion (e.g., DCT) which “lays out” the different components of the log spectrum along an axis, such that the source components and filter components are in different places along that axis and can easily be separated. In J&M’s Figure 9.14(c) we can see the fundamental period as a small peak around the middle of the cepstrum.
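To make stages (a) to (c) concrete, here is a minimal NumPy sketch of the classical (real) cepstrum. The test signal is an assumption for illustration only: a hypothetical `toy_vowel` helper passes a 100 Hz impulse train (the source) through a single 500 Hz resonance (the filter), sampled at 16 kHz. Stage (c) is done with an inverse FFT of the log magnitude spectrum, one standard way to compute the real cepstrum; the details may differ from how J&M's figure was produced.

```python
import numpy as np

def real_cepstrum(frame):
    """Classical (real) cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)                 # (a) waveform -> spectrum (source x filter)
    log_mag = np.log(np.abs(spectrum) + 1e-10)    # (b) log: source + filter
    return np.fft.irfft(log_mag, n=len(frame))    # (c) lay the log spectrum out along quefrency

def toy_vowel(f0=100.0, fs=16000, n=1024, formant=500.0, r=0.97):
    """Hypothetical test signal: impulse train (source) through one resonance (filter)."""
    period = int(round(fs / f0))
    source = np.zeros(n)
    source[::period] = 1.0
    # Two-pole resonator with poles at r * exp(+/- j * 2 * pi * formant / fs)
    a1 = -2.0 * r * np.cos(2.0 * np.pi * formant / fs)
    a2 = r * r
    y = np.zeros(n)
    for t in range(n):
        y[t] = source[t]
        if t >= 1:
            y[t] -= a1 * y[t - 1]
        if t >= 2:
            y[t] -= a2 * y[t - 2]
    return y * np.hamming(n), period

fs = 16000
frame, period = toy_vowel(fs=fs)
ceps = real_cepstrum(frame)

# Skip the low-quefrency region (the filter) and search plausible pitch periods:
# 32 to 300 samples, i.e. 2 ms to 18.75 ms at 16 kHz
lo, hi = 32, 300
peak = lo + int(np.argmax(ceps[lo:hi]))
print(period, peak)  # the peak is expected at, or very near, the 160-sample pitch period
```

The peak found this way corresponds to the small mid-cepstrum peak in Figure 9.14(c): it exists because the log spectrum still contains the harmonic ripple of the source.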
There is no filterbank in the classical cepstrum, and no Mel-scaling of the frequency axis.
Mel Frequency Cepstral Coefficients are inspired by the classical cepstrum and use the same key processing steps, plus one addition: a Mel-scaled filterbank. This is applied after 9.14(a), so 9.14(b) becomes a smooth spectral envelope on a Mel scale, with the harmonics largely smoothed away, and 9.14(c) would no longer have the small peak corresponding to the fundamental period.
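For comparison, here is a sketch of MFCC extraction for a single frame, with the Mel filterbank inserted between stages (a) and (b). Several details are assumptions, not something stated in this thread: using the power (rather than magnitude) spectrum, 26 triangular filters, keeping 13 coefficients, the HTK-style Mel formula `2595 * log10(1 + f / 700)`, and an orthonormal DCT-II written out by hand to avoid extra dependencies. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel formula (an assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with edges and centres evenly spaced on the Mel scale."""
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # frequency (Hz) of each FFT bin
    fb = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x):
    """Orthonormal DCT-II, written out explicitly."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    out = np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x
    out[0] *= np.sqrt(1.0 / n)
    out[1:] *= np.sqrt(2.0 / n)
    return out

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13):
    power = np.abs(np.fft.rfft(frame)) ** 2                       # (a) spectrum
    energies = mel_filterbank(n_filters, len(frame), fs) @ power  # extra MFCC step: Mel filterbank
    log_energies = np.log(energies + 1e-10)                       # (b) log
    return dct_ii(log_energies)[:n_ceps]                          # (c) DCT, keep low-order terms

# Usage on a toy windowed frame (hypothetical input)
fs = 16000
t = np.arange(1024) / fs
frame = np.hamming(1024) * np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc_frame(frame, fs)
print(coeffs.shape)  # (13,)
```

Here the DCT sees only 26 smoothed filterbank energies instead of hundreds of FFT bins, so the fine harmonic ripple that produces the pitch peak in the classical cepstrum has largely been removed before stage (c).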
The filterbank serves two purposes. First, it’s an easy way to warp the frequency scale from linear (Hertz) to a Mel scale, simply by spacing the filters’ centre frequencies evenly on a Mel scale. Second, it’s an opportunity to smooth the spectrum and reduce the prominence of the harmonics – in other words, to produce a spectrum that contains less information about the source.
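The first purpose, warping the frequency axis, comes down to choosing the centre frequencies. A minimal sketch, again assuming the HTK-style Mel formula (other variants exist) and a hypothetical `mel_centre_frequencies` helper: centres spaced evenly in Mel spread further and further apart in Hz as frequency rises.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel formula (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_centre_frequencies(n_filters, f_max):
    """Centre frequencies (Hz) of n_filters filters spaced evenly on the Mel scale."""
    mels = np.linspace(0.0, hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels[1:-1])  # drop the outer edges, keep the centres

centres = mel_centre_frequencies(26, 8000.0)
print(np.round(np.diff(hz_to_mel(centres)), 3))  # equal gaps on the Mel scale
print(np.round(np.diff(centres)))                # gaps in Hz grow with frequency
```

Since each triangular filter spans the distance between its neighbours' centres, the upper filters are several hundred Hz wide or more and average over many harmonics of a typical voice, which is where the smoothing in the second purpose comes from.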
To summarise: J&M’s Figure 9.14(c) is the classical cepstrum and is not one of the stages on the way to MFCCs.