Introduces basic concepts in human hearing – it may be useful to read the bits on decibels/loudness and the Mel and Bark scales.
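As a quick way to make these scales concrete, here is a minimal Python sketch (assuming NumPy) of the conversions involved. The mel formula 2595 · log10(1 + f/700) is one widely used form, and the Bark function uses Traunmüller's approximation. Other variants exist, and the reading may use a different one. All function names are hypothetical helpers, not part of any library.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hz -> mel, using the common form 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Mel -> Hz, the exact inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    """Hz -> Bark, using Traunmüller's approximation (one of several published formulas)."""
    f = np.asarray(f_hz, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def power_to_db(power, ref=1.0):
    """Power ratio in decibels: 10 * log10(P / P_ref)."""
    return 10.0 * np.log10(np.asarray(power, dtype=float) / ref)

def amplitude_to_db(amplitude, ref=1.0):
    """Amplitude ratio in decibels: 20 * log10(A / A_ref), because power goes as amplitude squared."""
    return 20.0 * np.log10(np.asarray(amplitude, dtype=float) / ref)

if __name__ == "__main__":
    for f in [100, 500, 1000, 4000, 8000]:
        print(f"{f:5d} Hz = {float(hz_to_mel(f)):7.1f} mel = {float(hz_to_bark(f)):5.2f} Bark")
```

Around 1 kHz the mel value is close to the frequency in Hz. Above that, equal steps in mels correspond to increasingly large steps in Hz, which mirrors the ear's coarser frequency resolution at high frequencies.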
Taylor – Section 12.3 – The cepstrum
By taking the logarithm of the magnitude spectrum, the cepstrum converts the multiplication of the source and filter spectra into a sum, which makes it possible to separate the source and filter components of speech.
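The additive property can be checked numerically. The minimal sketch below (Python with NumPy) computes the real cepstrum as the inverse DFT of the log magnitude spectrum. The function names, the random "source" and the three-tap "filter" are illustrative assumptions, not taken from Taylor. The signal is built by *circular* convolution, so its DFT is exactly the product of the two DFTs and the identity holds exactly. For real, windowed speech it holds only approximately. `lifter` then splits off the low-quefrency coefficients, which carry the smooth spectral envelope of the filter.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of one frame."""
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)  # tiny floor avoids log(0)
    return np.fft.irfft(log_mag, n=n)

def lifter(ceps, n_low):
    """Split a real cepstrum into a low-quefrency part and the remainder.

    n_low is a user-chosen (hypothetical) cut-off. The real cepstrum is symmetric,
    so the mirrored coefficients at the end of the array go with the low part.
    """
    low = np.zeros_like(ceps)
    low[:n_low] = ceps[:n_low]
    if n_low > 1:
        low[-(n_low - 1):] = ceps[-(n_low - 1):]
    return low, ceps - low

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 512
    source = rng.standard_normal(n)      # stand-in for an excitation signal
    filt = np.zeros(n)
    filt[:3] = [1.0, -0.9, 0.2]           # hypothetical short "vocal tract" impulse response
    # Circular convolution: multiply the two spectra, then invert
    speech = np.fft.irfft(np.fft.rfft(source) * np.fft.rfft(filt), n=n)
    lhs = real_cepstrum(speech)
    rhs = real_cepstrum(source) + real_cepstrum(filt)
    print("max |difference|:", np.max(np.abs(lhs - rhs)))  # close to zero: product became sum
```

Because this filter's cepstrum decays quickly with quefrency, almost all of it lands in the low part returned by `lifter`, while a source with fine spectral structure (such as a pulse train) contributes mostly at higher quefrencies.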
Holmes & Holmes – Chapter 10 – Front-end analysis for ASR
Covers filterbank and MFCC features. The material on linear prediction is out of scope.
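Filterbank features can be sketched as triangular filters spaced evenly on the mel scale, applied to the power spectrum of a windowed frame, followed by a log. This is a minimal Python/NumPy sketch under stated assumptions: the helper names, the 26 filters, the 512-point FFT and the 16 kHz sample rate are illustrative choices, not the exact configuration in Holmes & Holmes or HTK.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate, f_low=0.0, f_high=None):
    """Triangular filters with edges/centres equally spaced in mels.

    Returns shape (n_filters, n_fft // 2 + 1); assumes n_fft is even.
    """
    f_high = sample_rate / 2.0 if f_high is None else f_high
    # n_filters + 2 points: each filter uses (left, centre, right) = consecutive points
    hz_points = mel_to_hz(np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2))
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)  # frequency of each rfft bin
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at 1
    return fbank

def log_filterbank_energies(frame, fbank, n_fft):
    """Log energy in each mel filter for one (already windowed) frame."""
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    return np.log(fbank @ power + 1e-10)  # small floor avoids log(0)

if __name__ == "__main__":
    sr, n_fft = 16000, 512
    t = np.arange(400) / sr                        # one 25 ms frame at 16 kHz
    frame = np.sin(2 * np.pi * 1000 * t) * np.hamming(400)
    fb = mel_filterbank(26, n_fft, sr)
    print(log_filterbank_energies(frame, fb, n_fft).round(1))
```

The filters get wider in Hz as frequency rises, so the feature vector has fine resolution at low frequencies and coarse resolution at high ones.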
Jurafsky & Martin – Section 9.3 – Feature Extraction: MFCCs
Mel-frequency cepstral coefficients (MFCCs) are widely used as features with HMM acoustic models. They are a classic example of feature engineering: manipulating the extracted features to suit the properties and limitations of the statistical model. Please note: the description of the MFCC extraction steps in this section differs somewhat from the standard definition of MFCCs and from what is actually implemented in HTK. For the assignment, you should follow the description of the MFCC extraction steps given in the videos here on Speech Zone and in the lectures.
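The final MFCC step is commonly described as taking a discrete cosine transform (DCT-II) of the log mel filterbank energies and keeping only the first few coefficients. The usual motivation is that this roughly decorrelates the features, which suits HMMs whose Gaussians have diagonal covariance matrices. The minimal Python/NumPy sketch below shows one common textbook variant only. The function names, the unnormalised DCT scaling and the default of 12 coefficients are assumptions, and the steps expected for the assignment remain those in the Speech Zone videos and lectures.

```python
import numpy as np

def dct_ii(x):
    """Unnormalised DCT-II: c[i] = sum_j x[j] * cos(pi * i * (j + 0.5) / N)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return np.cos(np.pi * i * (j + 0.5) / n) @ x

def mfcc_from_log_fbank(log_fbank, n_ceps=12, keep_c0=False):
    """Truncated cepstrum of one frame's log mel filterbank energies (hypothetical helper).

    c0 (overall log level) is often dropped and replaced by a separate energy term.
    """
    c = dct_ii(log_fbank)
    start = 0 if keep_c0 else 1
    return c[start:start + n_ceps]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_fbank = rng.normal(size=26)       # placeholder for real log filterbank energies
    print(mfcc_from_log_fbank(log_fbank).round(2))
```

Keeping only the low-order coefficients discards fine detail across the filterbank channels and retains the smooth shape of the log spectral envelope, which is the same idea as low-quefrency liftering of the cepstrum.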

