Finish

If you read recent papers on speech recognition, you’ll find that MFCCs are not as widely used as they were in the past. Instead, filterbank features are quite common. This is because the type of machine learning being used has changed from HMMs (with Gaussian probability density functions) to neural networks.

Nevertheless, the concept of feature engineering is still important in many areas of machine learning. Even the most powerful statistical models will not perform well when given unsuitable features.

What you should know

  • What are generative models?
  • Gaussian distributions:
    • What shape is a Gaussian distribution?
    • How do we describe a multivariate Gaussian in terms of means and variances?
    • How could this be used to tell us, for example, the probability that a specific feature vector is from a specific category?
  • Cepstral analysis, filterbanks, MFCCs
    • Non-linear speech perception: semitones, mel scale (from module 7)
    • What do we want MFCCs to capture about each frame of speech?
    • What are the steps used in generating MFCCs?
    • What step makes these “mel” frequency cepstral coefficients?
    • What step makes these features “cepstral” coefficients?
    • Why do we generally prefer MFCCs to mel filterbank features in HMM-based ASR?
    • Why do we use MFCCs rather than the full magnitude or power spectrum in ASR?
    • Why do ASR models often use 39 features per frame (e.g., the configuration in Assignment 2)? What are these features?
    • Why don’t we want our features to covary?
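
To check your understanding of the Gaussian questions above, here is a minimal sketch of how a multivariate Gaussian described only by means and variances (that is, with no covariance between features) could be used to decide which category a feature vector most likely came from. The class labels, the two-dimensional means and variances, and the function names are all made up for illustration. The sketch also assumes every category is equally likely beforehand, so comparing log densities is enough.

```python
import math


def diag_gaussian_log_pdf(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance.

    With no covariance terms, the log density is just the sum of the
    univariate Gaussian log densities of each feature dimension.
    """
    log_p = 0.0
    for x_d, mu_d, var_d in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * var_d) + (x_d - mu_d) ** 2 / var_d)
    return log_p


def classify(x, models):
    """Return the category whose Gaussian gives x the highest log density."""
    return max(models, key=lambda name: diag_gaussian_log_pdf(x, *models[name]))


# Hypothetical models: one (mean vector, variance vector) pair per category.
models = {
    "aa": ([1.0, 2.0], [0.5, 1.0]),
    "iy": ([-1.0, 0.0], [1.0, 0.5]),
}

print(classify([0.9, 1.8], models))  # prints "aa"
```

Note that the function returns a log density rather than a probability. Turning it into the probability of a category would also need the prior probability of each category, which this sketch leaves out.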

Key Terms

  • generative model
  • probability distribution
  • Gaussian distribution
  • mean
  • variance, covariance
  • standard deviation
  • multivariate
  • probability density function
  • feature vector
  • sampling
  • cepstral analysis
  • cepstrum
  • filterbank
  • mel-scale
  • mel-frequency cepstral coefficients
  • non-linear
  • log scale
  • spectral envelope
  • Discrete Cosine Transform
  • Inverse Discrete Fourier Transform
  • magnitude spectrum
  • power spectrum
  • delta features
  • acceleration (aka delta-delta) features
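
As a concrete illustration of the last two terms, the sketch below shows one common way to arrive at 39 features per frame: 13 static features (assumed here to be 12 MFCCs plus an energy term), plus their deltas, plus their delta-deltas. This is not necessarily the exact configuration used in Assignment 2. The delta computation is a deliberately simple two-point difference, whereas real toolkits typically use a regression over a wider window of frames. The function names and the toy input are hypothetical.

```python
def deltas(frames):
    """Simple central-difference deltas over time.

    The first and last frames are repeated at the edges so that the
    output has the same number of frames as the input.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_dynamic_features(static):
    """Append delta and delta-delta (acceleration) features to each frame."""
    d = deltas(static)   # rate of change of the static features
    dd = deltas(d)       # rate of change of the deltas
    return [s + v + a for s, v, a in zip(static, d, dd)]


# Toy input (assumption): 5 frames of 13 static features each.
static = [[float(t * k) for k in range(13)] for t in range(5)]
full = add_dynamic_features(static)

print(len(full), len(full[0]))  # prints "5 39"
```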