Speech Recognition: FFT Coefficients

This topic has 2 replies, 2 voices, and was last updated 8 years, 4 months ago by Simon King.

Author

Posts
- November 1, 2017 at 19:54 #8193
  blue fish
  Student
  In the speech recognition slide pack, I don’t understand slides 10 and 11.
  
  On slide 10, what do these coefficients multiplying the waves mean?
  
  Slide 11 says: “We can vary the number of coefficients by varying the duration of the waveform being analysed”. This doesn’t make sense to me: shouldn’t there be a fixed number of coefficients in the vector for any part of the waveform? So don’t you mean that we can vary the number of vectors?
- November 1, 2017 at 20:42 #8195
  Simon King
  Professor
  Slide 10: each of these coefficients is the amount of energy at one particular frequency. They are the Fourier coefficients (think of them as weights that multiply each sine wave). If we plot them against frequency, we get the spectrum of the signal.
  
  Slide 11: the number of Fourier coefficients depends on the duration of the signal being analysed. But remember that we don’t analyse the whole signal at once: we divide it into short frames and perform the analysis on each frame in turn. The frames all have the same, fixed duration (e.g., 25 ms), so every frame yields a vector with the same number of coefficients.
  
  The number of frames (i.e., the number of vectors) is approximately equal to the total duration of the speech signal (e.g., an utterance) divided by the frame shift (e.g., 10 ms).
- November 1, 2017 at 20:44 #8196
  Simon King
  Professor
  In these slides, I am temporarily imagining that the Fourier coefficients (i.e., the magnitude spectrum) would be a good representation for Automatic Speech Recognition. Whilst we could use them, we can do better by performing some feature engineering, which is covered a little later on.
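
The points made in the replies above can be checked numerically. The following is a minimal sketch, not anything taken from the slides: it assumes NumPy, a 16 kHz sampling rate, and hypothetical helper names (`num_coefficients`, `frame_signal`, `magnitude_spectra`). It uses `np.fft.rfft`, which returns N/2 + 1 coefficients for a frame of N samples, and it applies no tapering window, purely to keep the example short.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; an assumed value, not stated in the thread


def num_coefficients(frame_duration_s, sample_rate=SAMPLE_RATE):
    """Number of Fourier coefficients for one frame of the given duration."""
    n_samples = int(round(frame_duration_s * sample_rate))
    # rfft of N real samples returns N // 2 + 1 complex coefficients.
    return len(np.fft.rfft(np.zeros(n_samples)))


def frame_signal(signal, frame_len, frame_shift):
    """Cut a 1-D signal into overlapping frames of fixed length.

    Only complete frames are kept, so the count is roughly
    len(signal) / frame_shift rather than exactly that.
    """
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    return np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len]
        for i in range(n_frames)
    ])


def magnitude_spectra(signal, frame_duration_s=0.025, frame_shift_s=0.010,
                      sample_rate=SAMPLE_RATE):
    """One magnitude spectrum (one vector) per frame."""
    frame_len = int(round(frame_duration_s * sample_rate))
    frame_shift = int(round(frame_shift_s * sample_rate))
    frames = frame_signal(signal, frame_len, frame_shift)
    # The magnitude of each coefficient is the weight of that frequency.
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 1 kHz sine wave as a stand-in for speech.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
sine = np.sin(2 * np.pi * 1000 * t)
spectra = magnitude_spectra(sine)

print(num_coefficients(0.025), num_coefficients(0.050))
print(spectra.shape)  # (number of frames, coefficients per frame)
```

Under these assumptions, doubling the frame duration roughly doubles the number of coefficients per vector, whereas the frame shift and the length of the utterance determine how many vectors there are. With a 25 ms frame at 16 kHz the coefficients are spaced 40 Hz apart, so the 1 kHz sine should produce its largest coefficient at index 25 in every frame.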