Speech Recognition: FFT Coefficients
In the speech recognition slide pack, I don’t understand slides 10 and 11.
On slide 10, what do these coefficients multiplying the waves mean?
Slide 11 says: “We can vary the number of coefficients by varying the duration of the waveform being analysed”. This doesn’t make sense to me: shouldn’t there be a fixed number of coefficients in the vector for any part of the waveform? Don’t you mean that you can vary the number of vectors?
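Slide 10: these are the Fourier co-efficients. Each one tells us how much of that frequency is present in the signal (think of them as weights that multiply each sine wave); strictly, the magnitude of a co-efficient is the amplitude of its sine wave, and the squared magnitude is the energy at that frequency. If we plot them against frequency, we get the spectrum of the signal.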
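The idea that the co-efficients are weights on sine waves can be checked numerically. This is a minimal sketch, assuming a hypothetical test signal built from two sine waves (weights 3.0 and 1.0) whose frequencies fall exactly on frequency bins 5 and 12 of a 64-sample analysis. It uses a naive discrete Fourier transform rather than an FFT library; an FFT computes the same co-efficients, just faster. Scaling each co-efficient's magnitude by 2/N recovers the weight of the corresponding sine wave.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; an FFT gives the same values faster."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def amplitude_spectrum(x):
    """Amplitude (weight) of the sine wave at each frequency bin k = 0 .. N/2."""
    n_samples = len(x)
    coeffs = dft(x)
    amps = []
    for k in range(n_samples // 2 + 1):
        # The DC (k = 0) and Nyquist (k = N/2) bins are not mirrored, so they are not doubled.
        scale = 1 if k in (0, n_samples // 2) else 2
        amps.append(scale * abs(coeffs[k]) / n_samples)
    return amps


N = 64
# Hypothetical test signal: 3.0 x (sine at bin 5) + 1.0 x (sine at bin 12)
test_signal = [
    3.0 * math.sin(2 * math.pi * 5 * n / N) + 1.0 * math.sin(2 * math.pi * 12 * n / N)
    for n in range(N)
]
amps = amplitude_spectrum(test_signal)

# Only the bins that actually contain a sine wave have non-negligible co-efficients.
for k, a in enumerate(amps):
    if a > 1e-6:
        print(k, round(a, 3))
```

Only bins 5 and 12 are printed, with amplitudes 3.0 and 1.0: the spectrum recovers exactly the weights used to build the signal. Plotting `amps` against `k` would give the spectrum described on slide 10.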
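Slide 11: the number of Fourier co-efficients depends on the duration of the signal being analysed. A stretch of signal lasting T seconds, sampled at fs samples per second, contains N = T × fs samples, and its discrete Fourier transform has N co-efficients. But remember that we don’t analyse the whole signal at once: we divide it into short frames and perform the analysis on each frame in turn. The frames all have the same, fixed duration (e.g., 25 ms), so every frame produces a vector with the same number of co-efficients.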
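The relationship between frame duration and the number of co-efficients can be sketched as below. The 16 kHz sampling rate is an assumption chosen for illustration, and the function name `coefficients_per_frame` is hypothetical. For a real-valued signal the spectrum is symmetric, so only N // 2 + 1 of the N co-efficients are distinct, and their spacing in frequency is fs / N.

```python
def coefficients_per_frame(frame_duration_s, sample_rate_hz):
    """Return (co-efficients, distinct co-efficients, bin spacing in Hz) for one frame."""
    # Number of samples in the frame = number of DFT co-efficients.
    n_samples = round(frame_duration_s * sample_rate_hz)
    # A real signal's spectrum is symmetric, so only these bins carry distinct information.
    n_unique = n_samples // 2 + 1
    # Longer frames give more co-efficients, packed more closely in frequency.
    bin_spacing_hz = sample_rate_hz / n_samples
    return n_samples, n_unique, bin_spacing_hz


for duration in (0.010, 0.025, 0.050):
    n, unique, spacing = coefficients_per_frame(duration, 16000)
    print(f"{duration * 1000:.0f} ms -> {n} co-efficients ({unique} distinct), spacing {spacing:.0f} Hz")
```

At 16 kHz, a 25 ms frame gives 400 co-efficients (201 distinct) spaced 40 Hz apart; doubling the frame duration doubles the number of co-efficients. This is the sense in which slide 11 says the number of co-efficients varies with the duration analysed.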
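The number of frames (and therefore the number of vectors) is approximately the total duration of the speech signal (e.g., an utterance) divided by the frame shift (e.g., 10 ms). It is this number, not the length of each vector, that varies from one utterance to another.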
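Framing can be sketched as follows, assuming a 16 kHz sampling rate and the 25 ms frame / 10 ms shift from the example above; `frame_signal` is a hypothetical helper, not a function from any particular toolkit. Because the shift is shorter than the frame, consecutive frames overlap.

```python
def frame_signal(samples, sample_rate_hz, frame_duration_s=0.025, frame_shift_s=0.010):
    """Split a signal into overlapping fixed-length frames (partial frames at the end are dropped)."""
    frame_len = round(frame_duration_s * sample_rate_hz)  # samples per frame (fixed)
    shift = round(frame_shift_s * sample_rate_hz)         # samples between frame starts
    if len(samples) < frame_len:
        return []
    # Keep only frames that fit entirely inside the signal.
    n_frames = 1 + (len(samples) - frame_len) // shift
    return [samples[i * shift : i * shift + frame_len] for i in range(n_frames)]


sample_rate = 16000
# Hypothetical 1-second utterance (silence is enough for counting frames).
utterance = [0.0] * sample_rate
frames = frame_signal(utterance, sample_rate)
print(len(frames), len(frames[0]))
```

This prints `98 400`: every frame has the same 400 samples (so the same number of co-efficients), while the number of frames is close to 1 s / 10 ms = 100. The count is 98 rather than exactly 100 because this sketch drops frames that would run past the end of the signal; toolkits that pad the signal instead can produce a slightly different count. A 2-second utterance would give roughly twice as many frames, each still 400 samples long.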
In these slides, I am temporarily imagining that the Fourier co-efficients (i.e., the magnitude spectrum) would be a good representation for Automatic Speech Recognition. Whilst we could use them, we can do better by performing some feature engineering – this is covered a little later on.