You should now understand:
- the difference between epoch detection and F0 estimation, and what each is used for
- the set of speech parameters that is needed for reconstruction of the waveform: F0, voicing decision, spectral envelope, and aperiodic energy
- how these parameters can be represented for statistical modelling: the Mel-cepstrum, and band aperiodicities
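To make the first point concrete, here is a minimal sketch of how the two quantities relate. Epoch detection yields time instants (typically one per glottal closure), whereas F0 is a rate. The function name `local_f0_from_epochs` and the epoch times below are invented for illustration, not taken from real speech. A local F0 value is derived as the reciprocal of the interval between consecutive epochs. Practical F0 estimators usually do not work this way, since they tend to analyse a whole frame of the signal; the sketch only illustrates the relationship.

```python
def local_f0_from_epochs(epoch_times):
    """Return one local F0 value (Hz) per pair of consecutive epochs.

    epoch_times: strictly increasing epoch instants in seconds
    (hypothetical input; a real system would get these from an
    epoch detector, often called a pitch marker).
    """
    f0 = []
    for prev, curr in zip(epoch_times, epoch_times[1:]):
        period = curr - prev  # fundamental period in seconds
        if period <= 0:
            raise ValueError("epoch times must be strictly increasing")
        f0.append(1.0 / period)  # frequency is the reciprocal of the period
    return f0


# Invented epoch times: periods of 10 ms, 10 ms, then 8 ms,
# so the local F0 values are roughly 100 Hz, 100 Hz and 125 Hz.
epochs = [0.000, 0.010, 0.020, 0.028]
f0_values = local_f0_from_epochs(epochs)
```

Note that the epochs themselves are what signal processing such as pitch-synchronous overlap-add needs, while the F0 contour is what we would use as a parameter for modelling.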
From the Speech Processing course, you already know why we often need to carefully devise a good representation of the speech parameters, a process called “feature engineering”. You should now understand the typical types of feature engineering employed in statistical parametric speech synthesis.
In the next module, we’ll revise what we already know about Hidden Markov Models, but this time we will use speech features suitable for speech synthesis.