What is a vocoder?
The job of a vocoder is to convert a speech waveform into a representation that can be manipulated or modelled, and then back into a waveform again.
A brief detour into speech coding
Coding and vocoding are closely related, but have quite different applications.
From speech coding to speech synthesis
One way to develop the idea of parametric speech synthesis is via speech coding.
Vocoding using a source-filter model
The most obvious way to construct a vocoder is as a source-filter model.
Speech coding using a source-filter model
Speech-specific codecs, like the one in your mobile phone, are generally based on a source-filter model.
STRAIGHT: introduction
STRAIGHT is the most widely used vocoder in statistical parametric speech synthesis.
A little more about Fourier analysis
We need a little more intuition about Fourier analysis at this point.
STRAIGHT: smooth spectral envelope
What STRAIGHT does better than most other speech analysis methods is to extract a smooth spectral envelope.
STRAIGHT: mixed excitation
We get better results when mixing both periodic and aperiodic sources, rather than switching between them.
A brief look at sinusoidal models
This class of models breaks the signal down into its periodic and non-periodic components, rather than into source and filter.
Modelling the vocoder parameters
What properties must our vocoder parameters have so that they are easy to model?
Vocoding
In order to model speech, we need a parametric representation of it. This might be done using a source-filter model, or something more general.
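As a rough illustration of what a source-filter parameterisation might look like, here is a minimal sketch of the synthesis half only. It is not STRAIGHT or any particular codec. The function names (`mixed_excitation`, `resonator`), the single `aperiodicity` mixing weight, and the use of one two-pole resonator as a stand-in for the spectral envelope are all assumptions made for the example.

```python
import math
import random


def mixed_excitation(n_samples, f0_hz, sample_rate, aperiodicity, seed=0):
    """Source: blend a pulse train (periodic) with white noise (aperiodic).

    `aperiodicity` is a hypothetical mixing weight in [0, 1]:
    0 gives a pure pulse train, 1 gives pure noise.
    """
    rng = random.Random(seed)
    period = int(round(sample_rate / f0_hz))  # samples per pitch period
    excitation = []
    for n in range(n_samples):
        pulse = 1.0 if n % period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        excitation.append((1.0 - aperiodicity) * pulse + aperiodicity * noise)
    return excitation


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    a1 = -2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = r * r
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


sample_rate = 16000
# 0.1 s of fully periodic excitation at 100 Hz, shaped by one resonance at 500 Hz.
source = mixed_excitation(1600, f0_hz=100.0, sample_rate=sample_rate, aperiodicity=0.0)
speech_like = resonator(source, centre_hz=500.0, bandwidth_hz=100.0, sample_rate=sample_rate)
```

The parameters here (F0, the mixing weight, and the resonance frequency and bandwidth) are the kind of compact description that could be modelled and then turned back into a waveform. A real vocoder would use a full spectral envelope and frame-by-frame parameter updates.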