Although unit selection is essentially the concatenation of pre-recorded waveform fragments, we may store those waveforms in terms of source-filter model parameters.
The representation of speech that Festival uses at synthesis time is residual-excited linear prediction (RELP): each waveform is stored as a set of linear prediction coefficients (the filter) together with a residual signal (the source). This allows for the possibility of manipulating the spectrum and F0 (for example, at concatenation points) as well as duration. However, in practice, Festival’s Multisyn engine does not actually do any of those things.
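To make the source-filter idea concrete, here is a minimal sketch in Python of linear prediction analysis on a single frame. It estimates predictor coefficients with the autocorrelation method (Levinson-Durbin recursion), inverse-filters the frame to obtain the residual, and then drives the filter with that residual to recover the original samples. This is only an illustration of the principle, not what Festival or make_lpc_from_wav does internally: the function names, the synthetic test signal, the frame length and the predictor order of 4 are all assumptions made for the example, and real analysis would also involve windowing and frame-by-frame processing.

```python
import math

def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    s[n] is approximated by sum(a[k] * s[n-k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def predict(samples, coeffs, n):
    """Predict sample n from earlier samples (samples before the start count as zero)."""
    return sum(a * samples[n - k]
               for k, a in enumerate(coeffs, start=1) if n - k >= 0)

def inverse_filter(frame, coeffs):
    """Residual (source): what the predictor (filter) fails to explain."""
    return [frame[n] - predict(frame, coeffs, n) for n in range(len(frame))]

def resynthesise(residual, coeffs):
    """Excite the all-pole filter with the residual to rebuild the waveform."""
    out = []
    for n in range(len(residual)):
        out.append(residual[n] + predict(out, coeffs, n))
    return out

def energy(samples):
    return sum(v * v for v in samples)

# Hypothetical test frame: two sinusoids standing in for a stretch of voiced speech.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n + 0.4) for n in range(200)]
order = 4  # assumed for the example; speech analysis typically uses a higher order

coeffs = levinson_durbin(autocorrelation(frame, order), order)
residual = inverse_filter(frame, coeffs)
resynth = resynthesise(residual, coeffs)

print("residual/signal energy ratio:", energy(residual) / energy(frame))
print("max reconstruction error:", max(abs(a - b) for a, b in zip(frame, resynth)))
```

Because the residual is kept in full, resynthesis reproduces the waveform up to rounding error, which is why storing units as coefficients plus residual loses nothing compared with storing the waveform itself. Any manipulation of spectrum or F0 would happen by altering the coefficients or the residual before resynthesis.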
bash$ mkdir lpc
bash$ make_lpc_from_wav wav/*.wav