Unit selection: how waveform generation is achieved through selection and concatenation of waveform segments, the data required to do this, and the limitations of this approach.
The method
It seems simple: choose a suitable sequence of pre-recorded speech segments, and play them back in the right order. But how do we make that choice, and can we really just play back a sequence of waveform fragments taken from different utterances?
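One common way to frame the choice is as a search problem: each position in the utterance has several candidate segments from the database, and we look for the sequence with the lowest total cost. The sketch below is a minimal illustration under stated assumptions, not a description of any particular system. It assumes two cost functions that are not defined in this section: a hypothetical target_cost (how well a candidate matches what we want at that position) and a hypothetical join_cost (how badly two consecutive candidates concatenate). The function name select_units and the data layout are likewise illustrative.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # target specification for one position (assumed type)
U = TypeVar("U")  # candidate unit from the database (assumed type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one candidate per target, minimising summed target and join costs.

    Dynamic programming (Viterbi) search over the lattice of candidates.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need exactly one candidate list per target")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate")

    # best[i][j]: lowest cost of any path that ends in candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor on that lowest-cost path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Cost of arriving at u from each candidate at the previous position
            arrive = [
                best[i - 1][k] + join_cost(prev, u)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(arrive)), key=arrive.__getitem__)
            row_cost.append(arrive[k_best] + target_cost(targets[i], u))
            row_back.append(k_best)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]
```

With a join cost that is zero for segments that were adjacent in the same recorded utterance, a search like this tends to prefer longer contiguous stretches of original speech, which is one reason playing back fragments from different utterances can work at all.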
The database
The quality of a unit selection system depends heavily on its speech database: both the quality of the recorded speech and the accuracy of the labels matter.