Using our knowledge of speech production, we predict that joins made in the middle of phones will sound better than joins made at phone boundaries.
In this video, I occasionally refer to a second screen displaying a waveform; that screen was not recorded. I think most of the content of this video still makes sense.
Diphones
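As a rough illustration (not taken from the video), here is a minimal Python sketch of how a phone sequence maps onto diphone units. It assumes the usual definition of a diphone as the stretch from the middle of one phone to the middle of the next, so that every join falls mid-phone. The function name `phones_to_diphones`, the hyphenated unit names, and the phone symbols are hypothetical choices made for this example.

```python
def phones_to_diphones(phones):
    """Pair each phone with its successor; each pair names one diphone unit.

    A diphone runs from the middle of the left phone to the middle of the
    right phone, so concatenating diphones places every join mid-phone.
    """
    return [f"{left}-{right}" for left, right in zip(phones, phones[1:])]


# Hypothetical phone sequence for the word "cat", padded with silence.
phones = ["sil", "k", "ae", "t", "sil"]
diphones = phones_to_diphones(phones)
print(diphones)  # ['sil-k', 'k-ae', 'ae-t', 't-sil']
```

A sequence of N phones yields N - 1 diphones, and each phone boundary sits inside a unit instead of at a join.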