Videos

Total video to watch in this module: 63 minutes

Using our knowledge of speech production, we predict that making joins at mid-phone positions will sound better than at phone boundaries.
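
As a rough illustration of what joining at mid-phone positions means, the sketch below turns a phone-level alignment into units that run from the middle of one phone to the middle of the next (units of this kind are commonly called diphones). The function name, the labels and the timings are all made up for the example; they are not taken from the video.

```python
def mid_phone_units(phones):
    """phones: list of (label, start_s, end_s) tuples, in time order.

    Returns (name, start_s, end_s) units that begin and end at phone
    midpoints, so every join falls mid-phone rather than at a boundary.
    """
    mids = [(label, (start + end) / 2.0) for label, start, end in phones]
    units = []
    for (left, t0), (right, t1) in zip(mids, mids[1:]):
        units.append((f"{left}-{right}", t0, t1))
    return units


# Hypothetical alignment for the word "cat", preceded by silence.
phones = [("sil", 0.00, 0.10), ("k", 0.10, 0.18),
          ("ae", 0.18, 0.34), ("t", 0.34, 0.42)]
units = mid_phone_units(phones)
for name, t0, t1 in units:
    print(f"{name}: {t0:.2f} s to {t1:.2f} s")
```

Each unit contains a whole phone-to-phone transition, so the cut points land in the more stable middle portion of each phone.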

In this video, I occasionally refer to a second screen displaying a waveform; that screen was not recorded. I think most of the content of this video still makes sense.

Time-domain pitch-synchronous overlap-and-add is a remarkably simple but effective way to independently modify the duration and F0 of speech.
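
The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the video: it assumes the pitch marks are already known (one per pitch period, given as sample indices), it treats the whole signal as voiced, and the function names are invented for the example. Each grain spans two pitch periods, centred on a pitch mark and tapered with a raised-cosine window. Duration is changed by repeating or skipping grains, and F0 is changed by altering the spacing at which grains are overlap-added.

```python
import math


def extract_grains(x, marks):
    """Cut one two-period, tapered grain around each interior pitch mark."""
    grains = []
    for prev, centre, nxt in zip(marks, marks[1:], marks[2:]):
        left_len, right_len = centre - prev, nxt - centre
        # Rising half-window over the period before the mark ...
        rising = [x[prev + i] * 0.5 * (1 - math.cos(math.pi * i / left_len))
                  for i in range(left_len)]
        # ... and falling half-window over the period after it.
        falling = [x[centre + i] * 0.5 * (1 + math.cos(math.pi * i / right_len))
                   for i in range(right_len)]
        grains.append((left_len, rising + falling))
    return grains


def td_psola(x, marks, duration_scale=1.0, f0_scale=1.0):
    """Overlap-add grains at new positions to change duration and/or F0."""
    grains = extract_grains(x, marks)
    centres = marks[1:-1]
    y = [0.0] * int(round(len(x) * duration_scale))
    t = centres[0] * duration_scale      # position of the next output mark
    end = centres[-1] * duration_scale
    while t <= end:
        # Pick the input grain closest to the corresponding input time.
        k = min(range(len(centres)),
                key=lambda j: abs(centres[j] - t / duration_scale))
        left_len, grain = grains[k]
        start = int(round(t)) - left_len
        for i, v in enumerate(grain):
            if 0 <= start + i < len(y):
                y[start + i] += v
        # Output spacing = local input pitch period, scaled to change F0.
        t += (marks[k + 2] - marks[k + 1]) / f0_scale
    return y


# Toy input: a sine wave with a 50-sample period and a mark every period.
period = 50
x = [math.sin(2 * math.pi * n / period) for n in range(500)]
marks = list(range(0, 500, period))

same = td_psola(x, marks)                        # no modification
longer = td_psola(x, marks, duration_scale=2.0)  # twice as long, same F0
higher = td_psola(x, marks, f0_scale=2.0)        # same length, F0 doubled
```

With both scale factors left at 1.0, the overlapping windows sum to one between the first and last interior marks, so the input is reconstructed there. Real speech would also need pitch marking and some treatment of unvoiced regions, which this sketch leaves out.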
To understand the power of linear predictive waveform coding, we'll consider the problem of smoothing the joins in concatenative synthesis.
It's just a simple equation, but for this course we don't need to get too deep into the maths.

The notation used in the linear prediction equation is slightly different from that used in class. It’s only a change of notation: the equation is otherwise exactly the same.
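
For reference, one common way of writing linear prediction is s[n] = a_1 s[n-1] + a_2 s[n-2] + ... + a_p s[n-p] + e[n], where e[n] is the prediction error. This notation is an assumption and may differ from both the video and the class. The sketch below estimates the coefficients a_1 ... a_p from a signal using the autocorrelation method with the Levinson-Durbin recursion. The second-order filter used to generate the test signal is hypothetical, chosen only so that the correct answer is known.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err): predictor coefficients a[0..order-1] holding
    a_1..a_p, and the final prediction error energy."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Test signal: impulse response of a known (made-up) all-pole filter.
true_a = [1.3, -0.8]
x = []
for n in range(400):
    sample = 1.0 if n == 0 else 0.0
    for k, ak in enumerate(true_a, start=1):
        if n - k >= 0:
            sample += ak * x[n - k]
    x.append(sample)

r = autocorrelation(x, 2)
a, err = levinson_durbin(r, 2)
print("estimated coefficients:", [round(c, 4) for c in a])
```

Because the test signal really was produced by a second-order all-pole filter, the estimated coefficients should come out very close to the ones used to generate it.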

Now that we understand how linear predictive coding works, we can use it to smooth the spectral envelope across joins.
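
One simple way to picture this smoothing is linear interpolation of the per-frame filter parameters over a few frames either side of the join. The sketch below is only an illustration under stated assumptions: the function name and the parameter values are made up, and the choice of parameter representation is left open. Interpolating predictor coefficients directly can produce an unstable filter, so a representation such as reflection coefficients or line spectral frequencies is often preferred.

```python
def smooth_join(left, right, n):
    """Interpolate parameter vectors across a join.

    left, right: lists of per-frame parameter vectors for the two units.
    n: number of frames to replace on each side of the join.
    The frame just before and just after the smoothed region are kept
    unchanged and used as the interpolation end points.
    """
    frames = [list(f) for f in left + right]
    join = len(left)
    start, end = join - n - 1, join + n
    if start < 0 or end >= len(frames):
        raise ValueError("not enough frames around the join")
    span = end - start
    for i in range(start + 1, end):
        w = (i - start) / span
        frames[i] = [(1 - w) * a + w * b
                     for a, b in zip(frames[start], frames[end])]
    return frames


# Two units with constant, clearly different (made-up) parameters.
left = [[0.5, -0.2]] * 4
right = [[-0.1, 0.4]] * 4
smoothed = smooth_join(left, right, n=2)
for frame in smoothed:
    print([round(v, 2) for v in frame])
```

The abrupt change at the join becomes a gradual transition spread over the four frames around it.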
Since the filter is time-varying, we need to decide how frequently (and at what moments in time) to update its coefficients.
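
Two obvious choices are to update at a fixed rate or to update pitch-synchronously. The sketch below shows both as lists of update points (sample indices), plus a lookup that says which coefficient set is active at a given sample. The 10 ms shift, the 16 kHz sample rate and the pitch marks are assumptions made for the example, not values from the video.

```python
import bisect


def fixed_rate_updates(n_samples, sample_rate, shift_s=0.010):
    """Update points every shift_s seconds (10 ms is an assumed default)."""
    step = int(round(shift_s * sample_rate))
    return list(range(0, n_samples, step))


def frame_index_for_sample(update_points, n):
    """Index of the coefficient set active at sample n.

    Assumes update_points is sorted and starts at sample 0.
    """
    return bisect.bisect_right(update_points, n) - 1


# Fixed-rate schedule for 50 ms of audio at 16 kHz.
fixed = fixed_rate_updates(800, 16000)

# Pitch-synchronous schedule: update at (hypothetical) pitch marks instead.
pitch_marks = [0, 150, 295, 438, 590, 745]

print(fixed)
print(frame_index_for_sample(fixed, 200))
print(frame_index_for_sample(pitch_marks, 200))
```

The same lookup works for either schedule; only the list of update points changes.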
Exciting the filter with a simple pulse train doesn't produce good-quality speech. Fortunately, there is an almost-perfect excitation signal: the residual.
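
The sketch below shows why the residual works so well as an excitation. It is a toy example under stated assumptions: the filter coefficients are fixed and made up (a real system would estimate them frame by frame), and the input is a synthetic signal rather than speech. Inverse filtering gives the residual, and passing that residual back through the filter recovers the input, whereas a pulse train through the same filter does not.

```python
import math


def inverse_filter(x, a):
    """Residual: e[n] = x[n] - sum over k of a[k-1] * x[n-k]."""
    e = []
    for n in range(len(x)):
        pred = sum(ak * x[n - k]
                   for k, ak in enumerate(a, start=1) if n - k >= 0)
        e.append(x[n] - pred)
    return e


def synthesis_filter(e, a):
    """All-pole filter: y[n] = e[n] + sum over k of a[k-1] * y[n-k]."""
    y = []
    for n in range(len(e)):
        pred = sum(ak * y[n - k]
                   for k, ak in enumerate(a, start=1) if n - k >= 0)
        y.append(e[n] + pred)
    return y


a = [1.3, -0.8]   # hypothetical, stable second-order predictor
x = [math.sin(2 * math.pi * n / 80) + 0.3 * math.sin(2 * math.pi * n / 23)
     for n in range(200)]

# Residual excitation: inverse filter, then filter again.
residual = inverse_filter(x, a)
from_residual = synthesis_filter(residual, a)

# Pulse-train excitation: one unit pulse every 80 samples.
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(len(x))]
from_pulses = synthesis_filter(pulses, a)

residual_err = max(abs(u - v) for u, v in zip(x, from_residual))
pulse_err = max(abs(u - v) for u, v in zip(x, from_pulses))
print("max error with residual excitation:", residual_err)
print("max error with pulse-train excitation:", pulse_err)
```

The residual contains exactly what the predictor failed to capture, so filter plus residual together describe the waveform, up to numerical rounding in this toy case.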