Overlap-add

Cross-fading between two waveforms is an effective way to avoid some of the artefacts of concatenation.

We're synthesising speech by joining together recorded diphones: by concatenating their waveforms.
We've seen that waveform concatenation must be done with care.
That's actually something that's true in general about signal processing.
Signal processing isn't just about theory; it's about the details and the care of the implementation.
We found that joining at zero crossings was much better than joining at an arbitrary point, but that a pitch-synchronous concatenation point can further reduce the chances of a join being audible.
But in general, it might not be possible to always find a place where the pitch marks are nicely spaced and there's a zero crossing.
So we need a more general-purpose method to make smooth sounding joins in all possible cases.
That actually turns out to be rather simple: by cross-fading.
That's called overlap-add.
Here's the solution.
As one song comes to the end, this DJ has the next song ready on the other deck.
The DJ makes sure the two songs have a similar tempo and that the beats are aligned, then fades out the previous song whilst fading in the next song.
If the DJ does a good job of that, no one on the dance floor will notice and everyone keeps dancing happily.
We've already got our beats aligned by making pitch-synchronous joins.
Now we'll add the fade-out of the previous waveform while fading in the next one.
Here are two waveforms I'd like to concatenate.
I'm just going to simply fade out the first one.
In other words, I'm going to reduce the volume just at the end, down to zero, smoothly.
I'm going to fade in the second one: increase the volume from zero, smoothly.
Then overlap and add them.
Apply that fade out and fade in: that just scales down the amplitude to zero, smoothly.
I overlap them, and where the samples overlap, I'll sum them, I'll add them together.
Hence the name of the method: 'overlap-add'.
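
The fade-out, fade-in and sum steps just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in any particular synthesiser: the function names `linear_fade` and `overlap_add`, the use of plain lists of samples, and the choice of a linear fade are all assumptions made for the example. The transcript only says the fades are smooth; other fade shapes are possible.

```python
def linear_fade(n, fade_in):
    """Return n gain values ramping 0 -> 1 (fade in) or 1 -> 0 (fade out).

    A linear ramp is an assumption made for this sketch.
    """
    if n == 1:
        return [0.5]
    ramp = [i / (n - 1) for i in range(n)]
    return ramp if fade_in else ramp[::-1]


def overlap_add(first, second, overlap):
    """Join two lists of samples, cross-fading over `overlap` samples."""
    if not 0 < overlap <= min(len(first), len(second)):
        raise ValueError("overlap must be between 1 and the shorter waveform length")

    fade_out = linear_fade(overlap, fade_in=False)
    fade_in = linear_fade(overlap, fade_in=True)

    # Samples outside the overlap region are copied unchanged.
    head = first[:-overlap]
    tail = second[overlap:]

    # Inside the overlap, scale each waveform by its fade and sum the results.
    mixed = [
        a * g_out + b * g_in
        for a, g_out, b, g_in in zip(first[-overlap:], fade_out, second[:overlap], fade_in)
    ]
    return head + mixed + tail


if __name__ == "__main__":
    # Two constant "waveforms": the linear fades sum to one at every sample,
    # so the joined signal keeps the same level through the overlap.
    joined = overlap_add([1.0] * 6, [1.0] * 6, overlap=3)
    print(len(joined), joined)
```

The joined waveform is shorter than the two inputs placed end to end, by exactly the number of overlapped samples. In this sketch the two gains sum to one at every sample of the overlap, so a steady signal keeps its level across the join.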
Here are the waveforms concatenated using overlap-add.
Now the join is quite hard to spot.
Overlap-add is a very general method; it's very useful.
But we're not quite as good as the DJ yet.
We've not taken care to match the fundamental period of the two waveforms before joining them.
So there may still be an audible discontinuity in the pitch.
Sudden changes in pitch are not natural and listeners will notice them.
You'll also remember that our front end has predicted values for F0 and for duration.
So we need a method to impose those onto the recorded speech because, in general, our recorded diphones won't have the desired F0 or duration that our front end predicted.
There's a single solution to all of those problems: a method for modifying both the fundamental frequency and the duration of speech directly on waveforms, called Time-Domain Pitch-Synchronous Overlap-and-Add.
Before understanding that, we'll need to revisit the concept of the pitch period, which will be the fundamental unit on which Time-Domain Pitch-Synchronous Overlap-and-Add will work, and remind ourselves how the pitch period relates to the source-filter model.
Once we've done that, we can combine the idea of the pitch period with the technique of overlap-add and understand this powerful algorithm Time-Domain Pitch-Synchronous Overlap-and-Add, or TD-PSOLA for short.
