Usage of TD-PSOLA
I was wondering how commonly TD-PSOLA is used in speech processing and if there were better alternatives to it. I ask because TD-PSOLA seems too crude to have any practical use. Wouldn’t it make more sense to, for instance, identify the spectral envelope of a speech signal and feed it a different F0 to produce a new spectrum?
Yes, TD-PSOLA is still used for speech modification – for relatively small changes in duration or F0, it gives very high quality (if implemented carefully).
You’re right that TD-PSOLA looks “crude”. I’d prefer to say that it is deceptively simple and really quite elegant, once you deeply understand what it is doing. It actually performs an implicit source-filter separation, but it can only modify the source (i.e., duration and F0). It cannot modify the filter.
Do you understand where the filter is in TD-PSOLA? Why is it not possible to modify it?
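To make the discussion concrete, here is a minimal sketch of the pitch-synchronous overlap-add idea, for F0 modification only. It is not a production implementation: the function name `td_psola` and its parameters are made up for illustration, the pitch marks are assumed to be given already (one sample index per pitch period), and no duration change or unvoiced handling is attempted. Notice that the windowed segments are copied unchanged and only their spacing is altered.

```python
import numpy as np

def td_psola(signal, pitch_marks, f0_scale):
    """Illustrative TD-PSOLA-style F0 change (f0_scale > 1 raises F0).

    Assumes pitch_marks are increasing sample indices inside the signal,
    one per pitch period. Duration is kept roughly constant.
    """
    signal = np.asarray(signal, dtype=float)
    marks = np.asarray(pitch_marks, dtype=int)
    out = np.zeros(len(signal))
    inner = marks[1:-1]          # marks that have a neighbour on both sides
    t = float(inner[0])          # position of the first synthesis pitch mark
    while t < inner[-1]:
        # Pick the analysis pitch mark closest to this synthesis position.
        i = 1 + int(np.argmin(np.abs(inner - t)))
        left = int(marks[i] - marks[i - 1])
        right = int(marks[i + 1] - marks[i])
        # Two-period segment centred on the analysis mark, Hann-tapered.
        segment = signal[marks[i] - left : marks[i] + right + 1]
        window = np.concatenate([
            np.hanning(2 * left + 1)[:left],     # rising half
            np.hanning(2 * right + 1)[right:],   # peak and falling half
        ])
        # Overlap-add the unchanged segment at the new position.
        start = int(round(t)) - left
        lo, hi = max(start, 0), min(start + len(segment), len(out))
        out[lo:hi] += (segment * window)[lo - start : hi - start]
        # Next synthesis mark: local period divided by the F0 scale factor.
        t += 0.5 * (left + right) / f0_scale
    return out
```

With `f0_scale=1.0` and regularly spaced pitch marks, the overlapping Hann windows sum to one and the interior of the signal is reconstructed unchanged. With larger factors the segments overlap more heavily, which is one reason the method works best for relatively small modifications.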
Your alternative suggestion is spot on: to obtain the spectral envelope and to “feed it” (we usually say “excite it”) with a new source signal of the desired F0. This is precisely what an explicit source-filter model can do. Linear prediction is a common choice for the filter.
You suggest feeding the filter with a “different F0”. Given that the source-filter model operates in the time domain, what exactly would a “different F0” mean? Can you draw a diagram?
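As a starting point for that diagram, here is a rough sketch of an explicit source-filter resynthesis, using only NumPy. It takes one possible reading of a “different F0” in the time domain: an impulse train whose period is the desired pitch period in samples. All the names (`lpc`, `impulse_train`, `all_pole_filter`, `resynthesise`) are invented for this example, the autocorrelation method is just one common way to estimate the linear prediction coefficients, and a real system would work frame by frame, handle gain, and treat unvoiced speech separately.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method linear prediction.

    Returns a = [1, a1, ..., ap] for the all-pole filter 1 / A(z).
    The caller is expected to window the frame first.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz normal equations R a = -r[1:], lightly regularised.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    return np.concatenate([[1.0], np.linalg.solve(R, -r[1 : order + 1])])

def impulse_train(n_samples, period):
    """Source signal: one unit impulse every `period` samples."""
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e

def all_pole_filter(a, excitation):
    """Filter: y[n] = x[n] - sum_{k=1..p} a[k] * y[n-k]."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def resynthesise(frame, order, new_period):
    """Estimate the envelope of `frame`, then excite it at a new pitch period."""
    a = lpc(np.asarray(frame, dtype=float) * np.hanning(len(frame)), order)
    return all_pole_filter(a, impulse_train(len(frame), new_period))
```

For speech sampled at 16 kHz, `new_period=100` would correspond to an F0 of 160 Hz and `new_period=80` to 200 Hz. The filter coefficients stay the same in both cases, so the spectral envelope is kept while the source changes, and because the filter is now explicit it could also be modified, which TD-PSOLA cannot do.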